computationally efficient reduced: Topics by Science.gov

Sample records for computationally efficient reduced

Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

NASA Technical Reports Server (NTRS)

Pham, Long; Chen, Aijun; Kempler, Steven; Lynnes, Christopher; Theobald, Michael; Asghar, Esfandiari; Campino, Jane; Vollmer, Bruce

2011-01-01

Cloud Computing has been implemented in several commercial arenas. The NASA Nebula Cloud Computing platform is an Infrastructure as a Service (IaaS) built in 2008 at NASA Ames Research Center and 2010 at GSFC. Nebula is an open source Cloud platform intended to: a) Make NASA realize significant cost savings through efficient resource utilization, reduced energy consumption, and reduced labor costs. b) Provide an easier way for NASA scientists and researchers to efficiently explore and share large and complex data sets. c) Allow customers to provision, manage, and decommission computing capabilities on an as-needed bases
ADVANCED COMPUTATIONAL METHODS IN DOSE MODELING: APPLICATION OF COMPUTATIONAL BIOPHYSICAL TRANSPORT, COMPUTATIONAL CHEMISTRY, AND COMPUTATIONAL BIOLOGY

EPA Science Inventory

Computational toxicology (CompTox) leverages the significant gains in computing power and computational techniques (e.g., numerical approaches, structure-activity relationships, bioinformatics) realized over the last few years, thereby reducing costs and increasing efficiency i...
Reducing Vehicle Weight and Improving U.S. Energy Efficiency Using Integrated Computational Materials Engineering

NASA Astrophysics Data System (ADS)

Joost, William J.

2012-09-01

Transportation accounts for approximately 28% of U.S. energy consumption with the majority of transportation energy derived from petroleum sources. Many technologies such as vehicle electrification, advanced combustion, and advanced fuels can reduce transportation energy consumption by improving the efficiency of cars and trucks. Lightweight materials are another important technology that can improve passenger vehicle fuel efficiency by 6-8% for each 10% reduction in weight while also making electric and alternative vehicles more competitive. Despite the opportunities for improved efficiency, widespread deployment of lightweight materials for automotive structures is hampered by technology gaps most often associated with performance, manufacturability, and cost. In this report, the impact of reduced vehicle weight on energy efficiency is discussed with a particular emphasis on quantitative relationships determined by several researchers. The most promising lightweight materials systems are described along with a brief review of the most significant technical barriers to their implementation. For each material system, the development of accurate material models is critical to support simulation-intensive processing and structural design for vehicles; improved models also contribute to an integrated computational materials engineering (ICME) approach for addressing technical barriers and accelerating deployment. The value of computational techniques is described by considering recent ICME and computational materials science success stories with an emphasis on applying problem-specific methods.
Computational methods for aerodynamic design using numerical optimization

NASA Technical Reports Server (NTRS)

Peeters, M. F.

1983-01-01

Five methods to increase the computational efficiency of aerodynamic design using numerical optimization, by reducing the computer time required to perform gradient calculations, are examined. The most promising method consists of drastically reducing the size of the computational domain on which aerodynamic calculations are made during gradient calculations. Since a gradient calculation requires the solution of the flow about an airfoil whose geometry was slightly perturbed from a base airfoil, the flow about the base airfoil is used to determine boundary conditions on the reduced computational domain. This method worked well in subcritical flow.
Framework for computationally efficient optimal irrigation scheduling using ant colony optimization

USDA-ARS?s Scientific Manuscript database

A general optimization framework is introduced with the overall goal of reducing search space size and increasing the computational efficiency of evolutionary algorithm application for optimal irrigation scheduling. The framework achieves this goal by representing the problem in the form of a decisi...
Image restoration for three-dimensional fluorescence microscopy using an orthonormal basis for efficient representation of depth-variant point-spread functions

PubMed Central

Patwary, Nurmohammed; Preza, Chrysanthe

2015-01-01

A depth-variant (DV) image restoration algorithm for wide field fluorescence microscopy, using an orthonormal basis decomposition of DV point-spread functions (PSFs), is investigated in this study. The efficient PSF representation is based on a previously developed principal component analysis (PCA), which is computationally intensive. We present an approach developed to reduce the number of DV PSFs required for the PCA computation, thereby making the PCA-based approach computationally tractable for thick samples. Restoration results from both synthetic and experimental images show consistency and that the proposed algorithm addresses efficiently depth-induced aberration using a small number of principal components. Comparison of the PCA-based algorithm with a previously-developed strata-based DV restoration algorithm demonstrates that the proposed method improves performance by 50% in terms of accuracy and simultaneously reduces the processing time by 64% using comparable computational resources. PMID:26504634
On the study of control effectiveness and computational efficiency of reduced Saint-Venant model in model predictive control of open channel flow

NASA Astrophysics Data System (ADS)

Xu, M.; van Overloop, P. J.; van de Giesen, N. C.

2011-02-01

Model predictive control (MPC) of open channel flow is becoming an important tool in water management. The complexity of the prediction model has a large influence on the MPC application in terms of control effectiveness and computational efficiency. The Saint-Venant equations, called SV model in this paper, and the Integrator Delay (ID) model are either accurate but computationally costly, or simple but restricted to allowed flow changes. In this paper, a reduced Saint-Venant (RSV) model is developed through a model reduction technique, Proper Orthogonal Decomposition (POD), on the SV equations. The RSV model keeps the main flow dynamics and functions over a large flow range but is easier to implement in MPC. In the test case of a modeled canal reach, the number of states and disturbances in the RSV model is about 45 and 16 times less than the SV model, respectively. The computational time of MPC with the RSV model is significantly reduced, while the controller remains effective. Thus, the RSV model is a promising means to balance the control effectiveness and computational efficiency.
Scientific Discovery through Advanced Computing (SciDAC-3) Partnership Project Annual Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoffman, Forest M.; Bochev, Pavel B.; Cameron-Smith, Philip J..

The Applying Computationally Efficient Schemes for BioGeochemical Cycles ACES4BGC Project is advancing the predictive capabilities of Earth System Models (ESMs) by reducing two of the largest sources of uncertainty, aerosols and biospheric feedbacks, with a highly efficient computational approach. In particular, this project is implementing and optimizing new computationally efficient tracer advection algorithms for large numbers of tracer species; adding important biogeochemical interactions between the atmosphere, land, and ocean models; and applying uncertainty quanti cation (UQ) techniques to constrain process parameters and evaluate uncertainties in feedbacks between biogeochemical cycles and the climate system.
Reduced-Order Models for the Aeroelastic Analysis of Ares Launch Vehicles

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Vatsa, Veer N.; Biedron, Robert T.

2010-01-01

This document presents the development and application of unsteady aerodynamic, structural dynamic, and aeroelastic reduced-order models (ROMs) for the ascent aeroelastic analysis of the Ares I-X flight test and Ares I crew launch vehicles using the unstructured-grid, aeroelastic FUN3D computational fluid dynamics (CFD) code. The purpose of this work is to perform computationally-efficient aeroelastic response calculations that would be prohibitively expensive via computation of multiple full-order aeroelastic FUN3D solutions. These efficient aeroelastic ROM solutions provide valuable insight regarding the aeroelastic sensitivity of the vehicles to various parameters over a range of dynamic pressures.
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage

PubMed Central

Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

2018-01-01

Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query. PMID:29652810
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.

PubMed

Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

2018-04-13

Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
Reduced-Order Modeling: New Approaches for Computational Physics

NASA Technical Reports Server (NTRS)

Beran, Philip S.; Silva, Walter A.

2001-01-01

In this paper, we review the development of new reduced-order modeling techniques and discuss their applicability to various problems in computational physics. Emphasis is given to methods ba'sed on Volterra series representations and the proper orthogonal decomposition. Results are reported for different nonlinear systems to provide clear examples of the construction and use of reduced-order models, particularly in the multi-disciplinary field of computational aeroelasticity. Unsteady aerodynamic and aeroelastic behaviors of two- dimensional and three-dimensional geometries are described. Large increases in computational efficiency are obtained through the use of reduced-order models, thereby justifying the initial computational expense of constructing these models and inotivatim,- their use for multi-disciplinary design analysis.
Equivalent model construction for a non-linear dynamic system based on an element-wise stiffness evaluation procedure and reduced analysis of the equivalent system

NASA Astrophysics Data System (ADS)

Kim, Euiyoung; Cho, Maenghyo

2017-11-01

In most non-linear analyses, the construction of a system matrix uses a large amount of computation time, comparable to the computation time required by the solving process. If the process for computing non-linear internal force matrices is substituted with an effective equivalent model that enables the bypass of numerical integrations and assembly processes used in matrix construction, efficiency can be greatly enhanced. A stiffness evaluation procedure (STEP) establishes non-linear internal force models using polynomial formulations of displacements. To efficiently identify an equivalent model, the method has evolved such that it is based on a reduced-order system. The reduction process, however, makes the equivalent model difficult to parameterize, which significantly affects the efficiency of the optimization process. In this paper, therefore, a new STEP, E-STEP, is proposed. Based on the element-wise nature of the finite element model, the stiffness evaluation is carried out element-by-element in the full domain. Since the unit of computation for the stiffness evaluation is restricted by element size, and since the computation is independent, the equivalent model can be constructed efficiently in parallel, even in the full domain. Due to the element-wise nature of the construction procedure, the equivalent E-STEP model is easily characterized by design parameters. Various reduced-order modeling techniques can be applied to the equivalent system in a manner similar to how they are applied in the original system. The reduced-order model based on E-STEP is successfully demonstrated for the dynamic analyses of non-linear structural finite element systems under varying design parameters.
Remote sensing image ship target detection method based on visual attention model

NASA Astrophysics Data System (ADS)

Sun, Yuejiao; Lei, Wuhu; Ren, Xiaodong

2017-11-01

The traditional methods of detecting ship targets in remote sensing images mostly use sliding window to search the whole image comprehensively. However, the target usually occupies only a small fraction of the image. This method has high computational complexity for large format visible image data. The bottom-up selective attention mechanism can selectively allocate computing resources according to visual stimuli, thus improving the computational efficiency and reducing the difficulty of analysis. Considering of that, a method of ship target detection in remote sensing images based on visual attention model was proposed in this paper. The experimental results show that the proposed method can reduce the computational complexity while improving the detection accuracy, and improve the detection efficiency of ship targets in remote sensing images.
Two reduced form air quality modeling techniques for rapidly calculating pollutant mitigation potential across many sources, locations and precursor emission types

EPA Science Inventory

Due to the computational cost of running regional-scale numerical air quality models, reduced form models (RFM) have been proposed as computationally efficient simulation tools for characterizing the pollutant response to many different types of emission reductions. The U.S. Envi...
Distributing and storing data efficiently by means of special datasets in the ATLAS collaboration

NASA Astrophysics Data System (ADS)

Köneke, Karsten; ATLAS Collaboration

2011-12-01

With the start of the LHC physics program, the ATLAS experiment started to record vast amounts of data. This data has to be distributed and stored on the world-wide computing grid in a smart way in order to enable an effective and efficient analysis by physicists. This article describes how the ATLAS collaboration chose to create specialized reduced datasets in order to efficiently use computing resources and facilitate physics analyses.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

PubMed

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-12-01

Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

PubMed Central

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-01-01

Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Efficient Strategies for Estimating the Spatial Coherence of Backscatter

PubMed Central

Hyun, Dongwoon; Crowley, Anna Lisa C.; Dahl, Jeremy J.

2017-01-01

The spatial coherence of ultrasound backscatter has been proposed to reduce clutter in medical imaging, to measure the anisotropy of the scattering source, and to improve the detection of blood flow. These techniques rely on correlation estimates that are obtained using computationally expensive strategies. In this study, we assess existing spatial coherence estimation methods and propose three computationally efficient modifications: a reduced kernel, a downsampled receive aperture, and the use of an ensemble correlation coefficient. The proposed methods are implemented in simulation and in vivo studies. Reducing the kernel to a single sample improved computational throughput and improved axial resolution. Downsampling the receive aperture was found to have negligible effect on estimator variance, and improved computational throughput by an order of magnitude for a downsample factor of 4. The ensemble correlation estimator demonstrated lower variance than the currently used average correlation. Combining the three methods, the throughput was improved 105-fold in simulation with a downsample factor of 4 and 20-fold in vivo with a downsample factor of 2. PMID:27913342
REVEAL: An Extensible Reduced Order Model Builder for Simulation and Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, Khushbu; Sharma, Poorva; Ma, Jinliang

2013-04-30

Many science domains need to build computationally efficient and accurate representations of high fidelity, computationally expensive simulations. These computationally efficient versions are known as reduced-order models. This paper presents the design and implementation of a novel reduced-order model (ROM) builder, the REVEAL toolset. This toolset generates ROMs based on science- and engineering-domain specific simulations executed on high performance computing (HPC) platforms. The toolset encompasses a range of sampling and regression methods that can be used to generate a ROM, automatically quantifies the ROM accuracy, and provides support for an iterative approach to improve ROM accuracy. REVEAL is designed to bemore » extensible in order to utilize the core functionality with any simulator that has published input and output formats. It also defines programmatic interfaces to include new sampling and regression techniques so that users can ‘mix and match’ mathematical techniques to best suit the characteristics of their model. In this paper, we describe the architecture of REVEAL and demonstrate its usage with a computational fluid dynamics model used in carbon capture.« less

The Application of Multiobjective Evolutionary Algorithms to an Educational Computational Model of Science Information Processing: A Computational Experiment in Science Education

ERIC Educational Resources Information Center

Lamb, Richard L.; Firestone, Jonah B.

2017-01-01

Conflicting explanations and unrelated information in science classrooms increase cognitive load and decrease efficiency in learning. This reduced efficiency ultimately limits one's ability to solve reasoning problems in the science. In reasoning, it is the ability of students to sift through and identify critical pieces of information that is of…
An 81.6 μW FastICA processor for epileptic seizure detection.

PubMed

Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming

2015-02-01

To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing proper memory element and reduced wordlength, the power and area of storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm(2). The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay of a frame of 256 samples for 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and 3.4 × computation speedup are achieved. The performance of the chip was verified by human dataset.
Energy Efficiency Challenges of 5G Small Cell Networks.

PubMed

Ge, Xiaohu; Yang, Jing; Gharavi, Hamid; Sun, Yang

2017-05-01

The deployment of a large number of small cells poses new challenges to energy efficiency, which has often been ignored in fifth generation (5G) cellular networks. While massive multiple-input multiple outputs (MIMO) will reduce the transmission power at the expense of higher computational cost, the question remains as to which computation or transmission power is more important in the energy efficiency of 5G small cell networks. Thus, the main objective in this paper is to investigate the computation power based on the Landauer principle. Simulation results reveal that more than 50% of the energy is consumed by the computation power at 5G small cell base stations (BSs). Moreover, the computation power of 5G small cell BS can approach 800 watt when the massive MIMO (e.g., 128 antennas) is deployed to transmit high volume traffic. This clearly indicates that computation power optimization can play a major role in the energy efficiency of small cell networks.
Energy Efficiency Challenges of 5G Small Cell Networks

PubMed Central

Ge, Xiaohu; Yang, Jing; Gharavi, Hamid; Sun, Yang

2017-01-01

The deployment of a large number of small cells poses new challenges to energy efficiency, which has often been ignored in fifth generation (5G) cellular networks. While massive multiple-input multiple outputs (MIMO) will reduce the transmission power at the expense of higher computational cost, the question remains as to which computation or transmission power is more important in the energy efficiency of 5G small cell networks. Thus, the main objective in this paper is to investigate the computation power based on the Landauer principle. Simulation results reveal that more than 50% of the energy is consumed by the computation power at 5G small cell base stations (BSs). Moreover, the computation power of 5G small cell BS can approach 800 watt when the massive MIMO (e.g., 128 antennas) is deployed to transmit high volume traffic. This clearly indicates that computation power optimization can play a major role in the energy efficiency of small cell networks. PMID:28757670
Efficient convolutional sparse coding

DOEpatents

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
Further reduction of minimal first-met bad markings for the computationally efficient synthesis of a maximally permissive controller

NASA Astrophysics Data System (ADS)

Liu, GaiYun; Chao, Daniel Yuh

2015-08-01

To date, research on the supervisor design for flexible manufacturing systems focuses on speeding up the computation of optimal (maximally permissive) liveness-enforcing controllers. Recent deadlock prevention policies for systems of simple sequential processes with resources (S3PR) reduce the computation burden by considering only the minimal portion of all first-met bad markings (FBMs). Maximal permissiveness is ensured by not forbidding any live state. This paper proposes a method to further reduce the size of minimal set of FBMs to efficiently solve integer linear programming problems while maintaining maximal permissiveness using a vector-covering approach. This paper improves the previous work and achieves the simplest structure with the minimal number of monitors.
Projected role of advanced computational aerodynamic methods at the Lockheed-Georgia company

NASA Technical Reports Server (NTRS)

Lores, M. E.

1978-01-01

Experience with advanced computational methods being used at the Lockheed-Georgia Company to aid in the evaluation and design of new and modified aircraft indicates that large and specialized computers will be needed to make advanced three-dimensional viscous aerodynamic computations practical. The Numerical Aerodynamic Simulation Facility should be used to provide a tool for designing better aerospace vehicles while at the same time reducing development costs by performing computations using Navier-Stokes equations solution algorithms and permitting less sophisticated but nevertheless complex calculations to be made efficiently. Configuration definition procedures and data output formats can probably best be defined in cooperation with industry, therefore, the computer should handle many remote terminals efficiently. The capability of transferring data to and from other computers needs to be provided. Because of the significant amount of input and output associated with 3-D viscous flow calculations and because of the exceedingly fast computation speed envisioned for the computer, special attention should be paid to providing rapid, diversified, and efficient input and output.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering

NASA Technical Reports Server (NTRS)

Rizvi, Farheen

2013-01-01

The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
Efficient integration method for fictitious domain approaches

NASA Astrophysics Data System (ADS)

Duczek, Sascha; Gabbert, Ulrich

2015-10-01

In the current article, we present an efficient and accurate numerical method for the integration of the system matrices in fictitious domain approaches such as the finite cell method (FCM). In the framework of the FCM, the physical domain is embedded in a geometrically larger domain of simple shape which is discretized using a regular Cartesian grid of cells. Therefore, a spacetree-based adaptive quadrature technique is normally deployed to resolve the geometry of the structure. Depending on the complexity of the structure under investigation this method accounts for most of the computational effort. To reduce the computational costs for computing the system matrices an efficient quadrature scheme based on the divergence theorem (Gauß-Ostrogradsky theorem) is proposed. Using this theorem the dimension of the integral is reduced by one, i.e. instead of solving the integral for the whole domain only its contour needs to be considered. In the current paper, we present the general principles of the integration method and its implementation. The results to several two-dimensional benchmark problems highlight its properties. The efficiency of the proposed method is compared to conventional spacetree-based integration techniques.
Computationally Efficient Multiconfigurational Reactive Molecular Dynamics

PubMed Central

Yamashita, Takefumi; Peng, Yuxing; Knight, Chris; Voth, Gregory A.

2012-01-01

It is a computationally demanding task to explicitly simulate the electronic degrees of freedom in a system to observe the chemical transformations of interest, while at the same time sampling the time and length scales required to converge statistical properties and thus reduce artifacts due to initial conditions, finite-size effects, and limited sampling. One solution that significantly reduces the computational expense consists of molecular models in which effective interactions between particles govern the dynamics of the system. If the interaction potentials in these models are developed to reproduce calculated properties from electronic structure calculations and/or ab initio molecular dynamics simulations, then one can calculate accurate properties at a fraction of the computational cost. Multiconfigurational algorithms model the system as a linear combination of several chemical bonding topologies to simulate chemical reactions, also sometimes referred to as “multistate”. These algorithms typically utilize energy and force calculations already found in popular molecular dynamics software packages, thus facilitating their implementation without significant changes to the structure of the code. However, the evaluation of energies and forces for several bonding topologies per simulation step can lead to poor computational efficiency if redundancy is not efficiently removed, particularly with respect to the calculation of long-ranged Coulombic interactions. This paper presents accurate approximations (effective long-range interaction and resulting hybrid methods) and multiple-program parallelization strategies for the efficient calculation of electrostatic interactions in reactive molecular simulations. PMID:25100924
Expanding Software Productivity and Power while Reducing Costs.

ERIC Educational Resources Information Center

Winer, Ellen N.

1988-01-01

Microcomputer efficiency and software economy can be achieved through file transfer and data sharing. Costs can be reduced by purchasing computer systems that allow for expansion and portability of data. (MLF)
An approximate solution to improve computational efficiency of impedance-type payload load prediction

NASA Technical Reports Server (NTRS)

White, C. W.

1981-01-01

The computational efficiency of the impedance type loads prediction method was studied. Three goals were addressed: devise a method to make the impedance method operate more efficiently in the computer; assess the accuracy and convenience of the method for determining the effect of design changes; and investigate the use of the method to identify design changes for reduction of payload loads. The method is suitable for calculation of dynamic response in either the frequency or time domain. It is concluded that: the choice of an orthogonal coordinate system will allow the impedance method to operate more efficiently in the computer; the approximate mode impedance technique is adequate for determining the effect of design changes, and is applicable for both statically determinate and statically indeterminate payload attachments; and beneficial design changes to reduce payload loads can be identified by the combined application of impedance techniques and energy distribution review techniques.
Multiple multicontrol unitary operations: Implementation and applications

NASA Astrophysics Data System (ADS)

Lin, Qing

2018-04-01

The efficient implementation of computational tasks is critical to quantum computations. In quantum circuits, multicontrol unitary operations are important components. Here, we present an extremely efficient and direct approach to multiple multicontrol unitary operations without decomposition to CNOT and single-photon gates. With the proposed approach, the necessary two-photon operations could be reduced from O( n 3) with the traditional decomposition approach to O( n), which will greatly relax the requirements and make large-scale quantum computation feasible. Moreover, we propose the potential application to the ( n- k)-uniform hypergraph state.
77 FR 12528 - Amendments to Commission's Rules of Practice and Procedure-Subparts E and L

Federal Register 2010, 2011, 2012, 2013, 2014

2012-03-01

..., he advocates the use of a cloud computing system in which documents can be filed giving multiple... concerns. Barillo believes that cloud computing would streamline efficiency and reduce staff labor in...
Energy 101: Energy Efficient Data Centers

ScienceCinema

None

2018-04-16

Data centers provide mission-critical computing functions vital to the daily operation of top U.S. economic, scientific, and technological organizations. These data centers consume large amounts of energy to run and maintain their computer systems, servers, and associated high-performance componentsâup to 3% of all U.S. electricity powers data centers. And as more information comes online, data centers will consume even more energy. Data centers can become more energy efficient by incorporating features like power-saving "stand-by" modes, energy monitoring software, and efficient cooling systems instead of energy-intensive air conditioners. These and other efficiency improvements to data centers can produce significant energy savings, reduce the load on the electric grid, and help protect the nation by increasing the reliability of critical computer operations.
Computationally efficient method for Fourier transform of highly chirped pulses for laser and parametric amplifier modeling.

PubMed

Andrianov, Alexey; Szabo, Aron; Sergeev, Alexander; Kim, Arkady; Chvykov, Vladimir; Kalashnikov, Mikhail

2016-11-14

We developed an improved approach to calculate the Fourier transform of signals with arbitrary large quadratic phase which can be efficiently implemented in numerical simulations utilizing Fast Fourier transform. The proposed algorithm significantly reduces the computational cost of Fourier transform of a highly chirped and stretched pulse by splitting it into two separate transforms of almost transform limited pulses, thereby reducing the required grid size roughly by a factor of the pulse stretching. The application of our improved Fourier transform algorithm in the split-step method for numerical modeling of CPA and OPCPA shows excellent agreement with standard algorithms.
Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves

NASA Astrophysics Data System (ADS)

Qiu, Shi; Liu, Kuang; Eliasson, Veronica

2016-10-01

Geometrical shock dynamics (GSD) theory is an appealing method to predict the shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, to solve and optimize large scale configurations, the main bottleneck is the computational cost. Among the existing numerical GSD schemes, there is only one that has been implemented on parallel computers, with the purpose to analyze detonation waves. To extend the computational advantage of the GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to resolve the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
High-Performance Computing Data Center | Computational Science | NREL

Science.gov Websites

liquid cooling to achieve its very low PUE, then captures and reuses waste heat as the primary heating dry cooler that uses refrigerant in a passive cycle to dissipate heat-is reducing onsite water Measuring efficiency through PUE Warm-water liquid cooling Re-using waste heat from computing components
High accurate interpolation of NURBS tool path for CNC machine tools

NASA Astrophysics Data System (ADS)

Liu, Qiang; Liu, Huan; Yuan, Songmei

2016-09-01

Feedrate fluctuation caused by approximation errors of interpolation methods has great effects on machining quality in NURBS interpolation, but few methods can efficiently eliminate or reduce it to a satisfying level without sacrificing the computing efficiency at present. In order to solve this problem, a high accurate interpolation method for NURBS tool path is proposed. The proposed method can efficiently reduce the feedrate fluctuation by forming a quartic equation with respect to the curve parameter increment, which can be efficiently solved by analytic methods in real-time. Theoretically, the proposed method can totally eliminate the feedrate fluctuation for any 2nd degree NURBS curves and can interpolate 3rd degree NURBS curves with minimal feedrate fluctuation. Moreover, a smooth feedrate planning algorithm is also proposed to generate smooth tool motion with considering multiple constraints and scheduling errors by an efficient planning strategy. Experiments are conducted to verify the feasibility and applicability of the proposed method. This research presents a novel NURBS interpolation method with not only high accuracy but also satisfying computing efficiency.
An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

NASA Astrophysics Data System (ADS)

Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

2018-02-01

De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

NASA Lewis Stirling SPRE testing and analysis with reduced number of cooler tubes

NASA Technical Reports Server (NTRS)

Wong, Wayne A.; Cairelli, James E.; Swec, Diane M.; Doeberling, Thomas J.; Lakatos, Thomas F.; Madi, Frank J.

1992-01-01

Free-piston Stirling power converters are candidates for high capacity space power applications. The Space Power Research Engine (SPRE), a free-piston Stirling engine coupled with a linear alternator, is being tested at the NASA Lewis Research Center in support of the Civil Space Technology Initiative. The SPRE is used as a test bed for evaluating converter modifications which have the potential to improve the converter performance and for validating computer code predictions. Reducing the number of cooler tubes on the SPRE has been identified as a modification with the potential to significantly improve power and efficiency. Experimental tests designed to investigate the effects of reducing the number of cooler tubes on converter power, efficiency and dynamics are described. Presented are test results from the converter operating with a reduced number of cooler tubes and comparisons between this data and both baseline test data and computer code predictions.
Development of efficient time-evolution method based on three-term recurrence relation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Akama, Tomoko, E-mail: a.tomo---s-b-l-r@suou.waseda.jp; Kobayashi, Osamu; Nanbu, Shinkoh, E-mail: shinkoh.nanbu@sophia.ac.jp

The advantage of the real-time (RT) propagation method is a direct solution of the time-dependent Schrödinger equation which describes frequency properties as well as all dynamics of a molecular system composed of electrons and nuclei in quantum physics and chemistry. Its applications have been limited by computational feasibility, as the evaluation of the time-evolution operator is computationally demanding. In this article, a new efficient time-evolution method based on the three-term recurrence relation (3TRR) was proposed to reduce the time-consuming numerical procedure. The basic formula of this approach was derived by introducing a transformation of the operator using the arcsine function.more » Since this operator transformation causes transformation of time, we derived the relation between original and transformed time. The formula was adapted to assess the performance of the RT time-dependent Hartree-Fock (RT-TDHF) method and the time-dependent density functional theory. Compared to the commonly used fourth-order Runge-Kutta method, our new approach decreased computational time of the RT-TDHF calculation by about factor of four, showing the 3TRR formula to be an efficient time-evolution method for reducing computational cost.« less
Efficient volume computation for three-dimensional hexahedral cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dukowicz, J.K.

1988-02-01

Currently, algorithms for computing the volume of hexahedral cells with ''ruled'' surfaces require a minimum of 122 FLOPs (floating point operations) per cell. A new algorithm is described which reduces the operation count to 57 FLOPs per cell. copyright 1988 Academic Press, Inc.
Efficient algorithms for single-axis attitude estimation

NASA Technical Reports Server (NTRS)

Shuster, M. D.

1981-01-01

The computationally efficient algorithms determine attitude from the measurement of art lengths and dihedral angles. The dependence of these algorithms on the solution of trigonometric equations was reduced. Both single time and batch estimators are presented along with the covariance analysis of each algorithm.
Information processing using a single dynamical node as complex system

PubMed Central

Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.

2011-01-01

Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
On real-space Density Functional Theory for non-orthogonal crystal systems: Kronecker product formulation of the kinetic energy operator

NASA Astrophysics Data System (ADS)

Sharma, Abhiraj; Suryanarayana, Phanish

2018-05-01

We present an accurate and efficient real-space Density Functional Theory (DFT) framework for the ab initio study of non-orthogonal crystal systems. Specifically, employing a local reformulation of the electrostatics, we develop a novel Kronecker product formulation of the real-space kinetic energy operator that significantly reduces the number of operations associated with the Laplacian-vector multiplication, the dominant cost in practical computations. In particular, we reduce the scaling with respect to finite-difference order from quadratic to linear, thereby significantly bridging the gap in computational cost between non-orthogonal and orthogonal systems. We verify the accuracy and efficiency of the proposed methodology through selected examples.
Estimating Skin Cancer Risk: Evaluating Mobile Computer-Adaptive Testing.

PubMed

Djaja, Ngadiman; Janda, Monika; Olsen, Catherine M; Whiteman, David C; Chien, Tsair-Wei

2016-01-22

Response burden is a major detriment to questionnaire completion rates. Computer adaptive testing may offer advantages over non-adaptive testing, including reduction of numbers of items required for precise measurement. Our aim was to compare the efficiency of non-adaptive (NAT) and computer adaptive testing (CAT) facilitated by Partial Credit Model (PCM)-derived calibration to estimate skin cancer risk. We used a random sample from a population-based Australian cohort study of skin cancer risk (N=43,794). All 30 items of the skin cancer risk scale were calibrated with the Rasch PCM. A total of 1000 cases generated following a normal distribution (mean [SD] 0 [1]) were simulated using three Rasch models with three fixed-item (dichotomous, rating scale, and partial credit) scenarios, respectively. We calculated the comparative efficiency and precision of CAT and NAT (shortening of questionnaire length and the count difference number ratio less than 5% using independent t tests). We found that use of CAT led to smaller person standard error of the estimated measure than NAT, with substantially higher efficiency but no loss of precision, reducing response burden by 48%, 66%, and 66% for dichotomous, Rating Scale Model, and PCM models, respectively. CAT-based administrations of the skin cancer risk scale could substantially reduce participant burden without compromising measurement precision. A mobile computer adaptive test was developed to help people efficiently assess their skin cancer risk.
Inverse kinematics of a dual linear actuator pitch/roll heliostat

NASA Astrophysics Data System (ADS)

Freeman, Joshua; Shankar, Balakrishnan; Sundaram, Ganesh

2017-06-01

This work presents a simple, computationally efficient inverse kinematics solution for a pitch/roll heliostat using two linear actuators. The heliostat design and kinematics have been developed, modeled and tested using computer simulation software. A physical heliostat prototype was fabricated to validate the theoretical computations and data. Pitch/roll heliostats have numerous advantages including reduced cost potential and reduced space requirements, with a primary disadvantage being the significantly more complicated kinematics, which are solved here. Novel methods are applied to simplify the inverse kinematics problem which could be applied to other similar problems.
Enhancing battery efficiency for pervasive health-monitoring systems based on electronic textiles.

PubMed

Zheng, Nenggan; Wu, Zhaohui; Lin, Man; Yang, Laurence Tianruo

2010-03-01

Electronic textiles are regarded as one of the most important computation platforms for future computer-assisted health-monitoring applications. In these novel systems, multiple batteries are used in order to prolong their operational lifetime, which is a significant metric for system usability. However, due to the nonlinear features of batteries, computing systems with multiple batteries cannot achieve the same battery efficiency as those powered by a monolithic battery of equal capacity. In this paper, we propose an algorithm aiming to maximize battery efficiency globally for the computer-assisted health-care systems with multiple batteries. Based on an accurate analytical battery model, the concept of weighted battery fatigue degree is introduced and the novel battery-scheduling algorithm called predicted weighted fatigue degree least first (PWFDLF) is developed. Besides, we also discuss our attempts during search PWFDLF: a weighted round-robin (WRR) and a greedy algorithm achieving highest local battery efficiency, which reduces to the sequential discharging policy. Evaluation results show that a considerable improvement in battery efficiency can be obtained by PWFDLF under various battery configurations and current profiles compared to conventional sequential and WRR discharging policies.
An efficient and accurate 3D displacements tracking strategy for digital volume correlation

NASA Astrophysics Data System (ADS)

Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles

2014-07-01

Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
Control Synthesis of Discrete-Time T-S Fuzzy Systems: Reducing the Conservatism Whilst Alleviating the Computational Burden.

PubMed

Xie, Xiangpeng; Yue, Dong; Zhang, Huaguang; Peng, Chen

2017-09-01

The augmented multi-indexed matrix approach acts as a powerful tool in reducing the conservatism of control synthesis of discrete-time Takagi-Sugeno fuzzy systems. However, its computational burden is sometimes too heavy as a tradeoff. Nowadays, reducing the conservatism whilst alleviating the computational burden becomes an ideal but very challenging problem. This paper is toward finding an efficient way to achieve one of satisfactory answers. Different from the augmented multi-indexed matrix approach in the literature, we aim to design a more efficient slack variable approach under a general framework of homogenous matrix polynomials. Thanks to the introduction of a new extended representation for homogeneous matrix polynomials, related matrices with the same coefficient are collected together into one sole set and thus those redundant terms of the augmented multi-indexed matrix approach can be removed, i.e., the computational burden can be alleviated in this paper. More importantly, due to the fact that more useful information is involved into control design, the conservatism of the proposed approach as well is less than the counterpart of the augmented multi-indexed matrix approach. Finally, numerical experiments are given to show the effectiveness of the proposed approach.
Alternative Modal Basis Selection Procedures For Reduced-Order Nonlinear Random Response Simulation

NASA Technical Reports Server (NTRS)

Przekop, Adam; Guo, Xinyun; Rizi, Stephen A.

2012-01-01

Three procedures to guide selection of an efficient modal basis in a nonlinear random response analysis are examined. One method is based only on proper orthogonal decomposition, while the other two additionally involve smooth orthogonal decomposition. Acoustic random response problems are employed to assess the performance of the three modal basis selection approaches. A thermally post-buckled beam exhibiting snap-through behavior, a shallowly curved arch in the auto-parametric response regime and a plate structure are used as numerical test articles. The results of a computationally taxing full-order analysis in physical degrees of freedom are taken as the benchmark for comparison with the results from the three reduced-order analyses. For the cases considered, all three methods are shown to produce modal bases resulting in accurate and computationally efficient reduced-order nonlinear simulations.
Efficient biprediction decision scheme for fast high efficiency video coding encoding

NASA Astrophysics Data System (ADS)

Park, Sang-hyo; Lee, Seung-ho; Jang, Euee S.; Jun, Dongsan; Kang, Jung-Won

2016-11-01

An efficient biprediction decision scheme of high efficiency video coding (HEVC) is proposed for fast-encoding applications. For low-delay video applications, bidirectional prediction can be used to increase compression performance efficiently with previous reference frames. However, at the same time, the computational complexity of the HEVC encoder is significantly increased due to the additional biprediction search. Although a some research has attempted to reduce this complexity, whether the prediction is strongly related to both motion complexity and prediction modes in a coding unit has not yet been investigated. A method that avoids most compression-inefficient search points is proposed so that the computational complexity of the motion estimation process can be dramatically decreased. To determine if biprediction is critical, the proposed method exploits the stochastic correlation of the context of prediction units (PUs): the direction of a PU and the accuracy of a motion vector. Through experimental results, the proposed method showed that the time complexity of biprediction can be reduced to 30% on average, outperforming existing methods in view of encoding time, number of function calls, and memory access.
Efficient FIR Filter Implementations for Multichannel BCIs Using Xilinx System Generator.

PubMed

Ghani, Usman; Wasim, Muhammad; Khan, Umar Shahbaz; Mubasher Saleem, Muhammad; Hassan, Ali; Rashid, Nasir; Islam Tiwana, Mohsin; Hamza, Amir; Kashif, Amir

2018-01-01

Background . Brain computer interface (BCI) is a combination of software and hardware communication protocols that allow brain to control external devices. Main purpose of BCI controlled external devices is to provide communication medium for disabled persons. Now these devices are considered as a new way to rehabilitate patients with impunities. There are certain potentials present in electroencephalogram (EEG) that correspond to specific event. Main issue is to detect such event related potentials online in such a low signal to noise ratio (SNR). In this paper we propose a method that will facilitate the concept of online processing by providing an efficient filtering implementation in a hardware friendly environment by switching to finite impulse response (FIR). Main focus of this research is to minimize latency and computational delay of preprocessing related to any BCI application. Four different finite impulse response (FIR) implementations along with large Laplacian filter are implemented in Xilinx System Generator. Efficiency of 25% is achieved in terms of reduced number of coefficients and multiplications which in turn reduce computational delays accordingly.
Reduced basis ANOVA methods for partial differential equations with high-dimensional random inputs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liao, Qifeng, E-mail: liaoqf@shanghaitech.edu.cn; Lin, Guang, E-mail: guanglin@purdue.edu

2016-07-15

In this paper we present a reduced basis ANOVA approach for partial deferential equations (PDEs) with random inputs. The ANOVA method combined with stochastic collocation methods provides model reduction in high-dimensional parameter space through decomposing high-dimensional inputs into unions of low-dimensional inputs. In this work, to further reduce the computational cost, we investigate spatial low-rank structures in the ANOVA-collocation method, and develop efficient spatial model reduction techniques using hierarchically generated reduced bases. We present a general mathematical framework of the methodology, validate its accuracy and demonstrate its efficiency with numerical experiments.
Multi-element least square HDMR methods and their applications for stochastic multiscale model reduction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Lijian, E-mail: ljjiang@hnu.edu.cn; Li, Xinping, E-mail: exping@126.com

Stochastic multiscale modeling has become a necessary approach to quantify uncertainty and characterize multiscale phenomena for many practical problems such as flows in stochastic porous media. The numerical treatment of the stochastic multiscale models can be very challengeable as the existence of complex uncertainty and multiple physical scales in the models. To efficiently take care of the difficulty, we construct a computational reduced model. To this end, we propose a multi-element least square high-dimensional model representation (HDMR) method, through which the random domain is adaptively decomposed into a few subdomains, and a local least square HDMR is constructed in eachmore » subdomain. These local HDMRs are represented by a finite number of orthogonal basis functions defined in low-dimensional random spaces. The coefficients in the local HDMRs are determined using least square methods. We paste all the local HDMR approximations together to form a global HDMR approximation. To further reduce computational cost, we present a multi-element reduced least-square HDMR, which improves both efficiency and approximation accuracy in certain conditions. To effectively treat heterogeneity properties and multiscale features in the models, we integrate multiscale finite element methods with multi-element least-square HDMR for stochastic multiscale model reduction. This approach significantly reduces the original model's complexity in both the resolution of the physical space and the high-dimensional stochastic space. We analyze the proposed approach, and provide a set of numerical experiments to demonstrate the performance of the presented model reduction techniques. - Highlights: • Multi-element least square HDMR is proposed to treat stochastic models. • Random domain is adaptively decomposed into some subdomains to obtain adaptive multi-element HDMR. • Least-square reduced HDMR is proposed to enhance computation efficiency and approximation accuracy in certain conditions. • Integrating MsFEM and multi-element least square HDMR can significantly reduce computation complexity.« less
Computer-assisted coding and clinical documentation: first things first.

PubMed

Tully, Melinda; Carmichael, Angela

2012-10-01

Computer-assisted coding tools have the potential to drive improvements in seven areas: Transparency of coding. Productivity (generally by 20 to 25 percent for inpatient claims). Accuracy (by improving specificity of documentation). Cost containment (by reducing overtime expenses, audit fees, and denials). Compliance. Efficiency. Consistency.
Probabilistic simulation of multi-scale composite behavior

NASA Technical Reports Server (NTRS)

Liaw, D. G.; Shiao, M. C.; Singhal, S. N.; Chamis, Christos C.

1993-01-01

A methodology is developed to computationally assess the probabilistic composite material properties at all composite scale levels due to the uncertainties in the constituent (fiber and matrix) properties and in the fabrication process variables. The methodology is computationally efficient for simulating the probability distributions of material properties. The sensitivity of the probabilistic composite material property to each random variable is determined. This information can be used to reduce undesirable uncertainties in material properties at the macro scale of the composite by reducing the uncertainties in the most influential random variables at the micro scale. This methodology was implemented into the computer code PICAN (Probabilistic Integrated Composite ANalyzer). The accuracy and efficiency of this methodology are demonstrated by simulating the uncertainties in the material properties of a typical laminate and comparing the results with the Monte Carlo simulation method. The experimental data of composite material properties at all scales fall within the scatters predicted by PICAN.
A variational eigenvalue solver on a photonic quantum processor

PubMed Central

Peruzzo, Alberto; McClean, Jarrod; Shadbolt, Peter; Yung, Man-Hong; Zhou, Xiao-Qi; Love, Peter J.; Aspuru-Guzik, Alán; O’Brien, Jeremy L.

2014-01-01

Quantum computers promise to efficiently solve important problems that are intractable on a conventional computer. For quantum systems, where the physical dimension grows exponentially, finding the eigenvalues of certain operators is one such intractable problem and remains a fundamental challenge. The quantum phase estimation algorithm efficiently finds the eigenvalue of a given eigenvector but requires fully coherent evolution. Here we present an alternative approach that greatly reduces the requirements for coherent evolution and combine this method with a new approach to state preparation based on ansätze and classical optimization. We implement the algorithm by combining a highly reconfigurable photonic quantum processor with a conventional computer. We experimentally demonstrate the feasibility of this approach with an example from quantum chemistry—calculating the ground-state molecular energy for He–H+. The proposed approach drastically reduces the coherence time requirements, enhancing the potential of quantum resources available today and in the near future. PMID:25055053
Image detection and compression for memory efficient system analysis

NASA Astrophysics Data System (ADS)

Bayraktar, Mustafa

2015-02-01

The advances in digital signal processing have been progressing towards efficient use of memory and processing. Both of these factors can be utilized efficiently by using feasible techniques of image storage by computing the minimum information of image which will enhance computation in later processes. Scale Invariant Feature Transform (SIFT) can be utilized to estimate and retrieve of an image. In computer vision, SIFT can be implemented to recognize the image by comparing its key features from SIFT saved key point descriptors. The main advantage of SIFT is that it doesn't only remove the redundant information from an image but also reduces the key points by matching their orientation and adding them together in different windows of image [1]. Another key property of this approach is that it works on highly contrasted images more efficiently because it`s design is based on collecting key points from the contrast shades of image.

HPC on Competitive Cloud Resources

NASA Astrophysics Data System (ADS)

Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff

Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on-demand in real-time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in realtime (wall-time). We also show that the smallest, cheapest (64-bit) instance on the studied environment is the best for price to performance ration. In light of the high-contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average, expected performance and execution time, expected cost to completion, and variance measures--traditionally ignored in the high-performance computing context--now should complement or even substitute the standard definitions of efficiency.
The reduced basis method for the electric field integral equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fares, M., E-mail: fares@cerfacs.f; Hesthaven, J.S., E-mail: Jan_Hesthaven@Brown.ed; Maday, Y., E-mail: maday@ann.jussieu.f

We introduce the reduced basis method (RBM) as an efficient tool for parametrized scattering problems in computational electromagnetics for problems where field solutions are computed using a standard Boundary Element Method (BEM) for the parametrized electric field integral equation (EFIE). This combination enables an algorithmic cooperation which results in a two step procedure. The first step consists of a computationally intense assembling of the reduced basis, that needs to be effected only once. In the second step, we compute output functionals of the solution, such as the Radar Cross Section (RCS), independently of the dimension of the discretization space, formore » many different parameter values in a many-query context at very little cost. Parameters include the wavenumber, the angle of the incident plane wave and its polarization.« less
Unified treatment of microscopic boundary conditions and efficient algorithms for estimating tangent operators of the homogenized behavior in the computational homogenization method

NASA Astrophysics Data System (ADS)

Nguyen, Van-Dung; Wu, Ling; Noels, Ludovic

2017-03-01

This work provides a unified treatment of arbitrary kinds of microscopic boundary conditions usually considered in the multi-scale computational homogenization method for nonlinear multi-physics problems. An efficient procedure is developed to enforce the multi-point linear constraints arising from the microscopic boundary condition either by the direct constraint elimination or by the Lagrange multiplier elimination methods. The macroscopic tangent operators are computed in an efficient way from a multiple right hand sides linear system whose left hand side matrix is the stiffness matrix of the microscopic linearized system at the converged solution. The number of vectors at the right hand side is equal to the number of the macroscopic kinematic variables used to formulate the microscopic boundary condition. As the resolution of the microscopic linearized system often follows a direct factorization procedure, the computation of the macroscopic tangent operators is then performed using this factorized matrix at a reduced computational time.
High-efficiency photorealistic computer-generated holograms based on the backward ray-tracing technique

NASA Astrophysics Data System (ADS)

Wang, Yuan; Chen, Zhidong; Sang, Xinzhu; Li, Hui; Zhao, Linmin

2018-03-01

Holographic displays can provide the complete optical wave field of a three-dimensional (3D) scene, including the depth perception. However, it often takes a long computation time to produce traditional computer-generated holograms (CGHs) without more complex and photorealistic rendering. The backward ray-tracing technique is able to render photorealistic high-quality images, which noticeably reduce the computation time achieved from the high-degree parallelism. Here, a high-efficiency photorealistic computer-generated hologram method is presented based on the ray-tracing technique. Rays are parallelly launched and traced under different illuminations and circumstances. Experimental results demonstrate the effectiveness of the proposed method. Compared with the traditional point cloud CGH, the computation time is decreased to 24 s to reconstruct a 3D object of 100 ×100 rays with continuous depth change.
A Reduced-Order Model for Efficient Simulation of Synthetic Jet Actuators

NASA Technical Reports Server (NTRS)

Yamaleev, Nail K.; Carpenter, Mark H.

2003-01-01

A new reduced-order model of multidimensional synthetic jet actuators that combines the accuracy and conservation properties of full numerical simulation methods with the efficiency of simplified zero-order models is proposed. The multidimensional actuator is simulated by solving the time-dependent compressible quasi-1-D Euler equations, while the diaphragm is modeled as a moving boundary. The governing equations are approximated with a fourth-order finite difference scheme on a moving mesh such that one of the mesh boundaries coincides with the diaphragm. The reduced-order model of the actuator has several advantages. In contrast to the 3-D models, this approach provides conservation of mass, momentum, and energy. Furthermore, the new method is computationally much more efficient than the multidimensional Navier-Stokes simulation of the actuator cavity flow, while providing practically the same accuracy in the exterior flowfield. The most distinctive feature of the present model is its ability to predict the resonance characteristics of synthetic jet actuators; this is not practical when using the 3-D models because of the computational cost involved. Numerical results demonstrating the accuracy of the new reduced-order model and its limitations are presented.
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

PubMed Central

Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation. PMID:26681933
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

PubMed

Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
DoDs Efforts to Consolidate Data Centers Need Improvement

DTIC Science & Technology

2016-03-29

Consolidation Initiative, February 26, 2010. 3 Green IT minimizes negative environmental impact of IT operations by ensuring that computers and computer-related...objectives for consolidating data centers. DoD’s objectives were to: • reduce cost; • reduce environmental impact ; • improve efficiency and service levels...number of DoD data centers. Finding A DODIG-2016-068 │ 7 information in DCIM, the DoD CIO did not confirm whether those changes would impact DoD’s
Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.

PubMed

Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi

2018-01-01

In the past decade, the drug design technologies have been improved enormously. The computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, which makes the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds data and GDB-13 with more than 930 million small molecules, is a noticeable issue of time-consuming problem. Therefore, we propose a novel heterogeneous high performance computing method, named as Hadoop-MCC, integrating Hadoop and GPU, to copy with big chemical structure data efficiently. Hadoop-MCC gains the high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from GPU devices. Hadoop framework adopts mapper/reducer computation model. In the proposed method, mappers response for fetching SMILES data segments and perform LINGO method on GPU, then reducers collect all comparison results produced by mappers. Due to the high availability of Hadoop, all of LINGO computational jobs on mappers can be completed, even if some of the mappers encounter problems. A comparison of LINGO is performed on each the GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than the CUDA-MCC on a single GPU device. Hadoop-MCC is able to achieve scalability, high availability, and fault tolerance granted by Hadoop, and high performance as well by integrating computational power of both of Hadoop and GPU. It has been shown that using the heterogeneous architecture as Hadoop-MCC effectively can enhance better computational performance than on a single GPU device. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Computational efficiency improvements for image colorization

NASA Astrophysics Data System (ADS)

Yu, Chao; Sharma, Gaurav; Aly, Hussein

2013-03-01

We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarseto- fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
The Value Proposition in Institutional Repositories

ERIC Educational Resources Information Center

Blythe, Erv; Chachra, Vinod

2005-01-01

In the education and research arena of the late 1970s and early 1980s, a struggle developed between those who advocated centralized, mainframe-based computing and those who advocated distributed computing. Ultimately, the debate reduced to whether economies of scale or economies of scope are more important to the effectiveness and efficiency of…
The application of dynamic programming in production planning

NASA Astrophysics Data System (ADS)

Wu, Run

2017-05-01

Nowadays, with the popularity of the computers, various industries and fields are widely applying computer information technology, which brings about huge demand for a variety of application software. In order to develop software meeting various needs with most economical cost and best quality, programmers must design efficient algorithms. A superior algorithm can not only soul up one thing, but also maximize the benefits and generate the smallest overhead. As one of the common algorithms, dynamic programming algorithms are used to solving problems with some sort of optimal properties. When solving problems with a large amount of sub-problems that needs repetitive calculations, the ordinary sub-recursive method requires to consume exponential time, and dynamic programming algorithm can reduce the time complexity of the algorithm to the polynomial level, according to which we can conclude that dynamic programming algorithm is a very efficient compared to other algorithms reducing the computational complexity and enriching the computational results. In this paper, we expound the concept, basic elements, properties, core, solving steps and difficulties of the dynamic programming algorithm besides, establish the dynamic programming model of the production planning problem.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

NASA Astrophysics Data System (ADS)

Chen, Xi; Zhou, Liqing

2015-12-01

With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Survey of MapReduce frame operation in bioinformatics.

PubMed

Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

2014-07-01

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Money for Research, Not for Energy Bills: Finding Energy and Cost Savings in High Performance Computer Facility Designs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drewmark Communications; Sartor, Dale; Wilson, Mark

2010-07-01

High-performance computing facilities in the United States consume an enormous amount of electricity, cutting into research budgets and challenging public- and private-sector efforts to reduce energy consumption and meet environmental goals. However, these facilities can greatly reduce their energy demand through energy-efficient design of the facility itself. Using a case study of a facility under design, this article discusses strategies and technologies that can be used to help achieve energy reductions.
Determination of aerodynamic sensitivity coefficients based on the three-dimensional full potential equation

NASA Technical Reports Server (NTRS)

Elbanna, Hesham M.; Carlson, Leland A.

1992-01-01

The quasi-analytical approach is applied to the three-dimensional full potential equation to compute wing aerodynamic sensitivity coefficients in the transonic regime. Symbolic manipulation is used to reduce the effort associated with obtaining the sensitivity equations, and the large sensitivity system is solved using 'state of the art' routines. Results are compared to those obtained by the direct finite difference approach and both methods are evaluated to determine their computational accuracy and efficiency. The quasi-analytical approach is shown to be accurate and efficient for large aerodynamic systems.
Optimal thickness of silicon membranes to achieve maximum thermoelectric efficiency: A first principles study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mangold, Claudia; Neogi, Sanghamitra; Max Planck Institut für Polymerforschung, Ackermannweg 10, D-55128 Mainz

2016-08-01

Silicon nanostructures with reduced dimensionality, such as nanowires, membranes, and thin films, are promising thermoelectric materials, as they exhibit considerably reduced thermal conductivity. Here, we utilize density functional theory and Boltzmann transport equation to compute the electronic properties of ultra-thin crystalline silicon membranes with thickness between 1 and 12 nm. We predict that an optimal thickness of ∼7 nm maximizes the thermoelectric figure of merit of membranes with native oxide surface layers. Further thinning of the membranes, although attainable in experiments, reduces the electrical conductivity and worsens the thermoelectric efficiency.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images.

PubMed

Du, Xiaogang; Dang, Jianwu; Wang, Yangping; Wang, Song; Lei, Tao

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU).
Large-scale inverse model analyses employing fast randomized data reduction

NASA Astrophysics Data System (ADS)

Lin, Youzuo; Le, Ellen B.; O'Malley, Daniel; Vesselinov, Velimir V.; Bui-Thanh, Tan

2017-08-01

When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 107 or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.

Alternative Modal Basis Selection Procedures for Nonlinear Random Response Simulation

NASA Technical Reports Server (NTRS)

Przekop, Adam; Guo, Xinyun; Rizzi, Stephen A.

2010-01-01

Three procedures to guide selection of an efficient modal basis in a nonlinear random response analysis are examined. One method is based only on proper orthogonal decomposition, while the other two additionally involve smooth orthogonal decomposition. Acoustic random response problems are employed to assess the performance of the three modal basis selection approaches. A thermally post-buckled beam exhibiting snap-through behavior, a shallowly curved arch in the auto-parametric response regime and a plate structure are used as numerical test articles. The results of the three reduced-order analyses are compared with the results of the computationally taxing simulation in the physical degrees of freedom. For the cases considered, all three methods are shown to produce modal bases resulting in accurate and computationally efficient reduced-order nonlinear simulations.
Sensitivity Analysis for Coupled Aero-structural Systems

NASA Technical Reports Server (NTRS)

Giunta, Anthony A.

1999-01-01

A novel method has been developed for calculating gradients of aerodynamic force and moment coefficients for an aeroelastic aircraft model. This method uses the Global Sensitivity Equations (GSE) to account for the aero-structural coupling, and a reduced-order modal analysis approach to condense the coupling bandwidth between the aerodynamic and structural models. Parallel computing is applied to reduce the computational expense of the numerous high fidelity aerodynamic analyses needed for the coupled aero-structural system. Good agreement is obtained between aerodynamic force and moment gradients computed with the GSE/modal analysis approach and the same quantities computed using brute-force, computationally expensive, finite difference approximations. A comparison between the computational expense of the GSE/modal analysis method and a pure finite difference approach is presented. These results show that the GSE/modal analysis approach is the more computationally efficient technique if sensitivity analysis is to be performed for two or more aircraft design parameters.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

NASA Astrophysics Data System (ADS)

Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

2018-03-01

Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
The NASA Energy Conservation Program

NASA Technical Reports Server (NTRS)

Gaffney, G. P.

1977-01-01

Large energy-intensive research and test equipment at NASA installations is identified, and methods for reducing energy consumption outlined. However, some of the research facilities are involved in developing more efficient, fuel-conserving aircraft, and tradeoffs between immediate and long-term conservation may be necessary. Major programs for conservation include: computer-based systems to automatically monitor and control utility consumption; a steam-producing solid waste incinerator; and a computer-based cost analysis technique to engineer more efficient heating and cooling of buildings. Alternate energy sources in operation or under evaluation include: solar collectors; electric vehicles; and ultrasonically emulsified fuel to attain higher combustion efficiency. Management support, cooperative participation by employees, and effective reporting systems for conservation programs, are also discussed.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop

NASA Astrophysics Data System (ADS)

Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.

2018-04-01

The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application and sovled the need for concurrent multi-user high-speed access to remotely sensed data. It verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multi-users and analyzing the distributed storage architecture to improve the application efficiency of remote sensing images through building an actual Hadoop service system.
A low-power and high-quality implementation of the discrete cosine transformation

NASA Astrophysics Data System (ADS)

Heyne, B.; Götze, J.

2007-06-01

In this paper a computationally efficient and high-quality preserving DCT architecture is presented. It is obtained by optimizing the Loeffler DCT based on the Cordic algorithm. The computational complexity is reduced from 11 multiply and 29 add operations (Loeffler DCT) to 38 add and 16 shift operations (which is similar to the complexity of the binDCT). The experimental results show that the proposed DCT algorithm not only reduces the computational complexity significantly, but also retains the good transformation quality of the Loeffler DCT. Therefore, the proposed Cordic based Loeffler DCT is especially suited for low-power and high-quality CODECs in battery-based systems.
A heuristic re-mapping algorithm reducing inter-level communication in SAMR applications.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steensland, Johan; Ray, Jaideep

2003-07-01

This paper aims at decreasing execution time for large-scale structured adaptive mesh refinement (SAMR) applications by proposing a new heuristic re-mapping algorithm and experimentally showing its effectiveness in reducing inter-level communication. Tests were done for five different SAMR applications. The overall goal is to engineer a dynamically adaptive meta-partitioner capable of selecting and configuring the most appropriate partitioning strategy at run-time based on current system and application state. Such a metapartitioner can significantly reduce execution times for general SAMR applications. Computer simulations of physical phenomena are becoming increasingly popular as they constitute an important complement to real-life testing. In manymore » cases, such simulations are based on solving partial differential equations by numerical methods. Adaptive methods are crucial to efficiently utilize computer resources such as memory and CPU. But even with adaption, the simulations are computationally demanding and yield huge data sets. Thus parallelization and the efficient partitioning of data become issues of utmost importance. Adaption causes the workload to change dynamically, calling for dynamic (re-) partitioning to maintain efficient resource utilization. The proposed heuristic algorithm reduced inter-level communication substantially. Since the complexity of the proposed algorithm is low, this decrease comes at a relatively low cost. As a consequence, we draw the conclusion that the proposed re-mapping algorithm would be useful to lower overall execution times for many large SAMR applications. Due to its usefulness and its parameterization, the proposed algorithm would constitute a natural and important component of the meta-partitioner.« less
Efficient critical design load case identification for floating offshore wind turbines with a reduced nonlinear model

NASA Astrophysics Data System (ADS)

Matha, Denis; Sandner, Frank; Schlipf, David

2014-12-01

Design verification of wind turbines is performed by simulation of design load cases (DLC) defined in the IEC 61400-1 and -3 standards or equivalent guidelines. Due to the resulting large number of necessary load simulations, here a method is presented to reduce the computational effort for DLC simulations significantly by introducing a reduced nonlinear model and simplified hydro- and aerodynamics. The advantage of the formulation is that the nonlinear ODE system only contains basic mathematic operations and no iterations or internal loops which makes it very computationally efficient. Global turbine extreme and fatigue loads such as rotor thrust, tower base bending moment and mooring line tension, as well as platform motions are outputs of the model. They can be used to identify critical and less critical load situations to be then analysed with a higher fidelity tool and so speed up the design process. Results from these reduced model DLC simulations are presented and compared to higher fidelity models. Results in frequency and time domain as well as extreme and fatigue load predictions demonstrate that good agreement between the reduced and advanced model is achieved, allowing to efficiently exclude less critical DLC simulations, and to identify the most critical subset of cases for a given design. Additionally, the model is applicable for brute force optimization of floater control system parameters.
Molecular dynamics simulations in hybrid particle-continuum schemes: Pitfalls and caveats

NASA Astrophysics Data System (ADS)

Stalter, S.; Yelash, L.; Emamy, N.; Statt, A.; Hanke, M.; Lukáčová-Medvid'ová, M.; Virnau, P.

2018-03-01

Heterogeneous multiscale methods (HMM) combine molecular accuracy of particle-based simulations with the computational efficiency of continuum descriptions to model flow in soft matter liquids. In these schemes, molecular simulations typically pose a computational bottleneck, which we investigate in detail in this study. We find that it is preferable to simulate many small systems as opposed to a few large systems, and that a choice of a simple isokinetic thermostat is typically sufficient while thermostats such as Lowe-Andersen allow for simulations at elevated viscosity. We discuss suitable choices for time steps and finite-size effects which arise in the limit of very small simulation boxes. We also argue that if colloidal systems are considered as opposed to atomistic systems, the gap between microscopic and macroscopic simulations regarding time and length scales is significantly smaller. We propose a novel reduced-order technique for the coupling to the macroscopic solver, which allows us to approximate a non-linear stress-strain relation efficiently and thus further reduce computational effort of microscopic simulations.
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
On computing the global time-optimal motions of robotic manipulators in the presence of obstacles

NASA Technical Reports Server (NTRS)

Shiller, Zvi; Dubowsky, Steven

1991-01-01

A method for computing the time-optimal motions of robotic manipulators is presented that considers the nonlinear manipulator dynamics, actuator constraints, joint limits, and obstacles. The optimization problem is reduced to a search for the time-optimal path in the n-dimensional position space. A small set of near-optimal paths is first efficiently selected from a grid, using a branch and bound search and a series of lower bound estimates on the traveling time along a given path. These paths are further optimized with a local path optimization to yield the global optimal solution. Obstacles are considered by eliminating the collision points from the tessellated space and by adding a penalty function to the motion time in the local optimization. The computational efficiency of the method stems from the reduced dimensionality of the searched spaced and from combining the grid search with a local optimization. The method is demonstrated in several examples for two- and six-degree-of-freedom manipulators with obstacles.
a Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images

NASA Astrophysics Data System (ADS)

Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.

2015-07-01

Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and others. However, it is challenging to efficiently storage, query and process such big data due to the data- and computing- intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process the big remote sensing data in a distributed and parallel manner. Especially, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide affluent image processing operations. With the integration of HDFS, Orfeo toolbox and MapReduce, these remote sensing images can be directly processed in parallel in a scalable computing environment. The experiment results show that the proposed framework can efficiently manage and process such big remote sensing data.
Aerothermodynamic Design Sensitivities for a Reacting Gas Flow Solver on an Unstructured Mesh Using a Discrete Adjoint Formulation

NASA Astrophysics Data System (ADS)

Thompson, Kyle Bonner

An algorithm is described to efficiently compute aerothermodynamic design sensitivities using a decoupled variable set. In a conventional approach to computing design sensitivities for reacting flows, the species continuity equations are fully coupled to the conservation laws for momentum and energy. In this algorithm, the species continuity equations are solved separately from the mixture continuity, momentum, and total energy equations. This decoupling simplifies the implicit system, so that the flow solver can be made significantly more efficient, with very little penalty on overall scheme robustness. Most importantly, the computational cost of the point implicit relaxation is shown to scale linearly with the number of species for the decoupled system, whereas the fully coupled approach scales quadratically. Also, the decoupled method significantly reduces the cost in wall time and memory in comparison to the fully coupled approach. This decoupled approach for computing design sensitivities with the adjoint system is demonstrated for inviscid flow in chemical non-equilibrium around a re-entry vehicle with a retro-firing annular nozzle. The sensitivities of the surface temperature and mass flow rate through the nozzle plenum are computed with respect to plenum conditions and verified against sensitivities computed using a complex-variable finite-difference approach. The decoupled scheme significantly reduces the computational time and memory required to complete the optimization, making this an attractive method for high-fidelity design of hypersonic vehicles.
Application of the MacCormack scheme to overland flow routing for high-spatial resolution distributed hydrological model

NASA Astrophysics Data System (ADS)

Zhang, Ling; Nan, Zhuotong; Liang, Xu; Xu, Yi; Hernández, Felipe; Li, Lianxia

2018-03-01

Although process-based distributed hydrological models (PDHMs) are evolving rapidly over the last few decades, their extensive applications are still challenged by the computational expenses. This study attempted, for the first time, to apply the numerically efficient MacCormack algorithm to overland flow routing in a representative high-spatial resolution PDHM, i.e., the distributed hydrology-soil-vegetation model (DHSVM), in order to improve its computational efficiency. The analytical verification indicates that both the semi and full versions of the MacCormack schemes exhibit robust numerical stability and are more computationally efficient than the conventional explicit linear scheme. The full-version outperforms the semi-version in terms of simulation accuracy when a same time step is adopted. The semi-MacCormack scheme was implemented into DHSVM (version 3.1.2) to solve the kinematic wave equations for overland flow routing. The performance and practicality of the enhanced DHSVM-MacCormack model was assessed by performing two groups of modeling experiments in the Mercer Creek watershed, a small urban catchment near Bellevue, Washington. The experiments show that DHSVM-MacCormack can considerably improve the computational efficiency without compromising the simulation accuracy of the original DHSVM model. More specifically, with the same computational environment and model settings, the computational time required by DHSVM-MacCormack can be reduced to several dozen minutes for a simulation period of three months (in contrast with one day and a half by the original DHSVM model) without noticeable sacrifice of the accuracy. The MacCormack scheme proves to be applicable to overland flow routing in DHSVM, which implies that it can be coupled into other PHDMs for watershed routing to either significantly improve their computational efficiency or to make the kinematic wave routing for high resolution modeling computational feasible.
Structural optimization with approximate sensitivities

NASA Technical Reports Server (NTRS)

Patnaik, S. N.; Hopkins, D. A.; Coroneos, R.

1994-01-01

Computational efficiency in structural optimization can be enhanced if the intensive computations associated with the calculation of the sensitivities, that is, gradients of the behavior constraints, are reduced. Approximation to gradients of the behavior constraints that can be generated with small amount of numerical calculations is proposed. Structural optimization with these approximate sensitivities produced correct optimum solution. Approximate gradients performed well for different nonlinear programming methods, such as the sequence of unconstrained minimization technique, method of feasible directions, sequence of quadratic programming, and sequence of linear programming. Structural optimization with approximate gradients can reduce by one third the CPU time that would otherwise be required to solve the problem with explicit closed-form gradients. The proposed gradient approximation shows potential to reduce intensive computation that has been associated with traditional structural optimization.
Effects of combined dimension reduction and tabulation on the simulations of a turbulent premixed flame using a large-eddy simulation/probability density function method

NASA Astrophysics Data System (ADS)

Kim, Jeonglae; Pope, Stephen B.

2014-05-01

A turbulent lean-premixed propane-air flame stabilised by a triangular cylinder as a flame-holder is simulated to assess the accuracy and computational efficiency of combined dimension reduction and tabulation of chemistry. The computational condition matches the Volvo rig experiments. For the reactive simulation, the Lagrangian Large-Eddy Simulation/Probability Density Function (LES/PDF) formulation is used. A novel two-way coupling approach between LES and PDF is applied to obtain resolved density to reduce its statistical fluctuations. Composition mixing is evaluated by the modified Interaction-by-Exchange with the Mean (IEM) model. A baseline case uses In Situ Adaptive Tabulation (ISAT) to calculate chemical reactions efficiently. Its results demonstrate good agreement with the experimental measurements in turbulence statistics, temperature, and minor species mass fractions. For dimension reduction, 11 and 16 represented species are chosen and a variant of Rate Controlled Constrained Equilibrium (RCCE) is applied in conjunction with ISAT to each case. All the quantities in the comparison are indistinguishable from the baseline results using ISAT only. The combined use of RCCE/ISAT reduces the computational time for chemical reaction by more than 50%. However, for the current turbulent premixed flame, chemical reaction takes only a minor portion of the overall computational cost, in contrast to non-premixed flame simulations using LES/PDF, presumably due to the restricted manifold of purely premixed flame in the composition space. Instead, composition mixing is the major contributor to cost reduction since the mean-drift term, which is computationally expensive, is computed for the reduced representation. Overall, a reduction of more than 15% in the computational cost is obtained.
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce

PubMed Central

2015-01-01

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement. PMID:26305223
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce.

PubMed

Idris, Muhammad; Hussain, Shujaat; Siddiqi, Muhammad Hameed; Hassan, Waseem; Syed Muhammad Bilal, Hafiz; Lee, Sungyoung

2015-01-01

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.
More efficient optimization of long-term water supply portfolios

NASA Astrophysics Data System (ADS)

Kirsch, Brian R.; Characklis, Gregory W.; Dillard, Karen E. M.; Kelley, C. T.

2009-03-01

The use of temporary transfers, such as options and leases, has grown as utilities attempt to meet increases in demand while reducing dependence on the expansion of costly infrastructure capacity (e.g., reservoirs). Earlier work has been done to construct optimal portfolios comprising firm capacity and transfers, using decision rules that determine the timing and volume of transfers. However, such work has only focused on the short-term (e.g., 1-year scenarios), which limits the utility of these planning efforts. Developing multiyear portfolios can lead to the exploration of a wider range of alternatives but also increases the computational burden. This work utilizes a coupled hydrologic-economic model to simulate the long-term performance of a city's water supply portfolio. This stochastic model is linked with an optimization search algorithm that is designed to handle the high-frequency, low-amplitude noise inherent in many simulations, particularly those involving expected values. This noise is detrimental to the accuracy and precision of the optimized solution and has traditionally been controlled by investing greater computational effort in the simulation. However, the increased computational effort can be substantial. This work describes the integration of a variance reduction technique (control variate method) within the simulation/optimization as a means of more efficiently identifying minimum cost portfolios. Random variation in model output (i.e., noise) is moderated using knowledge of random variations in stochastic input variables (e.g., reservoir inflows, demand), thereby reducing the computing time by 50% or more. Using these efficiency gains, water supply portfolios are evaluated over a 10-year period in order to assess their ability to reduce costs and adapt to demand growth, while still meeting reliability goals. As a part of the evaluation, several multiyear option contract structures are explored and compared.

Minimized state complexity of quantum-encoded cryptic processes

NASA Astrophysics Data System (ADS)

Riechers, Paul M.; Mahoney, John R.; Aghamohammadi, Cina; Crutchfield, James P.

2016-05-01

The predictive information required for proper trajectory sampling of a stochastic process can be more efficiently transmitted via a quantum channel than a classical one. This recent discovery allows quantum information processing to drastically reduce the memory necessary to simulate complex classical stochastic processes. It also points to a new perspective on the intrinsic complexity that nature must employ in generating the processes we observe. The quantum advantage increases with codeword length: the length of process sequences used in constructing the quantum communication scheme. In analogy with the classical complexity measure, statistical complexity, we use this reduced communication cost as an entropic measure of state complexity in the quantum representation. Previously difficult to compute, the quantum advantage is expressed here in closed form using spectral decomposition. This allows for efficient numerical computation of the quantum-reduced state complexity at all encoding lengths, including infinite. Additionally, it makes clear how finite-codeword reduction in state complexity is controlled by the classical process's cryptic order, and it allows asymptotic analysis of infinite-cryptic-order processes.
Computational Properties of the Hippocampus Increase the Efficiency of Goal-Directed Foraging through Hierarchical Reinforcement Learning

PubMed Central

Chalmers, Eric; Luczak, Artur; Gruber, Aaron J.

2016-01-01

The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL) to guide “goal-directed” behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals' ability to rapidly adapt to environmental changes, or learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, “forward sweeps” through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort) required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows learning and memory of multiple tasks. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes model-based RL to efficiently solve navigation and other difficult tasks. PMID:28018203
A space efficient flexible pivot selection approach to evaluate determinant and inverse of a matrix.

PubMed

Jafree, Hafsa Athar; Imtiaz, Muhammad; Inayatullah, Syed; Khan, Fozia Hanif; Nizami, Tajuddin

2014-01-01

This paper presents new simple approaches for evaluating determinant and inverse of a matrix. The choice of pivot selection has been kept arbitrary thus they reduce the error while solving an ill conditioned system. Computation of determinant of a matrix has been made more efficient by saving unnecessary data storage and also by reducing the order of the matrix at each iteration, while dictionary notation [1] has been incorporated for computing the matrix inverse thereby saving unnecessary calculations. These algorithms are highly class room oriented, easy to use and implemented by students. By taking the advantage of flexibility in pivot selection, one may easily avoid development of the fractions by most. Unlike the matrix inversion method [2] and [3], the presented algorithms obviate the use of permutations and inverse permutations.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Progress of High Efficiency Centrifugal Compressor Simulations Using TURBO

NASA Technical Reports Server (NTRS)

Kulkarni, Sameer; Beach, Timothy A.

2017-01-01

Three-dimensional, time-accurate, and phase-lagged computational fluid dynamics (CFD) simulations of the High Efficiency Centrifugal Compressor (HECC) stage were generated using the TURBO solver. Changes to the TURBO Parallel Version 4 source code were made in order to properly model the no-slip boundary condition along the spinning hub region for centrifugal impellers. A startup procedure was developed to generate a converged flow field in TURBO. This procedure initialized computations on a coarsened mesh generated by the Turbomachinery Gridding System (TGS) and relied on a method of systematically increasing wheel speed and backpressure. Baseline design-speed TURBO results generally overpredicted total pressure ratio, adiabatic efficiency, and the choking flow rate of the HECC stage as compared with the design-intent CFD results of Code Leo. Including diffuser fillet geometry in the TURBO computation resulted in a 0.6 percent reduction in the choking flow rate and led to a better match with design-intent CFD. Diffuser fillets reduced annulus cross-sectional area but also reduced corner separation, and thus blockage, in the diffuser passage. It was found that the TURBO computations are somewhat insensitive to inlet total pressure changing from the TURBO default inlet pressure of 14.7 pounds per square inch (101.35 kilopascals) down to 11.0 pounds per square inch (75.83 kilopascals), the inlet pressure of the component test. Off-design tip clearance was modeled in TURBO in two computations: one in which the blade tip geometry was trimmed by 12 mils (0.3048 millimeters), and another in which the hub flow path was moved to reflect a 12-mil axial shift in the impeller hub, creating a step at the hub. The one-dimensional results of these two computations indicate non-negligible differences between the two modeling approaches.
An energy-efficient failure detector for vehicular cloud computing.

PubMed

Liu, Jiaxi; Wu, Zhibo; Dong, Jian; Wu, Jin; Wen, Dongxin

2018-01-01

Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, lots of RSUs are deployed along the road to improve the connectivity. Many of them are equipped with solar battery due to the unavailability or excess expense of wired electrical power. So it is important to reduce the battery consumption of RSU. However, the existing failure detection algorithms are not designed to save battery consumption RSU. To solve this problem, a new energy-efficient failure detector 2E-FD has been proposed specifically for vehicular cloud computing. 2E-FD does not only provide acceptable failure detection service, but also saves the battery consumption of RSU. Through the comparative experiments, the results show that our failure detector has better performance in terms of speed, accuracy and battery consumption.
An energy-efficient failure detector for vehicular cloud computing

PubMed Central

Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Wen, Dongxin

2018-01-01

Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, lots of RSUs are deployed along the road to improve the connectivity. Many of them are equipped with solar battery due to the unavailability or excess expense of wired electrical power. So it is important to reduce the battery consumption of RSU. However, the existing failure detection algorithms are not designed to save battery consumption RSU. To solve this problem, a new energy-efficient failure detector 2E-FD has been proposed specifically for vehicular cloud computing. 2E-FD does not only provide acceptable failure detection service, but also saves the battery consumption of RSU. Through the comparative experiments, the results show that our failure detector has better performance in terms of speed, accuracy and battery consumption. PMID:29352282
A transfer function type of simplified electrochemical model with modified boundary conditions and Padé approximation for Li-ion battery: Part 1. lithium concentration estimation

NASA Astrophysics Data System (ADS)

Yuan, Shifei; Jiang, Lei; Yin, Chengliang; Wu, Hongjie; Zhang, Xi

2017-06-01

To guarantee the safety, high efficiency and long lifetime for lithium-ion battery, an advanced battery management system requires a physics-meaningful yet computationally efficient battery model. The pseudo-two dimensional (P2D) electrochemical model can provide physical information about the lithium concentration and potential distributions across the cell dimension. However, the extensive computation burden caused by the temporal and spatial discretization limits its real-time application. In this research, we propose a new simplified electrochemical model (SEM) by modifying the boundary conditions for electrolyte diffusion equations, which significantly facilitates the analytical solving process. Then to obtain a reduced order transfer function, the Padé approximation method is adopted to simplify the derived transcendental impedance solution. The proposed model with the reduced order transfer function can be briefly computable and preserve physical meanings through the presence of parameters such as the solid/electrolyte diffusion coefficients (Ds&De) and particle radius. The simulation illustrates that the proposed simplified model maintains high accuracy for electrolyte phase concentration (Ce) predictions, saying 0.8% and 0.24% modeling error respectively, when compared to the rigorous model under 1C-rate pulse charge/discharge and urban dynamometer driving schedule (UDDS) profiles. Meanwhile, this simplified model yields significantly reduced computational burden, which benefits its real-time application.
An Efficient Adaptive Angle-Doppler Compensation Approach for Non-Sidelooking Airborne Radar STAP

PubMed Central

Shen, Mingwei; Yu, Jia; Wu, Di; Zhu, Daiyin

2015-01-01

In this study, the effects of non-sidelooking airborne radar clutter dispersion on space-time adaptive processing (STAP) is considered, and an efficient adaptive angle-Doppler compensation (EAADC) approach is proposed to improve the clutter suppression performance. In order to reduce the computational complexity, the reduced-dimension sparse reconstruction (RDSR) technique is introduced into the angle-Doppler spectrum estimation to extract the required parameters for compensating the clutter spectral center misalignment. Simulation results to demonstrate the effectiveness of the proposed algorithm are presented. PMID:26053755
Parallel, adaptive finite element methods for conservation laws

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.

1994-01-01

We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
Model's sparse representation based on reduced mixed GMsFE basis methods

NASA Astrophysics Data System (ADS)

Jiang, Lijian; Li, Qiuqi

2017-06-01

In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. Mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number degree of freedoms for the mixed GMsFEM and substantially impacts on the computation efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degree of freedoms. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.
Model's sparse representation based on reduced mixed GMsFE basis methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Lijian, E-mail: ljjiang@hnu.edu.cn; Li, Qiuqi, E-mail: qiuqili@hnu.edu.cn

2017-06-01

In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. Mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem in a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by the random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a largemore » number degree of freedoms for the mixed GMsFEM and substantially impacts on the computation efficiency. In order to overcome the difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degree of freedoms. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.« less
Preserving Lagrangian Structure in Nonlinear Model Reduction with Application to Structural Dynamics

DOE PAGES

Carlberg, Kevin; Tuminaro, Ray; Boggs, Paul

2015-03-11

Our work proposes a model-reduction methodology that preserves Lagrangian structure and achieves computational efficiency in the presence of high-order nonlinearities and arbitrary parameter dependence. As such, the resulting reduced-order model retains key properties such as energy conservation and symplectic time-evolution maps. We focus on parameterized simple mechanical systems subjected to Rayleigh damping and external forces, and consider an application to nonlinear structural dynamics. To preserve structure, the method first approximates the system's “Lagrangian ingredients''---the Riemannian metric, the potential-energy function, the dissipation function, and the external force---and subsequently derives reduced-order equations of motion by applying the (forced) Euler--Lagrange equation with thesemore » quantities. Moreover, from the algebraic perspective, key contributions include two efficient techniques for approximating parameterized reduced matrices while preserving symmetry and positive definiteness: matrix gappy proper orthogonal decomposition and reduced-basis sparsification. Our results for a parameterized truss-structure problem demonstrate the practical importance of preserving Lagrangian structure and illustrate the proposed method's merits: it reduces computation time while maintaining high accuracy and stability, in contrast to existing nonlinear model-reduction techniques that do not preserve structure.« less
Preserving Lagrangian Structure in Nonlinear Model Reduction with Application to Structural Dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carlberg, Kevin; Tuminaro, Ray; Boggs, Paul

Our work proposes a model-reduction methodology that preserves Lagrangian structure and achieves computational efficiency in the presence of high-order nonlinearities and arbitrary parameter dependence. As such, the resulting reduced-order model retains key properties such as energy conservation and symplectic time-evolution maps. We focus on parameterized simple mechanical systems subjected to Rayleigh damping and external forces, and consider an application to nonlinear structural dynamics. To preserve structure, the method first approximates the system's “Lagrangian ingredients''---the Riemannian metric, the potential-energy function, the dissipation function, and the external force---and subsequently derives reduced-order equations of motion by applying the (forced) Euler--Lagrange equation with thesemore » quantities. Moreover, from the algebraic perspective, key contributions include two efficient techniques for approximating parameterized reduced matrices while preserving symmetry and positive definiteness: matrix gappy proper orthogonal decomposition and reduced-basis sparsification. Our results for a parameterized truss-structure problem demonstrate the practical importance of preserving Lagrangian structure and illustrate the proposed method's merits: it reduces computation time while maintaining high accuracy and stability, in contrast to existing nonlinear model-reduction techniques that do not preserve structure.« less
The Slippery Slope of Air Force Downsizing: A Strategy Connection

DTIC Science & Technology

2013-02-14

as the reductions continue. As an example, commanders and their Airmen are responsible for being administration, personnel, finance, communciations ... computer efficient because the expertise in their units or base has been reduced, eliminated or consolidated. This impacts the time and resources...are the areas that require additional research. • Issue: Other functional issues (Logistics, Finance, Contracting, Communications/ computers , Nuclear
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images

PubMed Central

Wang, Yangping; Wang, Song

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU). PMID:28053653
AdapChem

NASA Technical Reports Server (NTRS)

Oluwole, Oluwayemisi O.; Wong, Hsi-Wu; Green, William

2012-01-01

AdapChem software enables high efficiency, low computational cost, and enhanced accuracy on computational fluid dynamics (CFD) numerical simulations used for combustion studies. The software dynamically allocates smaller, reduced chemical models instead of the larger, full chemistry models to evolve the calculation while ensuring the same accuracy to be obtained for steady-state CFD reacting flow simulations. The software enables detailed chemical kinetic modeling in combustion CFD simulations. AdapChem adapts the reaction mechanism used in the CFD to the local reaction conditions. Instead of a single, comprehensive reaction mechanism throughout the computation, a dynamic distribution of smaller, reduced models is used to capture accurately the chemical kinetics at a fraction of the cost of the traditional single-mechanism approach.
Computed Flow Through An Artificial Heart Valve

NASA Technical Reports Server (NTRS)

Rogers, Stewart E.; Kwak, Dochan; Kiris, Cetin; Chang, I-Dee

1994-01-01

Report discusses computations of blood flow through prosthetic tilting disk valve. Computational procedure developed in simulation used to design better artificial hearts and valves by reducing or eliminating following adverse flow characteristics: large pressure losses, which prevent hearts from working efficiently; separated and secondary flows, which causes clotting; and high turbulent shear stresses, which damages red blood cells. Report reiterates and expands upon part of NASA technical memorandum "Computed Flow Through an Artificial Heart and Valve" (ARC-12983). Also based partly on research described in "Numerical Simulation of Flow Through an Artificial Heart" (ARC-12478).
Reducing the computational footprint for real-time BCPNN learning

PubMed Central

Vogginger, Bernhard; Schüffny, René; Lansner, Anders; Cederström, Love; Partzsch, Johannes; Höppner, Sebastian

2015-01-01

The implementation of synaptic plasticity in neural simulation or neuromorphic hardware is usually very resource-intensive, often requiring a compromise between efficiency and flexibility. A versatile, but computationally-expensive plasticity mechanism is provided by the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm. Building upon Bayesian statistics, and having clear links to biological plasticity processes, the BCPNN learning rule has been applied in many fields, ranging from data classification, associative memory, reward-based learning, probabilistic inference to cortical attractor memory networks. In the spike-based version of this learning rule the pre-, postsynaptic and coincident activity is traced in three low-pass-filtering stages, requiring a total of eight state variables, whose dynamics are typically simulated with the fixed step size Euler method. We derive analytic solutions allowing an efficient event-driven implementation of this learning rule. Further speedup is achieved by first rewriting the model which reduces the number of basic arithmetic operations per update to one half, and second by using look-up tables for the frequently calculated exponential decay. Ultimately, in a typical use case, the simulation using our approach is more than one order of magnitude faster than with the fixed step size Euler method. Aiming for a small memory footprint per BCPNN synapse, we also evaluate the use of fixed-point numbers for the state variables, and assess the number of bits required to achieve same or better accuracy than with the conventional explicit Euler method. All of this will allow a real-time simulation of a reduced cortex model based on BCPNN in high performance computing. More important, with the analytic solution at hand and due to the reduced memory bandwidth, the learning rule can be efficiently implemented in dedicated or existing digital neuromorphic hardware. PMID:25657618
Reducing the computational footprint for real-time BCPNN learning.

PubMed

Vogginger, Bernhard; Schüffny, René; Lansner, Anders; Cederström, Love; Partzsch, Johannes; Höppner, Sebastian

2015-01-01

The implementation of synaptic plasticity in neural simulation or neuromorphic hardware is usually very resource-intensive, often requiring a compromise between efficiency and flexibility. A versatile, but computationally-expensive plasticity mechanism is provided by the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm. Building upon Bayesian statistics, and having clear links to biological plasticity processes, the BCPNN learning rule has been applied in many fields, ranging from data classification, associative memory, reward-based learning, probabilistic inference to cortical attractor memory networks. In the spike-based version of this learning rule the pre-, postsynaptic and coincident activity is traced in three low-pass-filtering stages, requiring a total of eight state variables, whose dynamics are typically simulated with the fixed step size Euler method. We derive analytic solutions allowing an efficient event-driven implementation of this learning rule. Further speedup is achieved by first rewriting the model which reduces the number of basic arithmetic operations per update to one half, and second by using look-up tables for the frequently calculated exponential decay. Ultimately, in a typical use case, the simulation using our approach is more than one order of magnitude faster than with the fixed step size Euler method. Aiming for a small memory footprint per BCPNN synapse, we also evaluate the use of fixed-point numbers for the state variables, and assess the number of bits required to achieve same or better accuracy than with the conventional explicit Euler method. All of this will allow a real-time simulation of a reduced cortex model based on BCPNN in high performance computing. More important, with the analytic solution at hand and due to the reduced memory bandwidth, the learning rule can be efficiently implemented in dedicated or existing digital neuromorphic hardware.

Reduced-order modelling of parameter-dependent, linear and nonlinear dynamic partial differential equation models.

PubMed

Shah, A A; Xing, W W; Triantafyllidis, V

2017-04-01

In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach.
Reduced-order modelling of parameter-dependent, linear and nonlinear dynamic partial differential equation models

PubMed Central

Xing, W. W.; Triantafyllidis, V.

2017-01-01

In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach. PMID:28484327
Fast space-varying convolution using matrix source coding with applications to camera stray light reduction.

PubMed

Wei, Jianing; Bouman, Charles A; Allebach, Jan P

2014-05-01

Many imaging applications require the implementation of space-varying convolution for accurate restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear operators whose impulse response has slow spatial variation. In addition, these space-varying convolution operators are often dense, so direct implementation of the convolution operator is typically computationally impractical. One such example is the problem of stray light reduction in digital cameras, which requires the implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as iterative tomographic reconstruction, can also depend on the implementation of dense space-varying convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier transform, this approach does not work for space-varying operators. So direct convolution is often the only option for implementing space-varying convolution. In this paper, we develop a general approach to the efficient implementation of space-varying convolution, and demonstrate its use in the application of stray light reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the memory required to store it; we also dramatically reduce the computation required to implement matrix-vector products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying convolution operator into a product of sparse transforms. Experimental results show that our method can dramatically reduce the computation required for stray light reduction while maintaining high accuracy.
Computation of multi-dimensional viscous supersonic jet flow

NASA Technical Reports Server (NTRS)

Kim, Y. N.; Buggeln, R. C.; Mcdonald, H.

1986-01-01

A new method has been developed for two- and three-dimensional computations of viscous supersonic flows with embedded subsonic regions adjacent to solid boundaries. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases relevant to internal supersonic flow is presented and compared with data. Comparison between data and computation are in general excellent thus indicating that the computational technique has great promise as a tool for calculating supersonic flow with embedded subsonic regions. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Computationally efficient methods for modelling laser wakefield acceleration in the blowout regime

NASA Astrophysics Data System (ADS)

Cowan, B. M.; Kalmykov, S. Y.; Beck, A.; Davoine, X.; Bunkers, K.; Lifschitz, A. F.; Lefebvre, E.; Bruhwiler, D. L.; Shadwick, B. A.; Umstadter, D. P.; Umstadter

2012-08-01

Electron self-injection and acceleration until dephasing in the blowout regime is studied for a set of initial conditions typical of recent experiments with 100-terawatt-class lasers. Two different approaches to computationally efficient, fully explicit, 3D particle-in-cell modelling are examined. First, the Cartesian code vorpal (Nieter, C. and Cary, J. R. 2004 VORPAL: a versatile plasma simulation code. J. Comput. Phys. 196, 538) using a perfect-dispersion electromagnetic solver precisely describes the laser pulse and bubble dynamics, taking advantage of coarser resolution in the propagation direction, with a proportionally larger time step. Using third-order splines for macroparticles helps suppress the sampling noise while keeping the usage of computational resources modest. The second way to reduce the simulation load is using reduced-geometry codes. In our case, the quasi-cylindrical code calder-circ (Lifschitz, A. F. et al. 2009 Particle-in-cell modelling of laser-plasma interaction using Fourier decomposition. J. Comput. Phys. 228(5), 1803-1814) uses decomposition of fields and currents into a set of poloidal modes, while the macroparticles move in the Cartesian 3D space. Cylindrical symmetry of the interaction allows using just two modes, reducing the computational load to roughly that of a planar Cartesian simulation while preserving the 3D nature of the interaction. This significant economy of resources allows using fine resolution in the direction of propagation and a small time step, making numerical dispersion vanishingly small, together with a large number of particles per cell, enabling good particle statistics. Quantitative agreement of two simulations indicates that these are free of numerical artefacts. Both approaches thus retrieve the physically correct evolution of the plasma bubble, recovering the intrinsic connection of electron self-injection to the nonlinear optical evolution of the driver.
Church ownership and hospital efficiency.

PubMed

White, K R; Ozcan, Y A

1996-01-01

Using a sample of California hospitals, the effect of church ownership was examined as it relates to nonprofit hospital efficiency. Efficiency scores were computed using a nonparametric method called data envelopment analysis (DEA). Controlling for hospital size, location, system membership, and type of church ownership, church-owned hospitals were found to be more frequently in the efficient category than their secular nonprofit counterparts. The outcomes have policy implications for reducing healthcare expenditures by focusing on increasing outputs or decreasing inputs, as appropriate, and bolstering the case for church-sponsored hospitals to retain the tax-exempt status due to their ability to manage their resources as efficiently as (or more efficiently than) secular hospitals.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

PubMed Central

Chen, Jui-Le; Yang, Chu-Sing

2013-01-01

The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Investigations of fluid-strain interaction using Plate Boundary Observatory borehole data

NASA Astrophysics Data System (ADS)

Boyd, Jeffrey Michael

Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas impact energy use, overall accuracy, and usefulness of the system. This thesis explores methods software can influence the trade-off between energy consumption and system accuracy. In general the more energy a system consumes the more accurate will be. We explore how finding the transitions between human activities is able to reduce the energy consumption of such systems without reducing much accuracy. We introduce the Log-likelihood Ratio Test as a method to detect transitions, and explore how choices of sensor, feature calculations, and parameters concerning time segmentation affect the accuracy of this method. We discovered an approximate 5X increase in energy efficiency could be achieved with only a 5% decrease in accuracy. We also address how a system's sleep mode, in which the processor enters a low-power state and sensors are turned off, affects a wearable computing platform that does activity recognition. We discuss the energy trade-offs in each stage of the activity recognition process. We find that careful analysis of these parameters can result in great increases in energy efficiency if small compromises in overall accuracy can be tolerated. We call this the ``Great Compromise.'' We found a 6X increase in efficiency with a 7% decrease in accuracy. We then consider how wireless transmission of data affects the overall energy efficiency of a wearable computing platform. We find that design decisions such as feature calculations and grouping size have a great impact on the energy consumption of the system because of the amount of data that is stored and transmitted. For example, storing and transmitting vector-based features such as FFT or DCT do not compress the signal and would use more energy than storing and transmitting the raw signal. The effect of grouping size on energy consumption depends on the feature. For scalar features energy consumption is proportional in the inverse of grouping size, so it's reduced as grouping size goes up. For features that depend on the grouping size, such as FFT, energy increases with the logarithm of grouping size, so energy consumption increases slowly as grouping size increases. We find that compressing data through activity classification and transition detection significantly reduces energy consumption and that the energy consumed for the classification overhead is negligible compared to the energy savings from data compression. We provide mathematical models of energy usage and data generation, and test our ideas using a mobile computing platform, the Texas Instruments Chronos watch.
Towards feasible and effective predictive wavefront control for adaptive optics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poyneer, L A; Veran, J

We have recently proposed Predictive Fourier Control, a computationally efficient and adaptive algorithm for predictive wavefront control that assumes frozen flow turbulence. We summarize refinements to the state-space model that allow operation with arbitrary computational delays and reduce the computational cost of solving for new control. We present initial atmospheric characterization using observations with Gemini North's Altair AO system. These observations, taken over 1 year, indicate that frozen flow is exists, contains substantial power, and is strongly detected 94% of the time.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach

NASA Technical Reports Server (NTRS)

Mak, Victor W. K.

1986-01-01

Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
Improve the efficiency of the Cartesian tensor based fast multipole method for Coulomb interaction using the traces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, He; Luo, Li -Shi; Li, Rui

To compute the non-oscillating mutual interaction for a systems with N points, the fast multipole method (FMM) has an efficiency that scales linearly with the number of points. Specifically, for Coulomb interaction, FMM can be constructed using either the spherical harmonic functions or the totally symmetric Cartesian tensors. In this paper, we will present that the effciency of the Cartesian tensor-based FMM for the Coulomb interaction can be significantly improved by implementing the traces of the Cartesian tensors in calculation to reduce the independent elements of the n-th rank totally symmetric Cartesian tensor from (n + 1)(n + 2)=2 tomore » 2n + 1. The computation complexity for the operations in FMM are analyzed and expressed as polynomials of the highest rank of the Cartesian tensors. For most operations, the complexity is reduced by one order. Numerical examples regarding the convergence and the effciency of the new algorithm are demonstrated. As a result, a reduction of computation time up to 50% has been observed for a moderate number of points and rank of tensors.« less
Improve the efficiency of the Cartesian tensor based fast multipole method for Coulomb interaction using the traces

DOE PAGES

Huang, He; Luo, Li -Shi; Li, Rui; ...

2018-05-17

To compute the non-oscillating mutual interaction for a systems with N points, the fast multipole method (FMM) has an efficiency that scales linearly with the number of points. Specifically, for Coulomb interaction, FMM can be constructed using either the spherical harmonic functions or the totally symmetric Cartesian tensors. In this paper, we will present that the effciency of the Cartesian tensor-based FMM for the Coulomb interaction can be significantly improved by implementing the traces of the Cartesian tensors in calculation to reduce the independent elements of the n-th rank totally symmetric Cartesian tensor from (n + 1)(n + 2)=2 tomore » 2n + 1. The computation complexity for the operations in FMM are analyzed and expressed as polynomials of the highest rank of the Cartesian tensors. For most operations, the complexity is reduced by one order. Numerical examples regarding the convergence and the effciency of the new algorithm are demonstrated. As a result, a reduction of computation time up to 50% has been observed for a moderate number of points and rank of tensors.« less
Two-Dimensional Electronic Spectroscopy of Benzene, Phenol, and Their Dimer: An Efficient First-Principles Simulation Protocol.

PubMed

Nenov, Artur; Mukamel, Shaul; Garavelli, Marco; Rivalta, Ivan

2015-08-11

First-principles simulations of two-dimensional electronic spectroscopy in the ultraviolet region (2DUV) require computationally demanding multiconfigurational approaches that can resolve doubly excited and charge transfer states, the spectroscopic fingerprints of coupled UV-active chromophores. Here, we propose an efficient approach to reduce the computational cost of accurate simulations of 2DUV spectra of benzene, phenol, and their dimer (i.e., the minimal models for studying electronic coupling of UV-chromophores in proteins). We first establish the multiconfigurational recipe with the highest accuracy by comparison with experimental data, providing reference gas-phase transition energies and dipole moments that can be used to construct exciton Hamiltonians involving high-lying excited states. We show that by reducing the active spaces and the number of configuration state functions within restricted active space schemes, the computational cost can be significantly decreased without loss of accuracy in predicting 2DUV spectra. The proposed recipe has been successfully tested on a realistic model proteic system in water. Accounting for line broadening due to thermal and solvent-induced fluctuations allows for direct comparison with experiments.
A 3D staggered-grid finite difference scheme for poroelastic wave equation

NASA Astrophysics Data System (ADS)

Zhang, Yijie; Gao, Jinghuai

2014-10-01

Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Determination of aerodynamic sensitivity coefficients for wings in transonic flow

NASA Technical Reports Server (NTRS)

Carlson, Leland A.; El-Banna, Hesham M.

1992-01-01

The quasianalytical approach is applied to the 3-D full potential equation to compute wing aerodynamic sensitivity coefficients in the transonic regime. Symbolic manipulation is used to reduce the effort associated with obtaining the sensitivity equations, and the large sensitivity system is solved using 'state of the art' routines. The quasianalytical approach is believed to be reasonably accurate and computationally efficient for 3-D problems.
Reliable Early Classification on Multivariate Time Series with Numerical and Categorical Attributes

DTIC Science & Technology

2015-05-22

design a procedure of feature extraction in REACT named MEG (Mining Equivalence classes with shapelet Generators) based on the concept of...Equivalence Classes Mining [12, 15]. MEG can efficiently and effectively generate the discriminative features. In addition, several strategies are proposed...technique of parallel computing [4] to propose a process of pa- rallel MEG for substantially reducing the computational overhead of discovering shapelet
Efficient 3D geometric and Zernike moments computation from unstructured surface meshes.

PubMed

Pozo, José María; Villa-Uriol, Maria-Cruz; Frangi, Alejandro F

2011-03-01

This paper introduces and evaluates a fast exact algorithm and a series of faster approximate algorithms for the computation of 3D geometric moments from an unstructured surface mesh of triangles. Being based on the object surface reduces the computational complexity of these algorithms with respect to volumetric grid-based algorithms. In contrast, it can only be applied for the computation of geometric moments of homogeneous objects. This advantage and restriction is shared with other proposed algorithms based on the object boundary. The proposed exact algorithm reduces the computational complexity for computing geometric moments up to order N with respect to previously proposed exact algorithms, from N(9) to N(6). The approximate series algorithm appears as a power series on the rate between triangle size and object size, which can be truncated at any desired degree. The higher the number and quality of the triangles, the better the approximation. This approximate algorithm reduces the computational complexity to N(3). In addition, the paper introduces a fast algorithm for the computation of 3D Zernike moments from the computed geometric moments, with a computational complexity N(4), while the previously proposed algorithm is of order N(6). The error introduced by the proposed approximate algorithms is evaluated in different shapes and the cost-benefit ratio in terms of error, and computational time is analyzed for different moment orders.
Reducing Bottlenecks to Improve the Efficiency of the Lung Cancer Care Delivery Process: A Process Engineering Modeling Approach to Patient-Centered Care.

PubMed

Ju, Feng; Lee, Hyo Kyung; Yu, Xinhua; Faris, Nicholas R; Rugless, Fedoria; Jiang, Shan; Li, Jingshan; Osarogiagbon, Raymond U

2017-12-01

The process of lung cancer care from initial lesion detection to treatment is complex, involving multiple steps, each introducing the potential for substantial delays. Identifying the steps with the greatest delays enables a focused effort to improve the timeliness of care-delivery, without sacrificing quality. We retrospectively reviewed clinical events from initial detection, through histologic diagnosis, radiologic and invasive staging, and medical clearance, to surgery for all patients who had an attempted resection of a suspected lung cancer in a community healthcare system. We used a computer process modeling approach to evaluate delays in care delivery, in order to identify potential 'bottlenecks' in waiting time, the reduction of which could produce greater care efficiency. We also conducted 'what-if' analyses to predict the relative impact of simulated changes in the care delivery process to determine the most efficient pathways to surgery. The waiting time between radiologic lesion detection and diagnostic biopsy, and the waiting time from radiologic staging to surgery were the two most critical bottlenecks impeding efficient care delivery (more than 3 times larger compared to reducing other waiting times). Additionally, instituting surgical consultation prior to cardiac consultation for medical clearance and decreasing the waiting time between CT scans and diagnostic biopsies, were potentially the most impactful measures to reduce care delays before surgery. Rigorous computer simulation modeling, using clinical data, can provide useful information to identify areas for improving the efficiency of care delivery by process engineering, for patients who receive surgery for lung cancer.
Efficient LIDAR Point Cloud Data Managing and Processing in a Hadoop-Based Distributed Framework

NASA Astrophysics Data System (ADS)

Wang, C.; Hu, F.; Sha, D.; Han, X.

2017-10-01

Light Detection and Ranging (LiDAR) is one of the most promising technologies in surveying and mapping city management, forestry, object recognition, computer vision engineer and others. However, it is challenging to efficiently storage, query and analyze the high-resolution 3D LiDAR data due to its volume and complexity. In order to improve the productivity of Lidar data processing, this study proposes a Hadoop-based framework to efficiently manage and process LiDAR data in a distributed and parallel manner, which takes advantage of Hadoop's storage and computing ability. At the same time, the Point Cloud Library (PCL), an open-source project for 2D/3D image and point cloud processing, is integrated with HDFS and MapReduce to conduct the Lidar data analysis algorithms provided by PCL in a parallel fashion. The experiment results show that the proposed framework can efficiently manage and process big LiDAR data.
BESIU Physical Analysis on Hadoop Platform

NASA Astrophysics Data System (ADS)

Huo, Jing; Zang, Dongsong; Lei, Xiaofeng; Li, Qiang; Sun, Gongxing

2014-06-01

In the past 20 years, computing cluster has been widely used for High Energy Physics data processing. The jobs running on the traditional cluster with a Data-to-Computing structure, have to read large volumes of data via the network to the computing nodes for analysis, thereby making the I/O latency become a bottleneck of the whole system. The new distributed computing technology based on the MapReduce programming model has many advantages, such as high concurrency, high scalability and high fault tolerance, and it can benefit us in dealing with Big Data. This paper brings the idea of using MapReduce model to do BESIII physical analysis, and presents a new data analysis system structure based on Hadoop platform, which not only greatly improve the efficiency of data analysis, but also reduces the cost of system building. Moreover, this paper establishes an event pre-selection system based on the event level metadata(TAGs) database to optimize the data analyzing procedure.

SAGE: The Self-Adaptive Grid Code. 3

NASA Technical Reports Server (NTRS)

Davies, Carol B.; Venkatapathy, Ethiraj

1999-01-01

The multi-dimensional self-adaptive grid code, SAGE, is an important tool in the field of computational fluid dynamics (CFD). It provides an efficient method to improve the accuracy of flow solutions while simultaneously reducing computer processing time. Briefly, SAGE enhances an initial computational grid by redistributing the mesh points into more appropriate locations. The movement of these points is driven by an equal-error-distribution algorithm that utilizes the relationship between high flow gradients and excessive solution errors. The method also provides a balance between clustering points in the high gradient regions and maintaining the smoothness and continuity of the adapted grid, The latest version, Version 3, includes the ability to change the boundaries of a given grid to more efficiently enclose flow structures and provides alternative redistribution algorithms.
Symmetry-conserving purification of quantum states within the density matrix renormalization group

DOE PAGES

Nocera, Alberto; Alvarez, Gonzalo

2016-01-28

The density matrix renormalization group (DMRG) algorithm was originally designed to efficiently compute the zero-temperature or ground-state properties of one-dimensional strongly correlated quantum systems. The development of the algorithm at finite temperature has been a topic of much interest, because of the usefulness of thermodynamics quantities in understanding the physics of condensed matter systems, and because of the increased complexity associated with efficiently computing temperature-dependent properties. The ancilla method is a DMRG technique that enables the computation of these thermodynamic quantities. In this paper, we review the ancilla method, and improve its performance by working on reduced Hilbert spaces andmore » using canonical approaches. Furthermore we explore its applicability beyond spins systems to t-J and Hubbard models.« less
Structural, thermodynamic, and electrical properties of polar fluids and ionic solutions on a hypersphere: Results of simulations

NASA Astrophysics Data System (ADS)

Caillol, J. M.; Levesque, D.

1992-01-01

The reliability and the efficiency of a new method suitable for the simulations of dielectric fluids and ionic solutions is established by numerical computations. The efficiency depends on the use of a simulation cell which is the surface of a four-dimensional sphere. The reliability originates from a charge-charge potential solution of the Poisson equation in this confining volume. The computation time, for systems of a few hundred molecules, is reduced by a factor of 2 or 3 compared to this of a simulation performed in a cubic volume with periodic boundary conditions and the Ewald charge-charge potential.
An analysis of thermal response factors and how to reduce their computational time requirement

NASA Technical Reports Server (NTRS)

Wiese, M. R.

1982-01-01

Te RESFAC2 version of the Thermal Response Factor Program (RESFAC) is the result of numerous modifications and additions to the original RESFAC. These modifications and additions have significantly reduced the program's computational time requirement. As a result of this work, the program is more efficient and its code is both readable and understandable. This report describes what a thermal response factor is; analyzes the original matrix algebra calculations and root finding techniques; presents a new root finding technique and streamlined matrix algebra; supplies ten validation cases and their results.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways

PubMed Central

Jevremović, Dimitrije; Trinh, Cong T.; Srienc, Friedrich; Sosa, Carlos P.; Boley, Daniel

2011-01-01

Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome- scale metabolic network with 100–1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in C++ language with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied with an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute-nodes and specific communication patterns to reduce the communication overhead and improve efficiency. PMID:22058581
Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

PubMed Central

Saa, Pedro A.; Nielsen, Lars K.

2016-01-01

Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using ‘loopless constraints’. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm—Fast- sparse null-space pursuit (SNP)—inspired by recent results on SNP. By finding a reduced feasible ‘loop-law’ matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

2013-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved 'clock time' speedups in fusing datasets on our own compute nodes and in the public Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed 'near' the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill. SciReduce Architecture
Efficient and Extensible Quasi-Explicit Modular Nonlinear Multiscale Battery Model: GH-MSMD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Gi-Heon; Smith, Kandler; Lawrence-Simon, Jake

Complex physics and long computation time hinder the adoption of computer aided engineering models in the design of large-format battery cells and systems. A modular, efficient battery simulation model -- the multiscale multidomain (MSMD) model -- was previously introduced to aid the scale-up of Li-ion material and electrode designs to complete cell and pack designs, capturing electrochemical interplay with 3-D electronic current pathways and thermal response. Here, this paper enhances the computational efficiency of the MSMD model using a separation of time-scales principle to decompose model field variables. The decomposition provides a quasi-explicit linkage between the multiple length-scale domains andmore » thus reduces time-consuming nested iteration when solving model equations across multiple domains. In addition to particle-, electrode- and cell-length scales treated in the previous work, the present formulation extends to bus bar- and multi-cell module-length scales. We provide example simulations for several variants of GH electrode-domain models.« less
Efficient and Extensible Quasi-Explicit Modular Nonlinear Multiscale Battery Model: GH-MSMD

DOE PAGES

Kim, Gi-Heon; Smith, Kandler; Lawrence-Simon, Jake; ...

2017-03-24

Complex physics and long computation time hinder the adoption of computer aided engineering models in the design of large-format battery cells and systems. A modular, efficient battery simulation model -- the multiscale multidomain (MSMD) model -- was previously introduced to aid the scale-up of Li-ion material and electrode designs to complete cell and pack designs, capturing electrochemical interplay with 3-D electronic current pathways and thermal response. Here, this paper enhances the computational efficiency of the MSMD model using a separation of time-scales principle to decompose model field variables. The decomposition provides a quasi-explicit linkage between the multiple length-scale domains andmore » thus reduces time-consuming nested iteration when solving model equations across multiple domains. In addition to particle-, electrode- and cell-length scales treated in the previous work, the present formulation extends to bus bar- and multi-cell module-length scales. We provide example simulations for several variants of GH electrode-domain models.« less
Increasing the sampling efficiency of protein conformational transition using velocity-scaling optimized hybrid explicit/implicit solvent REMD simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, Yuqi; Wang, Jinan; Shao, Qiang, E-mail: qshao@mail.shcnc.ac.cn, E-mail: Jiye.Shi@ucb.com, E-mail: wlzhu@mail.shcnc.ac.cn

2015-03-28

The application of temperature replica exchange molecular dynamics (REMD) simulation on protein motion is limited by its huge requirement of computational resource, particularly when explicit solvent model is implemented. In the previous study, we developed a velocity-scaling optimized hybrid explicit/implicit solvent REMD method with the hope to reduce the temperature (replica) number on the premise of maintaining high sampling efficiency. In this study, we utilized this method to characterize and energetically identify the conformational transition pathway of a protein model, the N-terminal domain of calmodulin. In comparison to the standard explicit solvent REMD simulation, the hybrid REMD is much lessmore » computationally expensive but, meanwhile, gives accurate evaluation of the structural and thermodynamic properties of the conformational transition which are in well agreement with the standard REMD simulation. Therefore, the hybrid REMD could highly increase the computational efficiency and thus expand the application of REMD simulation to larger-size protein systems.« less
Vibration extraction based on fast NCC algorithm and high-speed camera.

PubMed

Lei, Xiujun; Jin, Yi; Guo, Jie; Zhu, Chang'an

2015-09-20

In this study, a high-speed camera system is developed to complete the vibration measurement in real time and to overcome the mass introduced by conventional contact measurements. The proposed system consists of a notebook computer and a high-speed camera which can capture the images as many as 1000 frames per second. In order to process the captured images in the computer, the normalized cross-correlation (NCC) template tracking algorithm with subpixel accuracy is introduced. Additionally, a modified local search algorithm based on the NCC is proposed to reduce the computation time and to increase efficiency significantly. The modified algorithm can rapidly accomplish one displacement extraction 10 times faster than the traditional template matching without installing any target panel onto the structures. Two experiments were carried out under laboratory and outdoor conditions to validate the accuracy and efficiency of the system performance in practice. The results demonstrated the high accuracy and efficiency of the camera system in extracting vibrating signals.
Parallel processing optimization strategy based on MapReduce model in cloud storage environment

NASA Astrophysics Data System (ADS)

Cui, Jianming; Liu, Jiayi; Li, Qiuyan

2017-05-01

Currently, a large number of documents in the cloud storage process employed the way of packaging after receiving all the packets. From the local transmitter this stored procedure to the server, packing and unpacking will consume a lot of time, and the transmission efficiency is low as well. A new parallel processing algorithm is proposed to optimize the transmission mode. According to the operation machine graphs model work, using MPI technology parallel execution Mapper and Reducer mechanism. It is good to use MPI technology to implement Mapper and Reducer parallel mechanism. After the simulation experiment of Hadoop cloud computing platform, this algorithm can not only accelerate the file transfer rate, but also shorten the waiting time of the Reducer mechanism. It will break through traditional sequential transmission constraints and reduce the storage coupling to improve the transmission efficiency.
Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis

NASA Technical Reports Server (NTRS)

Duffy, Daniel Q.; Schnase, John L.; Thompson, John H.; Freeman, Shawn M.; Clune, Thomas L.

2012-01-01

MapReduce is an approach to high-performance analytics that may be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective- Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data intensive analytic workflows.
Improving multi-objective reservoir operation optimization with sensitivity-informed dimension reduction

NASA Astrophysics Data System (ADS)

Chu, J.; Zhang, C.; Fu, G.; Li, Y.; Zhou, H.

2015-08-01

This study investigates the effectiveness of a sensitivity-informed method for multi-objective operation of reservoir systems, which uses global sensitivity analysis as a screening tool to reduce computational demands. Sobol's method is used to screen insensitive decision variables and guide the formulation of the optimization problems with a significantly reduced number of decision variables. This sensitivity-informed method dramatically reduces the computational demands required for attaining high-quality approximations of optimal trade-off relationships between conflicting design objectives. The search results obtained from the reduced complexity multi-objective reservoir operation problems are then used to pre-condition the full search of the original optimization problem. In two case studies, the Dahuofang reservoir and the inter-basin multi-reservoir system in Liaoning province, China, sensitivity analysis results show that reservoir performance is strongly controlled by a small proportion of decision variables. Sensitivity-informed dimension reduction and pre-conditioning are evaluated in their ability to improve the efficiency and effectiveness of multi-objective evolutionary optimization. Overall, this study illustrates the efficiency and effectiveness of the sensitivity-informed method and the use of global sensitivity analysis to inform dimension reduction of optimization problems when solving complex multi-objective reservoir operation problems.
Improving multi-objective reservoir operation optimization with sensitivity-informed problem decomposition

NASA Astrophysics Data System (ADS)

Chu, J. G.; Zhang, C.; Fu, G. T.; Li, Y.; Zhou, H. C.

2015-04-01

This study investigates the effectiveness of a sensitivity-informed method for multi-objective operation of reservoir systems, which uses global sensitivity analysis as a screening tool to reduce the computational demands. Sobol's method is used to screen insensitive decision variables and guide the formulation of the optimization problems with a significantly reduced number of decision variables. This sensitivity-informed problem decomposition dramatically reduces the computational demands required for attaining high quality approximations of optimal tradeoff relationships between conflicting design objectives. The search results obtained from the reduced complexity multi-objective reservoir operation problems are then used to pre-condition the full search of the original optimization problem. In two case studies, the Dahuofang reservoir and the inter-basin multi-reservoir system in Liaoning province, China, sensitivity analysis results show that reservoir performance is strongly controlled by a small proportion of decision variables. Sensitivity-informed problem decomposition and pre-conditioning are evaluated in their ability to improve the efficiency and effectiveness of multi-objective evolutionary optimization. Overall, this study illustrates the efficiency and effectiveness of the sensitivity-informed method and the use of global sensitivity analysis to inform problem decomposition when solving the complex multi-objective reservoir operation problems.
Computation of multi-dimensional viscous supersonic flow

NASA Technical Reports Server (NTRS)

Buggeln, R. C.; Kim, Y. N.; Mcdonald, H.

1986-01-01

A method has been developed for two- and three-dimensional computations of viscous supersonic jet flows interacting with an external flow. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases associated with supersonic jet flow is presented and compared with other calculations for axisymmetric cases. Demonstration calculations indicate that the computational technique has great promise as a tool for calculating a wide range of supersonic flow problems including jet flow. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Stochastic Control of Energy Efficient Buildings: A Semidefinite Programming Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Xiao; Dong, Jin; Djouadi, Seddik M

2015-01-01

The key goal in energy efficient buildings is to reduce energy consumption of Heating, Ventilation, and Air- Conditioning (HVAC) systems while maintaining a comfortable temperature and humidity in the building. This paper proposes a novel stochastic control approach for achieving joint performance and power control of HVAC. We employ a constrained Stochastic Linear Quadratic Control (cSLQC) by minimizing a quadratic cost function with a disturbance assumed to be Gaussian. The problem is formulated to minimize the expected cost subject to a linear constraint and a probabilistic constraint. By using cSLQC, the problem is reduced to a semidefinite optimization problem, wheremore » the optimal control can be computed efficiently by Semidefinite programming (SDP). Simulation results are provided to demonstrate the effectiveness and power efficiency by utilizing the proposed control approach.« less
Modified complementary ensemble empirical mode decomposition and intrinsic mode functions evaluation index for high-speed train gearbox fault diagnosis

NASA Astrophysics Data System (ADS)

Chen, Dongyue; Lin, Jianhui; Li, Yanping

2018-06-01

Complementary ensemble empirical mode decomposition (CEEMD) has been developed for the mode-mixing problem in Empirical Mode Decomposition (EMD) method. Compared to the ensemble empirical mode decomposition (EEMD), the CEEMD method reduces residue noise in the signal reconstruction. Both CEEMD and EEMD need enough ensemble number to reduce the residue noise, and hence it would be too much computation cost. Moreover, the selection of intrinsic mode functions (IMFs) for further analysis usually depends on experience. A modified CEEMD method and IMFs evaluation index are proposed with the aim of reducing the computational cost and select IMFs automatically. A simulated signal and in-service high-speed train gearbox vibration signals are employed to validate the proposed method in this paper. The results demonstrate that the modified CEEMD can decompose the signal efficiently with less computation cost, and the IMFs evaluation index can select the meaningful IMFs automatically.
Wall Shear Stress Distribution in a Patient-Specific Cerebral Aneurysm Model using Reduced Order Modeling

NASA Astrophysics Data System (ADS)

Han, Suyue; Chang, Gary Han; Schirmer, Clemens; Modarres-Sadeghi, Yahya

2016-11-01

We construct a reduced-order model (ROM) to study the Wall Shear Stress (WSS) distributions in image-based patient-specific aneurysms models. The magnitude of WSS has been shown to be a critical factor in growth and rupture of human aneurysms. We start the process by running a training case using Computational Fluid Dynamics (CFD) simulation with time-varying flow parameters, such that these parameters cover the range of parameters of interest. The method of snapshot Proper Orthogonal Decomposition (POD) is utilized to construct the reduced-order bases using the training CFD simulation. The resulting ROM enables us to study the flow patterns and the WSS distributions over a range of system parameters computationally very efficiently with a relatively small number of modes. This enables comprehensive analysis of the model system across a range of physiological conditions without the need to re-compute the simulation for small changes in the system parameters.
Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo

NASA Astrophysics Data System (ADS)

Bui-Thanh, T.; Girolami, M.

2014-11-01

We consider the Riemann manifold Hamiltonian Monte Carlo (RMHMC) method for solving statistical inverse problems governed by partial differential equations (PDEs). The Bayesian framework is employed to cast the inverse problem into the task of statistical inference whose solution is the posterior distribution in infinite dimensional parameter space conditional upon observation data and Gaussian prior measure. We discretize both the likelihood and the prior using the H1-conforming finite element method together with a matrix transfer technique. The power of the RMHMC method is that it exploits the geometric structure induced by the PDE constraints of the underlying inverse problem. Consequently, each RMHMC posterior sample is almost uncorrelated/independent from the others providing statistically efficient Markov chain simulation. However this statistical efficiency comes at a computational cost. This motivates us to consider computationally more efficient strategies for RMHMC. At the heart of our construction is the fact that for Gaussian error structures the Fisher information matrix coincides with the Gauss-Newton Hessian. We exploit this fact in considering a computationally simplified RMHMC method combining state-of-the-art adjoint techniques and the superiority of the RMHMC method. Specifically, we first form the Gauss-Newton Hessian at the maximum a posteriori point and then use it as a fixed constant metric tensor throughout RMHMC simulation. This eliminates the need for the computationally costly differential geometric Christoffel symbols, which in turn greatly reduces computational effort at a corresponding loss of sampling efficiency. We further reduce the cost of forming the Fisher information matrix by using a low rank approximation via a randomized singular value decomposition technique. This is efficient since a small number of Hessian-vector products are required. The Hessian-vector product in turn requires only two extra PDE solves using the adjoint technique. Various numerical results up to 1025 parameters are presented to demonstrate the ability of the RMHMC method in exploring the geometric structure of the problem to propose (almost) uncorrelated/independent samples that are far away from each other, and yet the acceptance rate is almost unity. The results also suggest that for the PDE models considered the proposed fixed metric RMHMC can attain almost as high a quality performance as the original RMHMC, i.e. generating (almost) uncorrelated/independent samples, while being two orders of magnitude less computationally expensive.

A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

NASA Astrophysics Data System (ADS)

Lin, Y.; O'Malley, D.; Vesselinov, V. V.

2015-12-01

Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
Efficient Computation Of Behavior Of Aircraft Tires

NASA Technical Reports Server (NTRS)

Tanner, John A.; Noor, Ahmed K.; Andersen, Carl M.

1989-01-01

NASA technical paper discusses challenging application of computational structural mechanics to numerical simulation of responses of aircraft tires during taxing, takeoff, and landing. Presents details of three main elements of computational strategy: use of special three-field, mixed-finite-element models; use of operator splitting; and application of technique reducing substantially number of degrees of freedom. Proposed computational strategy applied to two quasi-symmetric problems: linear analysis of anisotropic tires through use of two-dimensional-shell finite elements and nonlinear analysis of orthotropic tires subjected to unsymmetric loading. Three basic types of symmetry and combinations exhibited by response of tire identified.
Methods for operating parallel computing systems employing sequenced communications

DOEpatents

Benner, R.E.; Gustafson, J.L.; Montry, G.R.

1999-08-10

A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.
Methods for operating parallel computing systems employing sequenced communications

DOEpatents

Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

1999-01-01

A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
An algorithm for automatic reduction of complex signal flow graphs

NASA Technical Reports Server (NTRS)

Young, K. R.; Hoberock, L. L.; Thompson, J. G.

1976-01-01

A computer algorithm is developed that provides efficient means to compute transmittances directly from a signal flow graph or a block diagram. Signal flow graphs are cast as directed graphs described by adjacency matrices. Nonsearch computation, designed for compilers without symbolic capability, is used to identify all arcs that are members of simple cycles for use with Mason's gain formula. The routine does not require the visual acumen of an interpreter to reduce the topology of the graph, and it is particularly useful for analyzing control systems described for computer analyses by means of interactive graphics.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
On some methods for improving time of reachability sets computation for the dynamic system control problem

NASA Astrophysics Data System (ADS)

Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir

2016-12-01

The paper presents two different approaches to reduce the time of computer calculation of reachability sets. First of these two approaches use different data structures for storing the reachability sets in the computer memory for calculation in single-threaded mode. Second approach is based on using parallel algorithms with reference to the data structures from the first approach. Within the framework of this paper parallel algorithm of approximate reachability set calculation on computer with SMP-architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate high efficiency of parallel computing technology and also show how computing time depends on the used data structure.
Exploiting the chaotic behaviour of atmospheric models with reconfigurable architectures

NASA Astrophysics Data System (ADS)

Russell, Francis P.; Düben, Peter D.; Niu, Xinyu; Luk, Wayne; Palmer, T. N.

2017-12-01

Reconfigurable architectures are becoming mainstream: Amazon, Microsoft and IBM are supporting such architectures in their data centres. The computationally intensive nature of atmospheric modelling is an attractive target for hardware acceleration using reconfigurable computing. Performance of hardware designs can be improved through the use of reduced-precision arithmetic, but maintaining appropriate accuracy is essential. We explore reduced-precision optimisation for simulating chaotic systems, targeting atmospheric modelling, in which even minor changes in arithmetic behaviour will cause simulations to diverge quickly. The possibility of equally valid simulations having differing outcomes means that standard techniques for comparing numerical accuracy are inappropriate. We use the Hellinger distance to compare statistical behaviour between reduced-precision CPU implementations to guide reconfigurable designs of a chaotic system, then analyse accuracy, performance and power efficiency of the resulting implementations. Our results show that with only a limited loss in accuracy corresponding to less than 10% uncertainty in input parameters, the throughput and energy efficiency of a single-precision chaotic system implemented on a Xilinx Virtex-6 SX475T Field Programmable Gate Array (FPGA) can be more than doubled.
The use of methods of structural optimization at the stage of designing high-rise buildings with steel construction

NASA Astrophysics Data System (ADS)

Vasilkin, Andrey

2018-03-01

The more designing solutions at the search stage for design for high-rise buildings can be synthesized by the engineer, the more likely that the final adopted version will be the most efficient and economical. However, in modern market conditions, taking into account the complexity and responsibility of high-rise buildings the designer does not have the necessary time to develop, analyze and compare any significant number of options. To solve this problem, it is expedient to use the high potential of computer-aided designing. To implement automated search for design solutions, it is proposed to develop the computing facilities, the application of which will significantly increase the productivity of the designer and reduce the complexity of designing. Methods of structural and parametric optimization have been adopted as the basis of the computing facilities. Their efficiency in the synthesis of design solutions is shown, also the schemes, that illustrate and explain the introduction of structural optimization in the traditional design of steel frames, are constructed. To solve the problem of synthesis and comparison of design solutions for steel frames, it is proposed to develop the computing facilities that significantly reduces the complexity of search designing and based on the use of methods of structural and parametric optimization.
A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling

NASA Astrophysics Data System (ADS)

Mo, Shaoxing; Lu, Dan; Shi, Xiaoqing; Zhang, Guannan; Ye, Ming; Wu, Jianfeng; Wu, Jichun

2017-12-01

Global sensitivity analysis (GSA) and uncertainty quantification (UQ) for groundwater modeling are challenging because of the model complexity and significant computational requirements. To reduce the massive computational cost, a cheap-to-evaluate surrogate model is usually constructed to approximate and replace the expensive groundwater models in the GSA and UQ. Constructing an accurate surrogate requires actual model simulations on a number of parameter samples. Thus, a robust experimental design strategy is desired to locate informative samples so as to reduce the computational cost in surrogate construction and consequently to improve the efficiency in the GSA and UQ. In this study, we develop a Taylor expansion-based adaptive design (TEAD) that aims to build an accurate global surrogate model with a small training sample size. TEAD defines a novel hybrid score function to search informative samples, and a robust stopping criterion to terminate the sample search that guarantees the resulted approximation errors satisfy the desired accuracy. The good performance of TEAD in building global surrogate models is demonstrated in seven analytical functions with different dimensionality and complexity in comparison to two widely used experimental design methods. The application of the TEAD-based surrogate method in two groundwater models shows that the TEAD design can effectively improve the computational efficiency of GSA and UQ for groundwater modeling.
Selection of bi-level image compression method for reduction of communication energy in wireless visual sensor networks

NASA Astrophysics Data System (ADS)

Khursheed, Khursheed; Imran, Muhammad; Ahmad, Naeem; O'Nils, Mattias

2012-06-01

Wireless Visual Sensor Network (WVSN) is an emerging field which combines image sensor, on board computation unit, communication component and energy source. Compared to the traditional wireless sensor network, which operates on one dimensional data, such as temperature, pressure values etc., WVSN operates on two dimensional data (images) which requires higher processing power and communication bandwidth. Normally, WVSNs are deployed in areas where installation of wired solutions is not feasible. The energy budget in these networks is limited to the batteries, because of the wireless nature of the application. Due to the limited availability of energy, the processing at Visual Sensor Nodes (VSN) and communication from VSN to server should consume as low energy as possible. Transmission of raw images wirelessly consumes a lot of energy and requires higher communication bandwidth. Data compression methods reduce data efficiently and hence will be effective in reducing communication cost in WVSN. In this paper, we have compared the compression efficiency and complexity of six well known bi-level image compression methods. The focus is to determine the compression algorithms which can efficiently compress bi-level images and their computational complexity is suitable for computational platform used in WVSNs. These results can be used as a road map for selection of compression methods for different sets of constraints in WVSN.
Analytic energy gradients for orbital-optimized MP3 and MP2.5 with the density-fitting approximation: An efficient implementation.

PubMed

Bozkaya, Uğur

2018-03-15

Efficient implementations of analytic gradients for the orbital-optimized MP3 and MP2.5 and their standard versions with the density-fitting approximation, which are denoted as DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5, are presented. The DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5 methods are applied to a set of alkanes and noncovalent interaction complexes to compare the computational cost with the conventional MP3, MP2.5, OMP3, and OMP2.5. Our results demonstrate that density-fitted perturbation theory (DF-MP) methods considered substantially reduce the computational cost compared to conventional MP methods. The efficiency of our DF-MP methods arise from the reduced input/output (I/O) time and the acceleration of gradient related terms, such as computations of particle density and generalized Fock matrices (PDMs and GFM), solution of the Z-vector equation, back-transformations of PDMs and GFM, and evaluation of analytic gradients in the atomic orbital basis. Further, application results show that errors introduced by the DF approach are negligible. Mean absolute errors for bond lengths of a molecular set, with the cc-pCVQZ basis set, is 0.0001-0.0002 Å. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Seismic imaging using finite-differences and parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ober, C.C.

1997-12-31

A key to reducing the risks and costs of associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computersmore » can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets, and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve the image quality and reduce computational costs.« less
Advanced On-Board Processor (AOP). [for future spacecraft applications

NASA Technical Reports Server (NTRS)

1973-01-01

Advanced On-board Processor the (AOP) uses large scale integration throughout and is the most advanced space qualified computer of its class in existence today. It was designed to satisfy most spacecraft requirements which are anticipated over the next several years. The AOP design utilizes custom metallized multigate arrays (CMMA) which have been designed specifically for this computer. This approach provides the most efficient use of circuits, reduces volume, weight, assembly costs and provides for a significant increase in reliability by the significant reduction in conventional circuit interconnections. The required 69 CMMA packages are assembled on a single multilayer printed circuit board which together with associated connectors constitutes the complete AOP. This approach also reduces conventional interconnections thus further reducing weight, volume and assembly costs.
Fault Injection Campaign for a Fault Tolerant Duplex Framework

NASA Technical Reports Server (NTRS)

Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.

2007-01-01

Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is a software developed by the UCLA group [1, 2] that uses a fault tolerant approach and allows to run two replicas of the same process on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process running on a different node, constantly monitors the results computed by the two replicas, and eventually restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecrafts where the fault rate produced by cosmic rays is not very high.
On Improving Efficiency of Differential Evolution for Aerodynamic Shape Optimization Applications

NASA Technical Reports Server (NTRS)

Madavan, Nateri K.

2004-01-01

Differential Evolution (DE) is a simple and robust evolutionary strategy that has been proven effective in determining the global optimum for several difficult optimization problems. Although DE offers several advantages over traditional optimization approaches, its use in applications such as aerodynamic shape optimization where the objective function evaluations are computationally expensive is limited by the large number of function evaluations often required. In this paper various approaches for improving the efficiency of DE are reviewed and discussed. These approaches are implemented in a DE-based aerodynamic shape optimization method that uses a Navier-Stokes solver for the objective function evaluations. Parallelization techniques on distributed computers are used to reduce turnaround times. Results are presented for the inverse design of a turbine airfoil. The efficiency improvements achieved by the different approaches are evaluated and compared.
A Secure and Efficient Audit Mechanism for Dynamic Shared Data in Cloud Storage

PubMed Central

2014-01-01

With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data. PMID:24959630
Efficient Ab initio Modeling of Random Multicomponent Alloys

DOE PAGES

Jiang, Chao; Uberuaga, Blas P.

2016-03-08

Here, we present in this Letter a novel small set of ordered structures (SSOS) method that allows extremely efficient ab initio modeling of random multi-component alloys. Using inverse II-III spinel oxides and equiatomic quinary bcc (so-called high entropy) alloys as examples, we also demonstrate that a SSOS can achieve the same accuracy as a large supercell or a well-converged cluster expansion, but with significantly reduced computational cost. In particular, because of this efficiency, a large number of quinary alloy compositions can be quickly screened, leading to the identification of several new possible high entropy alloy chemistries. Furthermore, the SSOS methodmore » developed here can be broadly useful for the rapid computational design of multi-component materials, especially those with a large number of alloying elements, a challenging problem for other approaches.« less
A secure and efficient audit mechanism for dynamic shared data in cloud storage.

PubMed

Kwon, Ohmin; Koo, Dongyoung; Shin, Yongjoo; Yoon, Hyunsoo

2014-01-01

With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data.

Fast approximation for joint optimization of segmentation, shape, and location priors, and its application in gallbladder segmentation.

PubMed

Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu

2017-05-01

This paper addresses joint optimization for segmentation and shape priors, including translation, to overcome inter-subject variability in the location of an organ. Because a simple extension of the previous exact optimization method is too computationally complex, we propose a fast approximation for optimization. The effectiveness of the proposed approximation is validated in the context of gallbladder segmentation from a non-contrast computed tomography (CT) volume. After spatial standardization and estimation of the posterior probability of the target organ, simultaneous optimization of the segmentation, shape, and location priors is performed using a branch-and-bound method. Fast approximation is achieved by combining sampling in the eigenshape space to reduce the number of shape priors and an efficient computational technique for evaluating the lower bound. Performance was evaluated using threefold cross-validation of 27 CT volumes. Optimization in terms of translation of the shape prior significantly improved segmentation performance. The proposed method achieved a result of 0.623 on the Jaccard index in gallbladder segmentation, which is comparable to that of state-of-the-art methods. The computational efficiency of the algorithm is confirmed to be good enough to allow execution on a personal computer. Joint optimization of the segmentation, shape, and location priors was proposed, and it proved to be effective in gallbladder segmentation with high computational efficiency.
Optical testing of aspheres based on photochromic computer-generated holograms

NASA Astrophysics Data System (ADS)

Pariani, Giorgio; Bianco, Andrea; Bertarelli, Chiara; Spanó, Paolo; Molinari, Emilio

2010-07-01

Aspherical optics are widely used in modern optical telescopes and instrumentation because of their ability to reduce aberrations with a simple optical system. Testing their optical quality through null interferometry is not trivial as reference optics are not available. Computer-Generated Holograms (CGHs) are efficient devices that allow to generate a well-defined optical wavefront. We developed rewritable Computer Generated Holograms for the interferometric test of aspheres based on photochromic layers. These photochromic holograms are cost-effective and the method of production does not need any post exposure process.
A systematic and efficient method to compute multi-loop master integrals

NASA Astrophysics Data System (ADS)

Liu, Xiao; Ma, Yan-Qing; Wang, Chen-Yu

2018-04-01

We propose a novel method to compute multi-loop master integrals by constructing and numerically solving a system of ordinary differential equations, with almost trivial boundary conditions. Thus it can be systematically applied to problems with arbitrary kinematic configurations. Numerical tests show that our method can not only achieve results with high precision, but also be much faster than the only existing systematic method sector decomposition. As a by product, we find a new strategy to compute scalar one-loop integrals without reducing them to master integrals.
A synchronized computational architecture for generalized bilateral control of robot arms

NASA Technical Reports Server (NTRS)

Bejczy, Antal K.; Szakaly, Zoltan

1987-01-01

This paper describes a computational architecture for an interconnected high speed distributed computing system for generalized bilateral control of robot arms. The key method of the architecture is the use of fully synchronized, interrupt driven software. Since an objective of the development is to utilize the processing resources efficiently, the synchronization is done in the hardware level to reduce system software overhead. The architecture also achieves a balaced load on the communication channel. The paper also describes some architectural relations to trading or sharing manual and automatic control.
Using 3D infrared imaging to calibrate and refine computational fluid dynamic modeling for large computer and data centers

NASA Astrophysics Data System (ADS)

Stockton, Gregory R.

2011-05-01

Over the last 10 years, very large government, military, and commercial computer and data center operators have spent millions of dollars trying to optimally cool data centers as each rack has begun to consume as much as 10 times more power than just a few years ago. In fact, the maximum amount of data computation in a computer center is becoming limited by the amount of available power, space and cooling capacity at some data centers. Tens of millions of dollars and megawatts of power are being annually spent to keep data centers cool. The cooling and air flows dynamically change away from any predicted 3-D computational fluid dynamic modeling during construction and as time goes by, and the efficiency and effectiveness of the actual cooling rapidly departs even farther from predicted models. By using 3-D infrared (IR) thermal mapping and other techniques to calibrate and refine the computational fluid dynamic modeling and make appropriate corrections and repairs, the required power for data centers can be dramatically reduced which reduces costs and also improves reliability.
Computationally efficient algorithm for high sampling-frequency operation of active noise control

NASA Astrophysics Data System (ADS)

Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati

2015-05-01

In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
Scalable parallel distance field construction for large-scale applications

DOE PAGES

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan -Liu; ...

2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. Anew distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking overtime, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate itsmore » efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.« less
Scalable Parallel Distance Field Construction for Large-Scale Applications.

PubMed

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H

2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
SmartVeh: Secure and Efficient Message Access Control and Authentication for Vehicular Cloud Computing.

PubMed

Huang, Qinlong; Yang, Yixian; Shi, Yuxiang

2018-02-24

With the growing number of vehicles and popularity of various services in vehicular cloud computing (VCC), message exchanging among vehicles under traffic conditions and in emergency situations is one of the most pressing demands, and has attracted significant attention. However, it is an important challenge to authenticate the legitimate sources of broadcast messages and achieve fine-grained message access control. In this work, we propose SmartVeh, a secure and efficient message access control and authentication scheme in VCC. A hierarchical, attribute-based encryption technique is utilized to achieve fine-grained and flexible message sharing, which ensures that vehicles whose persistent or dynamic attributes satisfy the access policies can access the broadcast message with equipped on-board units (OBUs). Message authentication is enforced by integrating an attribute-based signature, which achieves message authentication and maintains the anonymity of the vehicles. In order to reduce the computations of the OBUs in the vehicles, we outsource the heavy computations of encryption, decryption and signing to a cloud server and road-side units. The theoretical analysis and simulation results reveal that our secure and efficient scheme is suitable for VCC.
SmartVeh: Secure and Efficient Message Access Control and Authentication for Vehicular Cloud Computing

PubMed Central

Yang, Yixian; Shi, Yuxiang

2018-01-01

With the growing number of vehicles and popularity of various services in vehicular cloud computing (VCC), message exchanging among vehicles under traffic conditions and in emergency situations is one of the most pressing demands, and has attracted significant attention. However, it is an important challenge to authenticate the legitimate sources of broadcast messages and achieve fine-grained message access control. In this work, we propose SmartVeh, a secure and efficient message access control and authentication scheme in VCC. A hierarchical, attribute-based encryption technique is utilized to achieve fine-grained and flexible message sharing, which ensures that vehicles whose persistent or dynamic attributes satisfy the access policies can access the broadcast message with equipped on-board units (OBUs). Message authentication is enforced by integrating an attribute-based signature, which achieves message authentication and maintains the anonymity of the vehicles. In order to reduce the computations of the OBUs in the vehicles, we outsource the heavy computations of encryption, decryption and signing to a cloud server and road-side units. The theoretical analysis and simulation results reveal that our secure and efficient scheme is suitable for VCC. PMID:29495269
An efficient quantum circuit analyser on qubits and qudits

NASA Astrophysics Data System (ADS)

Loke, T.; Wang, J. B.

2011-10-01

This paper presents a highly efficient decomposition scheme and its associated Mathematica notebook for the analysis of complicated quantum circuits comprised of single/multiple qubit and qudit quantum gates. In particular, this scheme reduces the evaluation of multiple unitary gate operations with many conditionals to just two matrix additions, regardless of the number of conditionals or gate dimensions. This improves significantly the capability of a quantum circuit analyser implemented in a classical computer. This is also the first efficient quantum circuit analyser to include qudit quantum logic gates.
The multifacet graphically contracted function method. I. Formulation and implementation

NASA Astrophysics Data System (ADS)

Shepard, Ron; Gidofalvi, Gergely; Brozell, Scott R.

2014-08-01

The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that both the energy and the gradient computation scale as O(N2n4) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N2 dissociation, cubic H8 dissociation, the symmetric dissociation of H2O, and the insertion of Be into H2. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.
The multifacet graphically contracted function method. I. Formulation and implementation.

PubMed

Shepard, Ron; Gidofalvi, Gergely; Brozell, Scott R

2014-08-14

The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that both the energy and the gradient computation scale as O(N(2)n(4)) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N2 dissociation, cubic H8 dissociation, the symmetric dissociation of H2O, and the insertion of Be into H2. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.
Computationally Efficient Adaptive Beamformer for Ultrasound Imaging Based on QR Decomposition.

PubMed

Park, Jongin; Wi, Seok-Min; Lee, Jin S

2016-02-01

Adaptive beamforming methods for ultrasound imaging have been studied to improve image resolution and contrast. The most common approach is the minimum variance (MV) beamformer which minimizes the power of the beamformed output while maintaining the response from the direction of interest constant. The method achieves higher resolution and better contrast than the delay-and-sum (DAS) beamformer, but it suffers from high computational cost. This cost is mainly due to the computation of the spatial covariance matrix and its inverse, which requires O(L(3)) computations, where L denotes the subarray size. In this study, we propose a computationally efficient MV beamformer based on QR decomposition. The idea behind our approach is to transform the spatial covariance matrix to be a scalar matrix σI and we subsequently obtain the apodization weights and the beamformed output without computing the matrix inverse. To do that, QR decomposition algorithm is used and also can be executed at low cost, and therefore, the computational complexity is reduced to O(L(2)). In addition, our approach is mathematically equivalent to the conventional MV beamformer, thereby showing the equivalent performances. The simulation and experimental results support the validity of our approach.
Static Memory Deduplication for Performance Optimization in Cloud Computing.

PubMed

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-04-27

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing

PubMed Central

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-01-01

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Low-power hardware implementation of movement decoding for brain computer interface with reduced-resolution discrete cosine transform.

PubMed

Minho Won; Albalawi, Hassan; Xin Li; Thomas, Donald E

2014-01-01

This paper describes a low-power hardware implementation for movement decoding of brain computer interface. Our proposed hardware design is facilitated by two novel ideas: (i) an efficient feature extraction method based on reduced-resolution discrete cosine transform (DCT), and (ii) a new hardware architecture of dual look-up table to perform discrete cosine transform without explicit multiplication. The proposed hardware implementation has been validated for movement decoding of electrocorticography (ECoG) signal by using a Xilinx FPGA Zynq-7000 board. It achieves more than 56× energy reduction over a reference design using band-pass filters for feature extraction.
Improving Computational Efficiency of Prediction in Model-based Prognostics Using the Unscented Transform

DTIC Science & Technology

2010-10-01

bodies becomes greater as surface as- perities wear down (Hutchings, 1992). We characterize friction damage by a change in the friction coefficient...points are such a set, and satisfy an additional constraint in which the skew ( third moment) is minimized, which reduces the average error for a...On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10, 197–208. Hutchings, I. M. (1992). Tribology : friction
Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

DTIC Science & Technology

1988-01-21

p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [51 improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [51, except that Gajski presents the coefficient computation and partial solution phases as a single...8217>. the SIMD algo- rithm presented by Gajski [5] can be most efficiently mapped to a unidirec- tional ring network with broadcasting capability. Based
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.

PubMed

Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui

2017-01-08

Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4_ speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.

Efficient multiscale magnetic-domain analysis of iron-core material under mechanical stress

NASA Astrophysics Data System (ADS)

Nishikubo, Atsushi; Ito, Shumpei; Mifune, Takeshi; Matsuo, Tetsuji; Kaido, Chikara; Takahashi, Yasuhito; Fujiwara, Koji

2018-05-01

For an efficient analysis of magnetization, a partial-implicit solution method is improved using an assembled domain structure model with six-domain mesoscopic particles exhibiting pinning-type hysteresis. The quantitative analysis of non-oriented silicon steel succeeds in predicting the stress dependence of hysteresis loss with computation times greatly reduced by using the improved partial-implicit method. The effect of cell division along the thickness direction is also evaluated.
Low rank approach to computing first and higher order derivatives using automatic differentiation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reed, J. A.; Abdel-Khalik, H. S.; Utke, J.

2012-07-01

This manuscript outlines a new approach for increasing the efficiency of applying automatic differentiation (AD) to large scale computational models. By using the principles of the Efficient Subspace Method (ESM), low rank approximations of the derivatives for first and higher orders can be calculated using minimized computational resources. The output obtained from nuclear reactor calculations typically has a much smaller numerical rank compared to the number of inputs and outputs. This rank deficiency can be exploited to reduce the number of derivatives that need to be calculated using AD. The effective rank can be determined according to ESM by computingmore » derivatives with AD at random inputs. Reduced or pseudo variables are then defined and new derivatives are calculated with respect to the pseudo variables. Two different AD packages are used: OpenAD and Rapsodia. OpenAD is used to determine the effective rank and the subspace that contains the derivatives. Rapsodia is then used to calculate derivatives with respect to the pseudo variables for the desired order. The overall approach is applied to two simple problems and to MATWS, a safety code for sodium cooled reactors. (authors)« less
Addressing Curse of Dimensionality in Sensitivity Analysis: How Can We Handle High-Dimensional Problems?

NASA Astrophysics Data System (ADS)

Safaei, S.; Haghnegahdar, A.; Razavi, S.

2016-12-01

Complex environmental models are now the primary tool to inform decision makers for the current or future management of environmental resources under the climate and environmental changes. These complex models often contain a large number of parameters that need to be determined by a computationally intensive calibration procedure. Sensitivity analysis (SA) is a very useful tool that not only allows for understanding the model behavior, but also helps in reducing the number of calibration parameters by identifying unimportant ones. The issue is that most global sensitivity techniques are highly computationally demanding themselves for generating robust and stable sensitivity metrics over the entire model response surface. Recently, a novel global sensitivity analysis method, Variogram Analysis of Response Surfaces (VARS), is introduced that can efficiently provide a comprehensive assessment of global sensitivity using the Variogram concept. In this work, we aim to evaluate the effectiveness of this highly efficient GSA method in saving computational burden, when applied to systems with extra-large number of input factors ( 100). We use a test function and a hydrological modelling case study to demonstrate the capability of VARS method in reducing problem dimensionality by identifying important vs unimportant input factors.
Development of an Efficient Binaural Simulation for the Analysis of Structural Acoustic Data

NASA Technical Reports Server (NTRS)

Lalime, Aimee L.; Johnson, Marty E.; Rizzi, Stephen A. (Technical Monitor)

2002-01-01

Binaural or "virtual acoustic" representation has been proposed as a method of analyzing acoustic and vibroacoustic data. Unfortunately, this binaural representation can require extensive computer power to apply the Head Related Transfer Functions (HRTFs) to a large number of sources, as with a vibrating structure. This work focuses on reducing the number of real-time computations required in this binaural analysis through the use of Singular Value Decomposition (SVD) and Equivalent Source Reduction (ESR). The SVD method reduces the complexity of the HRTF computations by breaking the HRTFs into dominant singular values (and vectors). The ESR method reduces the number of sources to be analyzed in real-time computation by replacing sources on the scale of a structural wavelength with sources on the scale of an acoustic wavelength. It is shown that the effectiveness of the SVD and ESR methods improves as the complexity of the source increases. In addition, preliminary auralization tests have shown that the results from both the SVD and ESR methods are indistinguishable from the results found with the exhaustive method.
Energy Use and Power Levels in New Monitors and Personal Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roberson, Judy A.; Homan, Gregory K.; Mahajan, Akshay

2002-07-23

Our research was conducted in support of the EPA ENERGY STAR Office Equipment program, whose goal is to reduce the amount of electricity consumed by office equipment in the U.S. The most energy-efficient models in each office equipment category are eligible for the ENERGY STAR label, which consumers can use to identify and select efficient products. As the efficiency of each category improves over time, the ENERGY STAR criteria need to be revised accordingly. The purpose of this study was to provide reliable data on the energy consumption of the newest personal computers and monitors that the EPA can usemore » to evaluate revisions to current ENERGY STAR criteria as well as to improve the accuracy of ENERGY STAR program savings estimates. We report the results of measuring the power consumption and power management capabilities of a sample of new monitors and computers. These results will be used to improve estimates of program energy savings and carbon emission reductions, and to inform rev isions of the ENERGY STAR criteria for these products. Our sample consists of 35 monitors and 26 computers manufactured between July 2000 and October 2001; it includes cathode ray tube (CRT) and liquid crystal display (LCD) monitors, Macintosh and Intel-architecture computers, desktop and laptop computers, and integrated computer systems, in which power consumption of the computer and monitor cannot be measured separately. For each machine we measured power consumption when off, on, and in each low-power level. We identify trends in and opportunities to reduce power consumption in new personal computers and monitors. Our results include a trend among monitor manufacturers to provide a single very low low-power level, well below the current ENERGY STAR criteria for sleep power consumption. These very low sleep power results mean that energy consumed when monitors are off or in active use has become more important in terms of contribution to the overall unit energy consumption (UEC). Cur rent ENERGY STAR monitor and computer criteria do not specify off or on power, but our results suggest opportunities for saving energy in these modes. Also, significant differences between CRT and LCD technology, and between field-measured and manufacturer-reported power levels reveal the need for standard methods and metrics for measuring and comparing monitor power consumption.« less
Efficient Multiple Kernel Learning Algorithms Using Low-Rank Representation.

PubMed

Niu, Wenjia; Xia, Kewen; Zu, Baokai; Bai, Jianchuan

2017-01-01

Unlike Support Vector Machine (SVM), Multiple Kernel Learning (MKL) allows datasets to be free to choose the useful kernels based on their distribution characteristics rather than a precise one. It has been shown in the literature that MKL holds superior recognition accuracy compared with SVM, however, at the expense of time consuming computations. This creates analytical and computational difficulties in solving MKL algorithms. To overcome this issue, we first develop a novel kernel approximation approach for MKL and then propose an efficient Low-Rank MKL (LR-MKL) algorithm by using the Low-Rank Representation (LRR). It is well-acknowledged that LRR can reduce dimension while retaining the data features under a global low-rank constraint. Furthermore, we redesign the binary-class MKL as the multiclass MKL based on pairwise strategy. Finally, the recognition effect and efficiency of LR-MKL are verified on the datasets Yale, ORL, LSVT, and Digit. Experimental results show that the proposed LR-MKL algorithm is an efficient kernel weights allocation method in MKL and boosts the performance of MKL largely.
Reduced-rank approximations to the far-field transform in the gridded fast multipole method

NASA Astrophysics Data System (ADS)

Hesford, Andrew J.; Waag, Robert C.

2011-05-01

The fast multipole method (FMM) has been shown to have a reduced computational dependence on the size of finest-level groups of elements when the elements are positioned on a regular grid and FFT convolution is used to represent neighboring interactions. However, transformations between plane-wave expansions used for FMM interactions and pressure distributions used for neighboring interactions remain significant contributors to the cost of FMM computations when finest-level groups are large. The transformation operators, which are forward and inverse Fourier transforms with the wave space confined to the unit sphere, are smooth and well approximated using reduced-rank decompositions that further reduce the computational dependence of the FMM on finest-level group size. The adaptive cross approximation (ACA) is selected to represent the forward and adjoint far-field transformation operators required by the FMM. However, the actual error of the ACA is found to be greater than that predicted using traditional estimates, and the ACA generally performs worse than the approximation resulting from a truncated singular-value decomposition (SVD). To overcome these issues while avoiding the cost of a full-scale SVD, the ACA is employed with more stringent accuracy demands and recompressed using a reduced, truncated SVD. The results show a greatly reduced approximation error that performs comparably to the full-scale truncated SVD without degrading the asymptotic computational efficiency associated with ACA matrix assembly.
Reduced-Rank Approximations to the Far-Field Transform in the Gridded Fast Multipole Method.

PubMed

Hesford, Andrew J; Waag, Robert C

2011-05-10

The fast multipole method (FMM) has been shown to have a reduced computational dependence on the size of finest-level groups of elements when the elements are positioned on a regular grid and FFT convolution is used to represent neighboring interactions. However, transformations between plane-wave expansions used for FMM interactions and pressure distributions used for neighboring interactions remain significant contributors to the cost of FMM computations when finest-level groups are large. The transformation operators, which are forward and inverse Fourier transforms with the wave space confined to the unit sphere, are smooth and well approximated using reduced-rank decompositions that further reduce the computational dependence of the FMM on finest-level group size. The adaptive cross approximation (ACA) is selected to represent the forward and adjoint far-field transformation operators required by the FMM. However, the actual error of the ACA is found to be greater than that predicted using traditional estimates, and the ACA generally performs worse than the approximation resulting from a truncated singular-value decomposition (SVD). To overcome these issues while avoiding the cost of a full-scale SVD, the ACA is employed with more stringent accuracy demands and recompressed using a reduced, truncated SVD. The results show a greatly reduced approximation error that performs comparably to the full-scale truncated SVD without degrading the asymptotic computational efficiency associated with ACA matrix assembly.
Reduced-Rank Approximations to the Far-Field Transform in the Gridded Fast Multipole Method

PubMed Central

Hesford, Andrew J.; Waag, Robert C.

2011-01-01

The fast multipole method (FMM) has been shown to have a reduced computational dependence on the size of finest-level groups of elements when the elements are positioned on a regular grid and FFT convolution is used to represent neighboring interactions. However, transformations between plane-wave expansions used for FMM interactions and pressure distributions used for neighboring interactions remain significant contributors to the cost of FMM computations when finest-level groups are large. The transformation operators, which are forward and inverse Fourier transforms with the wave space confined to the unit sphere, are smooth and well approximated using reduced-rank decompositions that further reduce the computational dependence of the FMM on finest-level group size. The adaptive cross approximation (ACA) is selected to represent the forward and adjoint far-field transformation operators required by the FMM. However, the actual error of the ACA is found to be greater than that predicted using traditional estimates, and the ACA generally performs worse than the approximation resulting from a truncated singular-value decomposition (SVD). To overcome these issues while avoiding the cost of a full-scale SVD, the ACA is employed with more stringent accuracy demands and recompressed using a reduced, truncated SVD. The results show a greatly reduced approximation error that performs comparably to the full-scale truncated SVD without degrading the asymptotic computational efficiency associated with ACA matrix assembly. PMID:21552350
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

NASA Astrophysics Data System (ADS)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
User participation in the development of the human/computer interface for control centers

NASA Technical Reports Server (NTRS)

Broome, Richard; Quick-Campbell, Marlene; Creegan, James; Dutilly, Robert

1996-01-01

Technological advances coupled with the requirements to reduce operations staffing costs led to the demand for efficient, technologically-sophisticated mission operations control centers. The control center under development for the earth observing system (EOS) is considered. The users are involved in the development of a control center in order to ensure that it is cost-efficient and flexible. A number of measures were implemented in the EOS program in order to encourage user involvement in the area of human-computer interface development. The following user participation exercises carried out in relation to the system analysis and design are described: the shadow participation of the programmers during a day of operations; the flight operations personnel interviews; and the analysis of the flight operations team tasks. The user participation in the interface prototype development, the prototype evaluation, and the system implementation are reported on. The involvement of the users early in the development process enables the requirements to be better understood and the cost to be reduced.
Attitude Determination Algorithm based on Relative Quaternion Geometry of Velocity Incremental Vectors for Cost Efficient AHRS Design

NASA Astrophysics Data System (ADS)

Lee, Byungjin; Lee, Young Jae; Sung, Sangkyung

2018-05-01

A novel attitude determination method is investigated that is computationally efficient and implementable in low cost sensor and embedded platform. Recent result on attitude reference system design is adapted to further develop a three-dimensional attitude determination algorithm through the relative velocity incremental measurements. For this, velocity incremental vectors, computed respectively from INS and GPS with different update rate, are compared to generate filter measurement for attitude estimation. In the quaternion-based Kalman filter configuration, an Euler-like attitude perturbation angle is uniquely introduced for reducing filter states and simplifying propagation processes. Furthermore, assuming a small angle approximation between attitude update periods, it is shown that the reduced order filter greatly simplifies the propagation processes. For performance verification, both simulation and experimental studies are completed. A low cost MEMS IMU and GPS receiver are employed for system integration, and comparison with the true trajectory or a high-grade navigation system demonstrates the performance of the proposed algorithm.
Dynamic VMs placement for energy efficiency by PSO in cloud computing

NASA Astrophysics Data System (ADS)

Dashti, Seyed Ebrahim; Rahmani, Amir Masoud

2016-03-01

Recently, cloud computing is growing fast and helps to realise other high technologies. In this paper, we propose a hieratical architecture to satisfy both providers' and consumers' requirements in these technologies. We design a new service in the PaaS layer for scheduling consumer tasks. In the providers' perspective, incompatibility between specification of physical machine and user requests in cloud leads to problems such as energy-performance trade-off and large power consumption so that profits are decreased. To guarantee Quality of service of users' tasks, and reduce energy efficiency, we proposed to modify Particle Swarm Optimisation to reallocate migrated virtual machines in the overloaded host. We also dynamically consolidate the under-loaded host which provides power saving. Simulation results in CloudSim demonstrated that whatever simulation condition is near to the real environment, our method is able to save as much as 14% more energy and the number of migrations and simulation time significantly reduces compared with the previous works.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

NASA Technical Reports Server (NTRS)

Povitsky, A.

1998-01-01

In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
Predicting Flows of Rarefied Gases

NASA Technical Reports Server (NTRS)

LeBeau, Gerald J.; Wilmoth, Richard G.

2005-01-01

DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tan, Li; Chen, Zizhong; Song, Shuaiwen

2016-01-18

Energy efficiency and resilience are two crucial challenges for HPC systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between energy efficiency and resilience for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy saving undervolting approach that leverages the mainstream resilience techniques to tolerate the increased failures caused by undervolting.
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tan, Li; Song, Shuaiwen; Wu, Panruo

2015-05-29

Energy efficiency and resilience are two crucial challenges for HPC systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between energy efficiency and resilience for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy saving undervolting approach that leverages the mainstream resilience techniques to tolerate the increased failures caused by undervolting.
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tan, Li; Chen, Zizhong; Song, Shuaiwen Leon

2015-11-16

Energy efficiency and resilience are two crucial challenges for HPC systems to reach exascale. While energy efficiency and resilience issues have been extensively studied individually, little has been done to understand the interplay between energy efficiency and resilience for HPC systems. Decreasing the supply voltage associated with a given operating frequency for processors and other CMOS-based components can significantly reduce power consumption. However, this often raises system failure rates and consequently increases application execution time. In this work, we present an energy saving undervolting approach that leverages the mainstream resilience techniques to tolerate the increased failures caused by undervolting.
Three-dimensional inverse modelling of damped elastic wave propagation in the Fourier domain

NASA Astrophysics Data System (ADS)

Petrov, Petr V.; Newman, Gregory A.

2014-09-01

3-D full waveform inversion (FWI) of seismic wavefields is routinely implemented with explicit time-stepping simulators. A clear advantage of explicit time stepping is the avoidance of solving large-scale implicit linear systems that arise with frequency domain formulations. However, FWI using explicit time stepping may require a very fine time step and (as a consequence) significant computational resources and run times. If the computational challenges of wavefield simulation can be effectively handled, an FWI scheme implemented within the frequency domain utilizing only a few frequencies, offers a cost effective alternative to FWI in the time domain. We have therefore implemented a 3-D FWI scheme for elastic wave propagation in the Fourier domain. To overcome the computational bottleneck in wavefield simulation, we have exploited an efficient Krylov iterative solver for the elastic wave equations approximated with second and fourth order finite differences. The solver does not exploit multilevel preconditioning for wavefield simulation, but is coupled efficiently to the inversion iteration workflow to reduce computational cost. The workflow is best described as a series of sequential inversion experiments, where in the case of seismic reflection acquisition geometries, the data has been laddered such that we first image highly damped data, followed by data where damping is systemically reduced. The key to our modelling approach is its ability to take advantage of solver efficiency when the elastic wavefields are damped. As the inversion experiment progresses, damping is significantly reduced, effectively simulating non-damped wavefields in the Fourier domain. While the cost of the forward simulation increases as damping is reduced, this is counterbalanced by the cost of the outer inversion iteration, which is reduced because of a better starting model obtained from the larger damped wavefield used in the previous inversion experiment. For cross-well data, it is also possible to launch a successful inversion experiment without laddering the damping constants. With this type of acquisition geometry, the solver is still quite effective using a small fixed damping constant. To avoid cycle skipping, we also employ a multiscale imaging approach, in which frequency content of the data is also laddered (with the data now including both reflection and cross-well data acquisition geometries). Thus the inversion process is launched using low frequency data to first recover the long spatial wavelength of the image. With this image as a new starting model, adding higher frequency data refines and enhances the resolution of the image. FWI using laddered frequencies with an efficient damping schemed enables reconstructing elastic attributes of the subsurface at a resolution that approaches half the smallest wavelength utilized to image the subsurface. We show the possibility of effectively carrying out such reconstructions using two to six frequencies, depending upon the application. Using the proposed FWI scheme, massively parallel computing resources are essential for reasonable execution times.
An index-based algorithm for fast on-line query processing of latent semantic analysis

PubMed Central

Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747

An index-based algorithm for fast on-line query processing of latent semantic analysis.

PubMed

Zhang, Mingxi; Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
Development of Reduced-Order Models for Aeroelastic and Flutter Prediction Using the CFL3Dv6.0 Code

NASA Technical Reports Server (NTRS)

Silva, Walter A.; Bartels, Robert E.

2002-01-01

A reduced-order model (ROM) is developed for aeroelastic analysis using the CFL3D version 6.0 computational fluid dynamics (CFD) code, recently developed at the NASA Langley Research Center. This latest version of the flow solver includes a deforming mesh capability, a modal structural definition for nonlinear aeroelastic analyses, and a parallelization capability that provides a significant increase in computational efficiency. Flutter results for the AGARD 445.6 Wing computed using CFL3D v6.0 are presented, including discussion of associated computational costs. Modal impulse responses of the unsteady aerodynamic system are then computed using the CFL3Dv6 code and transformed into state-space form. Important numerical issues associated with the computation of the impulse responses are presented. The unsteady aerodynamic state-space ROM is then combined with a state-space model of the structure to create an aeroelastic simulation using the MATLAB/SIMULINK environment. The MATLAB/SIMULINK ROM is used to rapidly compute aeroelastic transients including flutter. The ROM shows excellent agreement with the aeroelastic analyses computed using the CFL3Dv6.0 code directly.
The future challenge for aeropropulsion

NASA Technical Reports Server (NTRS)

Rosen, Robert; Bowditch, David N.

1992-01-01

NASA's research in aeropropulsion is focused on improving the efficiency, capability, and environmental compatibility for all classes of future aircraft. The development of innovative concepts, and theoretical, experimental, and computational tools provide the knowledge base for continued propulsion system advances. Key enabling technologies include advances in internal fluid mechanics, structures, light-weight high-strength composite materials, and advanced sensors and controls. Recent emphasis has been on the development of advanced computational tools in internal fluid mechanics, structural mechanics, reacting flows, and computational chemistry. For subsonic transport applications, very high bypass ratio turbofans with increased engine pressure ratio are being investigated to increase fuel efficiency and reduce airport noise levels. In a joint supersonic cruise propulsion program with industry, the critical environmental concerns of emissions and community noise are being addressed. NASA is also providing key technologies for the National Aerospaceplane, and is studying propulsion systems that provide the capability for aircraft to accelerate to and cruise in the Mach 4-6 speed range. The combination of fundamental, component, and focused technology development underway at NASA will make possible dramatic advances in aeropropulsion efficiency and environmental compatibility for future aeronautical vehicles.
Discovering Synergistic Drug Combination from a Computational Perspective.

PubMed

Ding, Pingjian; Luo, Jiawei; Liang, Cheng; Xiao, Qiu; Cao, Buwen; Li, Guanghui

2018-03-30

Synergistic drug combinations play an important role in the treatment of complex diseases. The identification of effective drug combination is vital to further reduce the side effects and improve therapeutic efficiency. In previous years, in vitro method has been the main route to discover synergistic drug combinations. However, many limitations of time and resource consumption lie within the in vitro method. Therefore, with the rapid development of computational models and the explosive growth of large and phenotypic data, computational methods for discovering synergistic drug combinations are an efficient and promising tool and contribute to precision medicine. It is the key of computational methods how to construct the computational model. Different computational strategies generate different performance. In this review, the recent advancements in computational methods for predicting effective drug combination are concluded from multiple aspects. First, various datasets utilized to discover synergistic drug combinations are summarized. Second, we discussed feature-based approaches and partitioned these methods into two classes including feature-based methods in terms of similarity measure, and feature-based methods in terms of machine learning. Third, we discussed network-based approaches for uncovering synergistic drug combinations. Finally, we analyzed and prospected computational methods for predicting effective drug combinations. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Bessel function expansion to reduce the calculation time and memory usage for cylindrical computer-generated holograms.

PubMed

Sando, Yusuke; Barada, Daisuke; Jackin, Boaz Jessie; Yatagai, Toyohiko

2017-07-10

This study proposes a method to reduce the calculation time and memory usage required for calculating cylindrical computer-generated holograms. The wavefront on the cylindrical observation surface is represented as a convolution integral in the 3D Fourier domain. The Fourier transformation of the kernel function involving this convolution integral is analytically performed using a Bessel function expansion. The analytical solution can drastically reduce the calculation time and the memory usage without any cost, compared with the numerical method using fast Fourier transform to Fourier transform the kernel function. In this study, we present the analytical derivation, the efficient calculation of Bessel function series, and a numerical simulation. Furthermore, we demonstrate the effectiveness of the analytical solution through comparisons of calculation time and memory usage.
Efficient Conservative Reformulation Schemes for Lithium Intercalation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Urisanga, PC; Rife, D; De, S

Porous electrode theory coupled with transport and reaction mechanisms is a widely used technique to model Li-ion batteries employing an appropriate discretization or approximation for solid phase diffusion with electrode particles. One of the major difficulties in simulating Li-ion battery models is the need to account for solid phase diffusion in a second radial dimension r, which increases the computation time/cost to a great extent. Various methods that reduce the computational cost have been introduced to treat this phenomenon, but most of them do not guarantee mass conservation. The aim of this paper is to introduce an inherently mass conservingmore » yet computationally efficient method for solid phase diffusion based on Lobatto III A quadrature. This paper also presents coupling of the new solid phase reformulation scheme with a macro-homogeneous porous electrode theory based pseudo 20 model for Li-ion battery. (C) The Author(s) 2015. Published by ECS. All rights reserved.« less
Parallelization of the FLAPW method

NASA Astrophysics Data System (ADS)

Canning, A.; Mannstadt, W.; Freeman, A. J.

2000-08-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Cut set-based risk and reliability analysis for arbitrarily interconnected networks

DOEpatents

Wyss, Gregory D.

2000-01-01

Method for computing all-terminal reliability for arbitrarily interconnected networks such as the United States public switched telephone network. The method includes an efficient search algorithm to generate minimal cut sets for nonhierarchical networks directly from the network connectivity diagram. Efficiency of the search algorithm stems in part from its basis on only link failures. The method also includes a novel quantification scheme that likewise reduces computational effort associated with assessing network reliability based on traditional risk importance measures. Vast reductions in computational effort are realized since combinatorial expansion and subsequent Boolean reduction steps are eliminated through analysis of network segmentations using a technique of assuming node failures to occur on only one side of a break in the network, and repeating the technique for all minimal cut sets generated with the search algorithm. The method functions equally well for planar and non-planar networks.
On Improving Efficiency of Differential Evolution for Aerodynamic Shape Optimization Applications

NASA Technical Reports Server (NTRS)

Madavan, Nateri K.

2004-01-01

Differential Evolution (DE) is a simple and robust evolutionary strategy that has been provEn effective in determining the global optimum for several difficult optimization problems. Although DE offers several advantages over traditional optimization approaches, its use in applications such as aerodynamic shape optimization where the objective function evaluations are computationally expensive is limited by the large number of function evaluations often required. In this paper various approaches for improving the efficiency of DE are reviewed and discussed. Several approaches that have proven effective for other evolutionary algorithms are modified and implemented in a DE-based aerodynamic shape optimization method that uses a Navier-Stokes solver for the objective function evaluations. Parallelization techniques on distributed computers are used to reduce turnaround times. Results are presented for standard test optimization problems and for the inverse design of a turbine airfoil. The efficiency improvements achieved by the different approaches are evaluated and compared.
Cross-indexing of binary SIFT codes for large-scale image search.

PubMed

Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi

2014-05-01

In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be dispreserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
Accelerated simulation of stochastic particle removal processes in particle-resolved aerosol models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis, J.H.; Michelotti, M.D.; Riemer, N.

2016-10-01

Stochastic particle-resolved methods have proven useful for simulating multi-dimensional systems such as composition-resolved aerosol size distributions. While particle-resolved methods have substantial benefits for highly detailed simulations, these techniques suffer from high computational cost, motivating efforts to improve their algorithmic efficiency. Here we formulate an algorithm for accelerating particle removal processes by aggregating particles of similar size into bins. We present the Binned Algorithm for particle removal processes and analyze its performance with application to the atmospherically relevant process of aerosol dry deposition. We show that the Binned Algorithm can dramatically improve the efficiency of particle removals, particularly for low removalmore » rates, and that computational cost is reduced without introducing additional error. In simulations of aerosol particle removal by dry deposition in atmospherically relevant conditions, we demonstrate about 50-times increase in algorithm efficiency.« less
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E.

2013-05-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.; Figure 1 -- Architecture.
MaMR: High-performance MapReduce programming model for material cloud applications

NASA Astrophysics Data System (ADS)

Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng

2017-02-01

With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications that supports multiple different Map and Reduce functions running concurrently based on hybrid share-memory BSP called MaMR. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
Cache and energy efficient algorithms for Nussinov's RNA Folding.

PubMed

Zhao, Chunchun; Sahni, Sartaj

2017-12-06

An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less
Computerized Bone Age Estimation Using Deep Learning Based Program: Evaluation of the Accuracy and Efficiency.

PubMed

Kim, Jeong Rye; Shim, Woo Hyun; Yoon, Hee Mang; Hong, Sang Hyup; Lee, Jin Seong; Cho, Young Ah; Kim, Sangki

2017-12-01

The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice. A Greulich-Pyle method-based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3-17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas-assisted bone age (two radiologists with Greulich-Pyle atlas assistance only). The reference bone age was determined by the consensus of two experienced radiologists. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas-assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas-assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively. Automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising the diagnostic accuracy.
Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

PubMed Central

Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

2015-01-01

Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

PubMed

Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

2015-01-01

Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling

DOE PAGES

Mo, Shaoxing; Lu, Dan; Shi, Xiaoqing; ...

2017-12-27

Global sensitivity analysis (GSA) and uncertainty quantification (UQ) for groundwater modeling are challenging because of the model complexity and significant computational requirements. To reduce the massive computational cost, a cheap-to-evaluate surrogate model is usually constructed to approximate and replace the expensive groundwater models in the GSA and UQ. Constructing an accurate surrogate requires actual model simulations on a number of parameter samples. Thus, a robust experimental design strategy is desired to locate informative samples so as to reduce the computational cost in surrogate construction and consequently to improve the efficiency in the GSA and UQ. In this study, we developmore » a Taylor expansion-based adaptive design (TEAD) that aims to build an accurate global surrogate model with a small training sample size. TEAD defines a novel hybrid score function to search informative samples, and a robust stopping criterion to terminate the sample search that guarantees the resulted approximation errors satisfy the desired accuracy. The good performance of TEAD in building global surrogate models is demonstrated in seven analytical functions with different dimensionality and complexity in comparison to two widely used experimental design methods. The application of the TEAD-based surrogate method in two groundwater models shows that the TEAD design can effectively improve the computational efficiency of GSA and UQ for groundwater modeling.« less
A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mo, Shaoxing; Lu, Dan; Shi, Xiaoqing

Global sensitivity analysis (GSA) and uncertainty quantification (UQ) for groundwater modeling are challenging because of the model complexity and significant computational requirements. To reduce the massive computational cost, a cheap-to-evaluate surrogate model is usually constructed to approximate and replace the expensive groundwater models in the GSA and UQ. Constructing an accurate surrogate requires actual model simulations on a number of parameter samples. Thus, a robust experimental design strategy is desired to locate informative samples so as to reduce the computational cost in surrogate construction and consequently to improve the efficiency in the GSA and UQ. In this study, we developmore » a Taylor expansion-based adaptive design (TEAD) that aims to build an accurate global surrogate model with a small training sample size. TEAD defines a novel hybrid score function to search informative samples, and a robust stopping criterion to terminate the sample search that guarantees the resulted approximation errors satisfy the desired accuracy. The good performance of TEAD in building global surrogate models is demonstrated in seven analytical functions with different dimensionality and complexity in comparison to two widely used experimental design methods. The application of the TEAD-based surrogate method in two groundwater models shows that the TEAD design can effectively improve the computational efficiency of GSA and UQ for groundwater modeling.« less

Heuristic Scheduling in Grid Environments: Reducing the Operational Energy Demand

NASA Astrophysics Data System (ADS)

Bodenstein, Christian

In a world where more and more businesses seem to trade in an online market, the supply of online services to the ever-growing demand could quickly reach its capacity limits. Online service providers may find themselves maxed out at peak operation levels during high-traffic timeslots but too little demand during low-traffic timeslots, although the latter is becoming less frequent. At this point deciding which user is allocated what level of service becomes essential. The concept of Grid computing could offer a meaningful alternative to conventional super-computing centres. Not only can Grids reach the same computing speeds as some of the fastest supercomputers, but distributed computing harbors a great energy-saving potential. When scheduling projects in such a Grid environment however, simply assigning one process to a system becomes so complex in calculation that schedules are often too late to execute, rendering their optimizations useless. Current schedulers attempt to maximize the utility, given some sort of constraint, often reverting to heuristics. This optimization often comes at the cost of environmental impact, in this case CO 2 emissions. This work proposes an alternate model of energy efficient scheduling while keeping a respectable amount of economic incentives untouched. Using this model, it is possible to reduce the total energy consumed by a Grid environment using 'just-in-time' flowtime management, paired with ranking nodes by efficiency.
A Polynomial Subset-Based Efficient Multi-Party Key Management System for Lightweight Device Networks.

PubMed

Mahmood, Zahid; Ning, Huansheng; Ghafoor, AtaUllah

2017-03-24

Wireless Sensor Networks (WSNs) consist of lightweight devices to measure sensitive data that are highly vulnerable to security attacks due to their constrained resources. In a similar manner, the internet-based lightweight devices used in the Internet of Things (IoT) are facing severe security and privacy issues because of the direct accessibility of devices due to their connection to the internet. Complex and resource-intensive security schemes are infeasible and reduce the network lifetime. In this regard, we have explored the polynomial distribution-based key establishment schemes and identified an issue that the resultant polynomial value is either storage intensive or infeasible when large values are multiplied. It becomes more costly when these polynomials are regenerated dynamically after each node join or leave operation and whenever key is refreshed. To reduce the computation, we have proposed an Efficient Key Management (EKM) scheme for multiparty communication-based scenarios. The proposed session key management protocol is established by applying a symmetric polynomial for group members, and the group head acts as a responsible node. The polynomial generation method uses security credentials and secure hash function. Symmetric cryptographic parameters are efficient in computation, communication, and the storage required. The security justification of the proposed scheme has been completed by using Rubin logic, which guarantees that the protocol attains mutual validation and session key agreement property strongly among the participating entities. Simulation scenarios are performed using NS 2.35 to validate the results for storage, communication, latency, energy, and polynomial calculation costs during authentication, session key generation, node migration, secure joining, and leaving phases. EKM is efficient regarding storage, computation, and communication overhead and can protect WSN-based IoT infrastructure.
A Polynomial Subset-Based Efficient Multi-Party Key Management System for Lightweight Device Networks

PubMed Central

Mahmood, Zahid; Ning, Huansheng; Ghafoor, AtaUllah

2017-01-01

Wireless Sensor Networks (WSNs) consist of lightweight devices to measure sensitive data that are highly vulnerable to security attacks due to their constrained resources. In a similar manner, the internet-based lightweight devices used in the Internet of Things (IoT) are facing severe security and privacy issues because of the direct accessibility of devices due to their connection to the internet. Complex and resource-intensive security schemes are infeasible and reduce the network lifetime. In this regard, we have explored the polynomial distribution-based key establishment schemes and identified an issue that the resultant polynomial value is either storage intensive or infeasible when large values are multiplied. It becomes more costly when these polynomials are regenerated dynamically after each node join or leave operation and whenever key is refreshed. To reduce the computation, we have proposed an Efficient Key Management (EKM) scheme for multiparty communication-based scenarios. The proposed session key management protocol is established by applying a symmetric polynomial for group members, and the group head acts as a responsible node. The polynomial generation method uses security credentials and secure hash function. Symmetric cryptographic parameters are efficient in computation, communication, and the storage required. The security justification of the proposed scheme has been completed by using Rubin logic, which guarantees that the protocol attains mutual validation and session key agreement property strongly among the participating entities. Simulation scenarios are performed using NS 2.35 to validate the results for storage, communication, latency, energy, and polynomial calculation costs during authentication, session key generation, node migration, secure joining, and leaving phases. EKM is efficient regarding storage, computation, and communication overhead and can protect WSN-based IoT infrastructure. PMID:28338632
Binary video codec for data reduction in wireless visual sensor networks

NASA Astrophysics Data System (ADS)

Khursheed, Khursheed; Ahmad, Naeem; Imran, Muhammad; O'Nils, Mattias

2013-02-01

Wireless Visual Sensor Networks (WVSN) is formed by deploying many Visual Sensor Nodes (VSNs) in the field. Typical applications of WVSN include environmental monitoring, health care, industrial process monitoring, stadium/airports monitoring for security reasons and many more. The energy budget in the outdoor applications of WVSN is limited to the batteries and the frequent replacement of batteries is usually not desirable. So the processing as well as the communication energy consumption of the VSN needs to be optimized in such a way that the network remains functional for longer duration. The images captured by VSN contain huge amount of data and require efficient computational resources for processing the images and wide communication bandwidth for the transmission of the results. Image processing algorithms must be designed and developed in such a way that they are computationally less complex and must provide high compression rate. For some applications of WVSN, the captured images can be segmented into bi-level images and hence bi-level image coding methods will efficiently reduce the information amount in these segmented images. But the compression rate of the bi-level image coding methods is limited by the underlined compression algorithm. Hence there is a need for designing other intelligent and efficient algorithms which are computationally less complex and provide better compression rate than that of bi-level image coding methods. Change coding is one such algorithm which is computationally less complex (require only exclusive OR operations) and provide better compression efficiency compared to image coding but it is effective for applications having slight changes between adjacent frames of the video. The detection and coding of the Region of Interest (ROIs) in the change frame efficiently reduce the information amount in the change frame. But, if the number of objects in the change frames is higher than a certain level then the compression efficiency of both the change coding and ROI coding becomes worse than that of image coding. This paper explores the compression efficiency of the Binary Video Codec (BVC) for the data reduction in WVSN. We proposed to implement all the three compression techniques i.e. image coding, change coding and ROI coding at the VSN and then select the smallest bit stream among the results of the three compression techniques. In this way the compression performance of the BVC will never become worse than that of image coding. We concluded that the compression efficiency of BVC is always better than that of change coding and is always better than or equal that of ROI coding and image coding.
Perspectives of IT Professionals on Employing Server Virtualization Technologies

ERIC Educational Resources Information Center

Sligh, Darla

2010-01-01

Server virtualization enables a physical computer to support multiple applications logically by decoupling the application from the hardware layer, thereby reducing operational costs and competitive in delivering IT services to their enterprise organizations. IT organizations continually examine the efficiency of their internal IT systems and…
An all-at-once reduced Hessian SQP scheme for aerodynamic design optimization

NASA Technical Reports Server (NTRS)

Feng, Dan; Pulliam, Thomas H.

1995-01-01

This paper introduces a computational scheme for solving a class of aerodynamic design problems that can be posed as nonlinear equality constrained optimizations. The scheme treats the flow and design variables as independent variables, and solves the constrained optimization problem via reduced Hessian successive quadratic programming. It updates the design and flow variables simultaneously at each iteration and allows flow variables to be infeasible before convergence. The solution of an adjoint flow equation is never needed. In addition, a range space basis is chosen so that in a certain sense the 'cross term' ignored in reduced Hessian SQP methods is minimized. Numerical results for a nozzle design using the quasi-one-dimensional Euler equations show that this scheme is computationally efficient and robust. The computational cost of a typical nozzle design is only a fraction more than that of the corresponding analysis flow calculation. Superlinear convergence is also observed, which agrees with the theoretical properties of this scheme. All optimal solutions are obtained by starting far away from the final solution.
Data-driven reduced order models for effective yield strength and partitioning of strain in multiphase materials

NASA Astrophysics Data System (ADS)

Latypov, Marat I.; Kalidindi, Surya R.

2017-10-01

There is a critical need for the development and verification of practically useful multiscale modeling strategies for simulating the mechanical response of multiphase metallic materials with heterogeneous microstructures. In this contribution, we present data-driven reduced order models for effective yield strength and strain partitioning in such microstructures. These models are built employing the recently developed framework of Materials Knowledge Systems that employ 2-point spatial correlations (or 2-point statistics) for the quantification of the heterostructures and principal component analyses for their low-dimensional representation. The models are calibrated to a large collection of finite element (FE) results obtained for a diverse range of microstructures with various sizes, shapes, and volume fractions of the phases. The performance of the models is evaluated by comparing the predictions of yield strength and strain partitioning in two-phase materials with the corresponding predictions from a classical self-consistent model as well as results of full-field FE simulations. The reduced-order models developed in this work show an excellent combination of accuracy and computational efficiency, and therefore present an important advance towards computationally efficient microstructure-sensitive multiscale modeling frameworks.
A path integral methodology for obtaining thermodynamic properties of nonadiabatic systems using Gaussian mixture distributions

NASA Astrophysics Data System (ADS)

Raymond, Neil; Iouchtchenko, Dmitri; Roy, Pierre-Nicholas; Nooijen, Marcel

2018-05-01

We introduce a new path integral Monte Carlo method for investigating nonadiabatic systems in thermal equilibrium and demonstrate an approach to reducing stochastic error. We derive a general path integral expression for the partition function in a product basis of continuous nuclear and discrete electronic degrees of freedom without the use of any mapping schemes. We separate our Hamiltonian into a harmonic portion and a coupling portion; the partition function can then be calculated as the product of a Monte Carlo estimator (of the coupling contribution to the partition function) and a normalization factor (that is evaluated analytically). A Gaussian mixture model is used to evaluate the Monte Carlo estimator in a computationally efficient manner. Using two model systems, we demonstrate our approach to reduce the stochastic error associated with the Monte Carlo estimator. We show that the selection of the harmonic oscillators comprising the sampling distribution directly affects the efficiency of the method. Our results demonstrate that our path integral Monte Carlo method's deviation from exact Trotter calculations is dominated by the choice of the sampling distribution. By improving the sampling distribution, we can drastically reduce the stochastic error leading to lower computational cost.
An efficient approach for improving virtual machine placement in cloud computing environment

NASA Astrophysics Data System (ADS)

Ghobaei-Arani, Mostafa; Shamsi, Mahboubeh; Rahmanian, Ali A.

2017-11-01

The ever increasing demand for the cloud services requires more data centres. The power consumption in the data centres is a challenging problem for cloud computing, which has not been considered properly by the data centre developer companies. Especially, large data centres struggle with the power cost and the Greenhouse gases production. Hence, employing the power efficient mechanisms are necessary to optimise the mentioned effects. Moreover, virtual machine (VM) placement can be used as an effective method to reduce the power consumption in data centres. In this paper by grouping both virtual and physical machines, and taking into account the maximum absolute deviation during the VM placement, the power consumption as well as the service level agreement (SLA) deviation in data centres are reduced. To this end, the best-fit decreasing algorithm is utilised in the simulation to reduce the power consumption by about 5% compared to the modified best-fit decreasing algorithm, and at the same time, the SLA violation is improved by 6%. Finally, the learning automata are used to a trade-off between power consumption reduction from one side, and SLA violation percentage from the other side.
Computational and experimental study of airflow around a fan powered UVGI lamp

NASA Astrophysics Data System (ADS)

Kaligotla, Srikar; Tavakoli, Behtash; Glauser, Mark; Ahmadi, Goodarz

2011-11-01

The quality of indoor air environment is very important for improving the health of occupants and reducing personal exposure to hazardous pollutants. An effective way of controlling air quality is by eliminating the airborne bacteria and viruses or by reducing their emissions. Ultraviolet Germicidal Irradiation (UVGI) lamps can effectively reduce these bio-contaminants in an indoor environment, but the efficiency of these systems depends on airflow in and around the device. UVGI lamps would not be as effective in stagnant environments as they would be when the moving air brings the bio-contaminant in their irradiation region. Introducing a fan into the UVGI system would augment the efficiency of the system's kill rate. Airflows in ventilated spaces are quite complex due to the vast range of length and velocity scales. The purpose of this research is to study these complex airflows using CFD techniques and validate computational model with airflow measurements around the device using Particle Image Velocimetry measurements. The experimental results including mean velocities, length scales and RMS values of fluctuating velocities are used in the CFD validation. Comparison of these data at different locations around the device with the CFD model predictions are performed and good agreement was observed.
A novel patient-specific model to compute coronary fractional flow reserve.

PubMed

Kwon, Soon-Sung; Chung, Eui-Chul; Park, Jin-Seo; Kim, Gook-Tae; Kim, Jun-Woo; Kim, Keun-Hong; Shin, Eun-Seok; Shim, Eun Bo

2014-09-01

The fractional flow reserve (FFR) is a widely used clinical index to evaluate the functional severity of coronary stenosis. A computer simulation method based on patients' computed tomography (CT) data is a plausible non-invasive approach for computing the FFR. This method can provide a detailed solution for the stenosed coronary hemodynamics by coupling computational fluid dynamics (CFD) with the lumped parameter model (LPM) of the cardiovascular system. In this work, we have implemented a simple computational method to compute the FFR. As this method uses only coronary arteries for the CFD model and includes only the LPM of the coronary vascular system, it provides simpler boundary conditions for the coronary geometry and is computationally more efficient than existing approaches. To test the efficacy of this method, we simulated a three-dimensional straight vessel using CFD coupled with the LPM. The computed results were compared with those of the LPM. To validate this method in terms of clinically realistic geometry, a patient-specific model of stenosed coronary arteries was constructed from CT images, and the computed FFR was compared with clinically measured results. We evaluated the effect of a model aorta on the computed FFR and compared this with a model without the aorta. Computationally, the model without the aorta was more efficient than that with the aorta, reducing the CPU time required for computing a cardiac cycle to 43.4%. Copyright © 2014. Published by Elsevier Ltd.
Spatial aliasing for efficient direction-of-arrival estimation based on steering vector reconstruction

NASA Astrophysics Data System (ADS)

Yan, Feng-Gang; Cao, Bin; Rong, Jia-Jia; Shen, Yi; Jin, Ming

2016-12-01

A new technique is proposed to reduce the computational complexity of the multiple signal classification (MUSIC) algorithm for direction-of-arrival (DOA) estimate using a uniform linear array (ULA). The steering vector of the ULA is reconstructed as the Kronecker product of two other steering vectors, and a new cost function with spatial aliasing at hand is derived. Thanks to the estimation ambiguity of this spatial aliasing, mirror angles mathematically relating to the true DOAs are generated, based on which the full spectral search involved in the MUSIC algorithm is highly compressed into a limited angular sector accordingly. Further complexity analysis and performance studies are conducted by computer simulations, which demonstrate that the proposed estimator requires an extremely reduced computational burden while it shows a similar accuracy to the standard MUSIC.
Localized basis functions and other computational improvements in variational nonorthogonal basis function methods for quantum mechanical scattering problems involving chemical reactions

NASA Technical Reports Server (NTRS)

Schwenke, David W.; Truhlar, Donald G.

1990-01-01

The Generalized Newton Variational Principle for 3D quantum mechanical reactive scattering is briefly reviewed. Then three techniques are described which improve the efficiency of the computations. First, the fact that the Hamiltonian is Hermitian is used to reduce the number of integrals computed, and then the properties of localized basis functions are exploited in order to eliminate redundant work in the integral evaluation. A new type of localized basis function with desirable properties is suggested. It is shown how partitioned matrices can be used with localized basis functions to reduce the amount of work required to handle the complex boundary conditions. The new techniques do not introduce any approximations into the calculations, so they may be used to obtain converged solutions of the Schroedinger equation.
A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations

NASA Technical Reports Server (NTRS)

Dydson, Roger W.; Goodrich, John W.

2000-01-01

Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many of the computational aeroacoustics methods are limited to 4th order accurate Runge-Kutta methods in time which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128 bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program. and is more computationally efficient when compared to previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. And finally, the sources of round-off error which effect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.
A pressure flux-split technique for computation of inlet flow behavior

NASA Technical Reports Server (NTRS)

Pordal, H. S.; Khosla, P. K.; Rubin, S. G.

1991-01-01

A method for calculating the flow field in aircraft engine inlets is presented. The phenomena of inlet unstart and restart are investigated. Solutions of the reduced Navier-Stokes (RNS) equations are obtained with a time consistent direct sparse matrix solver that computes the transient flow field both internal and external to the inlet. Time varying shocks and time varying recirculation regions can be efficiently analyzed. The code is quite general and is suitable for the computation of flow for a wide variety of geometries and over a wide range of Mach and Reynolds numbers.
Method for simultaneous overlapped communications between neighboring processors in a multiple

DOEpatents

Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

1991-01-01

A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Computation in generalised probabilisitic theories

NASA Astrophysics Data System (ADS)

Lee, Ciarán M.; Barrett, Jonathan

2015-08-01

From the general difficulty of simulating quantum systems using classical systems, and in particular the existence of an efficient quantum algorithm for factoring, it is likely that quantum computation is intrinsically more powerful than classical computation. At present, the best upper bound known for the power of quantum computation is that {{BQP}}\\subseteq {{AWPP}}, where {{AWPP}} is a classical complexity class (known to be included in {{PP}}, hence {{PSPACE}}). This work investigates limits on computational power that are imposed by simple physical, or information theoretic, principles. To this end, we define a circuit-based model of computation in a class of operationally-defined theories more general than quantum theory, and ask: what is the minimal set of physical assumptions under which the above inclusions still hold? We show that given only an assumption of tomographic locality (roughly, that multipartite states and transformations can be characterized by local measurements), efficient computations are contained in {{AWPP}}. This inclusion still holds even without assuming a basic notion of causality (where the notion is, roughly, that probabilities for outcomes cannot depend on future measurement choices). Following Aaronson, we extend the computational model by allowing post-selection on measurement outcomes. Aaronson showed that the corresponding quantum complexity class, {{PostBQP}}, is equal to {{PP}}. Given only the assumption of tomographic locality, the inclusion in {{PP}} still holds for post-selected computation in general theories. Hence in a world with post-selection, quantum theory is optimal for computation in the space of all operational theories. We then consider whether one can obtain relativized complexity results for general theories. It is not obvious how to define a sensible notion of a computational oracle in the general framework that reduces to the standard notion in the quantum case. Nevertheless, it is possible to define computation relative to a ‘classical oracle’. Then, we show there exists a classical oracle relative to which efficient computation in any theory satisfying the causality assumption does not include {{NP}}.
Improving multivariate Horner schemes with Monte Carlo tree search

NASA Astrophysics Data System (ADS)

Kuipers, J.; Plaat, A.; Vermaseren, J. A. M.; van den Herik, H. J.

2013-11-01

Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.
Efficient approach to obtain free energy gradient using QM/MM MD simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Asada, Toshio; Koseki, Shiro; The Research Institute for Molecular Electronic Devices

2015-12-31

The efficient computational approach denoted as charge and atom dipole response kernel (CDRK) model to consider polarization effects of the quantum mechanical (QM) region is described using the charge response and the atom dipole response kernels for free energy gradient (FEG) calculations in the quantum mechanical/molecular mechanical (QM/MM) method. CDRK model can reasonably reproduce energies and also energy gradients of QM and MM atoms obtained by expensive QM/MM calculations in a drastically reduced computational time. This model is applied on the acylation reaction in hydrated trypsin-BPTI complex to optimize the reaction path on the free energy surface by means ofmore » FEG and the nudged elastic band (NEB) method.« less
Proper Orthogonal Decomposition in Optimal Control of Fluids

NASA Technical Reports Server (NTRS)

Ravindran, S. S.

1999-01-01

In this article, we present a reduced order modeling approach suitable for active control of fluid dynamical systems based on proper orthogonal decomposition (POD). The rationale behind the reduced order modeling is that numerical simulation of Navier-Stokes equations is still too costly for the purpose of optimization and control of unsteady flows. We examine the possibility of obtaining reduced order models that reduce computational complexity associated with the Navier-Stokes equations while capturing the essential dynamics by using the POD. The POD allows extraction of certain optimal set of basis functions, perhaps few, from a computational or experimental data-base through an eigenvalue analysis. The solution is then obtained as a linear combination of these optimal set of basis functions by means of Galerkin projection. This makes it attractive for optimal control and estimation of systems governed by partial differential equations. We here use it in active control of fluid flows governed by the Navier-Stokes equations. We show that the resulting reduced order model can be very efficient for the computations of optimization and control problems in unsteady flows. Finally, implementational issues and numerical experiments are presented for simulations and optimal control of fluid flow through channels.

Joint histogram-based cost aggregation for stereo matching.

PubMed

Min, Dongbo; Lu, Jiangbo; Do, Minh N

2013-10-01

This paper presents a novel method for performing efficient cost aggregation in stereo matching. The cost aggregation problem is reformulated from the perspective of a histogram, giving us the potential to reduce the complexity of the cost aggregation in stereo matching significantly. Differently from previous methods which have tried to reduce the complexity in terms of the size of an image and a matching window, our approach focuses on reducing the computational redundancy that exists among the search range, caused by a repeated filtering for all the hypotheses. Moreover, we also reduce the complexity of the window-based filtering through an efficient sampling scheme inside the matching window. The tradeoff between accuracy and complexity is extensively investigated by varying the parameters used in the proposed method. Experimental results show that the proposed method provides high-quality disparity maps with low complexity and outperforms existing local methods. This paper also provides new insights into complexity-constrained stereo-matching algorithm design.
An Efficient Scheme for Updating Sparse Cholesky Factors

NASA Technical Reports Server (NTRS)

Raghavan, Padma

2002-01-01

Raghavan had earlier developed the software package DCSPACK which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks-of-workstations with message passing using MCI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive positive definite matrix A = LL(T). The code can also compute the factorization A = LDL(T). The complexity of the software arises from several factors relating to the sparsity of the matrix A. A sparse N x N matrix A has typically less that cN nonzeroes where c is a small constant. If the matrix were dense, it would have O(N2) nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor L. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of L, (3) numeric factorization to compute L, and, (4) triangular solution to solve L(T)x = y and Ly = b. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor. The latter are aimed at better utilization of the cache-memory hierarchy on modem processors to prevent cache-misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, EPISCOPACY is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the LL(T) or LDL(T) factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.
Spatial compression algorithm for the analysis of very large multivariate images

DOEpatents

Keenan, Michael R [Albuquerque, NM

2008-07-15

A method for spatially compressing data sets enables the efficient analysis of very large multivariate images. The spatial compression algorithms use a wavelet transformation to map an image into a compressed image containing a smaller number of pixels that retain the original image's information content. Image analysis can then be performed on a compressed data matrix consisting of a reduced number of significant wavelet coefficients. Furthermore, a block algorithm can be used for performing common operations more efficiently. The spatial compression algorithms can be combined with spectral compression algorithms to provide further computational efficiencies.
Modeling the effect of shroud contact and friction dampers on the mistuned response of turbopumps

NASA Technical Reports Server (NTRS)

Griffin, Jerry H.; Yang, M.-T.

1994-01-01

The contract has been revised. Under the revised scope of work a reduced order model has been developed that can be used to predict the steady-state response of mistuned bladed disks. The approach has been implemented in a computer code, LMCC. It is concluded that: the reduced order model displays structural fidelity comparable to that of a finite element model of an entire bladed disk system with significantly improved computational efficiency; and, when the disk is stiff, both the finite element model and LMCC predict significantly more amplitude variation than was predicted by earlier models. This second result may have important practical ramifications, especially in the case of integrally bladed disks.
Hardware implementation of CMAC neural network with reduced storage requirement.

PubMed

Ker, J S; Kuo, Y H; Wen, R C; Liu, B D

1997-01-01

The cerebellar model articulation controller (CMAC) neural network has the advantages of fast convergence speed and low computation complexity. However, it suffers from a low storage space utilization rate on weight memory. In this paper, we propose a direct weight address mapping approach, which can reduce the required weight memory size with a utilization rate near 100%. Based on such an address mapping approach, we developed a pipeline architecture to efficiently perform the addressing operations. The proposed direct weight address mapping approach also speeds up the computation for the generation of weight addresses. Besides, a CMAC hardware prototype used for color calibration has been implemented to confirm the proposed approach and architecture.
Error analysis of multipoint flux domain decomposition methods for evolutionary diffusion problems

NASA Astrophysics Data System (ADS)

Arrarás, A.; Portero, L.; Yotov, I.

2014-01-01

We study space and time discretizations for mixed formulations of parabolic problems. The spatial approximation is based on the multipoint flux mixed finite element method, which reduces to an efficient cell-centered pressure system on general grids, including triangles, quadrilaterals, tetrahedra, and hexahedra. The time integration is performed by using a domain decomposition time-splitting technique combined with multiterm fractional step diagonally implicit Runge-Kutta methods. The resulting scheme is unconditionally stable and computationally efficient, as it reduces the global system to a collection of uncoupled subdomain problems that can be solved in parallel without the need for Schwarz-type iteration. Convergence analysis for both the semidiscrete and fully discrete schemes is presented.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing

PubMed Central

Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui

2017-01-01

Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
Improving Design Efficiency for Large-Scale Heterogeneous Circuits

NASA Astrophysics Data System (ADS)

Gregerson, Anthony

Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particles accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of the developing large-scale, heterogeneous circuits needed to enable large-scale application in high-energy physics and other important areas.
A Pareto frontier intersection-based approach for efficient multiobjective optimization of competing concept alternatives

NASA Astrophysics Data System (ADS)

Rousis, Damon A.

The expected growth of civil aviation over the next twenty years places significant emphasis on revolutionary technology development aimed at mitigating the environmental impact of commercial aircraft. As the number of technology alternatives grows along with model complexity, current methods for Pareto finding and multiobjective optimization quickly become computationally infeasible. Coupled with the large uncertainty in the early stages of design, optimal designs are sought while avoiding the computational burden of excessive function calls when a single design change or technology assumption could alter the results. This motivates the need for a robust and efficient evaluation methodology for quantitative assessment of competing concepts. This research presents a novel approach that combines Bayesian adaptive sampling with surrogate-based optimization to efficiently place designs near Pareto frontier intersections of competing concepts. Efficiency is increased over sequential multiobjective optimization by focusing computational resources specifically on the location in the design space where optimality shifts between concepts. At the intersection of Pareto frontiers, the selection decisions are most sensitive to preferences place on the objectives, and small perturbations can lead to vastly different final designs. These concepts are incorporated into an evaluation methodology that ultimately reduces the number of failed cases, infeasible designs, and Pareto dominated solutions across all concepts. A set of algebraic samples along with a truss design problem are presented as canonical examples for the proposed approach. The methodology is applied to the design of ultra-high bypass ratio turbofans to guide NASA's technology development efforts for future aircraft. Geared-drive and variable geometry bypass nozzle concepts are explored as enablers for increased bypass ratio and potential alternatives over traditional configurations. The method is shown to improve sampling efficiency and provide clusters of feasible designs that motivate a shift towards revolutionary technologies that reduce fuel burn, emissions, and noise on future aircraft.
In-Process Items on LCS.

ERIC Educational Resources Information Center

Russell, Thyra K.

Morris Library at Southern Illinois University computerized its technical processes using the Library Computer System (LCS), which was implemented in the library to streamline order processing by: (1) providing up-to-date online files to track in-process items; (2) encouraging quick, efficient accessing of information; (3) reducing manual files;…
The multifacet graphically contracted function method. I. Formulation and implementation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shepard, Ron; Brozell, Scott R.; Gidofalvi, Gergely

2014-08-14

The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that bothmore » the energy and the gradient computation scale as O(N{sup 2}n{sup 4}) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N{sub 2} dissociation, cubic H{sub 8} dissociation, the symmetric dissociation of H{sub 2}O, and the insertion of Be into H{sub 2}. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.« less
A comparative study of serial and parallel aeroelastic computations of wings

NASA Technical Reports Server (NTRS)

Byun, Chansup; Guruswamy, Guru P.

1994-01-01

A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk

We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basismore » approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.« less
Parallel-vector out-of-core equation solver for computational mechanics

NASA Technical Reports Server (NTRS)

Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.

1993-01-01

A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/ output (I/O) time is reduced by using the a synchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
Fuel Efficient Demonstrator (FED) Alpha In-dash Vehicle Display Transition to Apple iPad for Reduced SWAP-C

DTIC Science & Technology

2012-10-01

VarTech Ruggedized Computer Features The IDVD consisted of a VarTech brand ruggedized computer within a NEMA enclosure (VARTEC VTPC104PHB). Optional...range 9 to 36V Rechargeable Battery Power (10 hrs, 1 month in standby) Gain in Function Resolution 1024 x 768 1024 x 768 Even NEMA Enclosure NEMA ...there is some functionality loss associated with that. The display is not easily readable in the sunlight, it does not have a weatherproof NEMA
Foundational Aero Research for Development of Efficient Power Turbines With 50% Variable-speed Capability

DTIC Science & Technology

2011-02-01

expected, with increased loading (or reduced axial -chord to pitch ratio for a given turning). In addition to minimizing design-point loss due to...5 Figure 2. Computed loading diagrams and Reynolds lapse rates for aft- (L1A) and mid- loaded (L1M) LPT blading (Clark et al., 2009...reference 22 in Welch, 2010) accomplishing the same 95° flow turning at high aerodynamic loading (Z = 1.34). .................8 Figure 3. Computed 2-D
Cubic spline numerical solution of an ablation problem with convective backface cooling

NASA Astrophysics Data System (ADS)

Lin, S.; Wang, P.; Kahawita, R.

1984-08-01

An implicit numerical technique using cubic splines is presented for solving an ablation problem on a thin wall with convective cooling. A non-uniform computational mesh with 6 grid points has been used for the numerical integration. The method has been found to be computationally efficient, providing for the care under consideration of an overall error of about 1 percent. The results obtained indicate that the convective cooling is an important factor in reducing the ablation thickness.
An efficient Bayesian data-worth analysis using a multilevel Monte Carlo method

NASA Astrophysics Data System (ADS)

Lu, Dan; Ricciuto, Daniel; Evans, Katherine

2018-03-01

Improving the understanding of subsurface systems and thus reducing prediction uncertainty requires collection of data. As the collection of subsurface data is costly, it is important that the data collection scheme is cost-effective. Design of a cost-effective data collection scheme, i.e., data-worth analysis, requires quantifying model parameter, prediction, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface hydrological model simulations using standard Monte Carlo (MC) sampling or surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose an efficient Bayesian data-worth analysis using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC that requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce computational costs using multifidelity approximations. Since the Bayesian data-worth analysis involves a great deal of expectation estimation, the cost saving of the MLMC in the assessment can be outstanding. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it for a highly heterogeneous two-phase subsurface flow simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data, and consistent with the standard MC estimation. But compared to the standard MC, the MLMC greatly reduces the computational costs.
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE PAGES

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less

Comparison of Mixing Characteristics for Several Fuel Injectors on an Open Plate and in a Ducted Flowpath Configuration at Hypervelocity Flow Conditions

NASA Technical Reports Server (NTRS)

Drozda, Tomasz G.; Shenoy, Rajiv R.; Passe, Bradley J.; Baurle, Robert A.; Drummond, J. Philip

2017-01-01

In order to reduce the cost and complexity associated with fuel injection and mixing experiments for high-speed flows, and to further enable optical access to the test section for nonintrusive diagnostics, the Enhanced Injection and Mixing Project (EIMP) utilizes an open flat plate configuration to characterize inert mixing properties of various fuel injectors for hypervelocity applications. The experiments also utilize reduced total temperature conditions to alleviate the need for hardware cooling. The use of "cold" flows and non-reacting mixtures for mixing experiments is not new, and has been extensively utilized as a screening technique for scramjet fuel injectors. The impact of reduced facility-air total temperature, and the use of inert fuel simulants, such as helium, on the mixing character of the flow has been assessed in previous numerical studies by the authors. Mixing performance was characterized for three different injectors: a strut, a ramp, and a flushwall. The present study focuses on the impact of using an open plate to approximate mixing in the duct. Toward this end, Reynolds-averaged simulations (RAS) were performed for the three fuel injectors in an open plate configuration and in a duct. The mixing parameters of interest, such as mixing efficiency and total pressure recovery, are then computed and compared for the two configurations. In addition to mixing efficiency and total pressure recovery, the combustion efficiency and thrust potential are also computed for the reacting simulations.
Active Flash: Out-of-core Data Analytics on Flash Storage

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boboila, Simona; Kim, Youngjae; Vazhkudai, Sudharshan S

2012-01-01

Next generation science will increasingly come to rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations, modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, creating multiple rounds of redundant reads and writes to the storage system, which grows in cost with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, to expedite data analysis pipelines bymore » migrating to the location of the data, the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while allowing this analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore energy and performance trade-offs in moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model, and its capability to potentially have a transformative impact on scientific data analysis.« less
Model-Invariant Hybrid Computations of Separated Flows for RCA Standard Test Cases

NASA Technical Reports Server (NTRS)

Woodruff, Stephen

2016-01-01

NASA's Revolutionary Computational Aerosciences (RCA) subproject has identified several smooth-body separated flows as standard test cases to emphasize the challenge these flows present for computational methods and their importance to the aerospace community. Results of computations of two of these test cases, the NASA hump and the FAITH experiment, are presented. The computations were performed with the model-invariant hybrid LES-RANS formulation, implemented in the NASA code VULCAN-CFD. The model- invariant formulation employs gradual LES-RANS transitions and compensation for model variation to provide more accurate and efficient hybrid computations. Comparisons revealed that the LES-RANS transitions employed in these computations were sufficiently gradual that the compensating terms were unnecessary. Agreement with experiment was achieved only after reducing the turbulent viscosity to mitigate the effect of numerical dissipation. The stream-wise evolution of peak Reynolds shear stress was employed as a measure of turbulence dynamics in separated flows useful for evaluating computations.
Hidden Markov induced Dynamic Bayesian Network for recovering time evolving gene regulatory networks

NASA Astrophysics Data System (ADS)

Zhu, Shijia; Wang, Yadong

2015-12-01

Dynamic Bayesian Networks (DBN) have been widely used to recover gene regulatory relationships from time-series data in computational systems biology. Its standard assumption is ‘stationarity’, and therefore, several research efforts have been recently proposed to relax this restriction. However, those methods suffer from three challenges: long running time, low accuracy and reliance on parameter settings. To address these problems, we propose a novel non-stationary DBN model by extending each hidden node of Hidden Markov Model into a DBN (called HMDBN), which properly handles the underlying time-evolving networks. Correspondingly, an improved structural EM algorithm is proposed to learn the HMDBN. It dramatically reduces searching space, thereby substantially improving computational efficiency. Additionally, we derived a novel generalized Bayesian Information Criterion under the non-stationary assumption (called BWBIC), which can help significantly improve the reconstruction accuracy and largely reduce over-fitting. Moreover, the re-estimation formulas for all parameters of our model are derived, enabling us to avoid reliance on parameter settings. Compared to the state-of-the-art methods, the experimental evaluation of our proposed method on both synthetic and real biological data demonstrates more stably high prediction accuracy and significantly improved computation efficiency, even with no prior knowledge and parameter settings.
Non-conforming finite-element formulation for cardiac electrophysiology: an effective approach to reduce the computation time of heart simulations without compromising accuracy

NASA Astrophysics Data System (ADS)

Hurtado, Daniel E.; Rojas, Guillermo

2018-04-01

Computer simulations constitute a powerful tool for studying the electrical activity of the human heart, but computational effort remains prohibitively high. In order to recover accurate conduction velocities and wavefront shapes, the mesh size in linear element (Q1) formulations cannot exceed 0.1 mm. Here we propose a novel non-conforming finite-element formulation for the non-linear cardiac electrophysiology problem that results in accurate wavefront shapes and lower mesh-dependance in the conduction velocity, while retaining the same number of global degrees of freedom as Q1 formulations. As a result, coarser discretizations of cardiac domains can be employed in simulations without significant loss of accuracy, thus reducing the overall computational effort. We demonstrate the applicability of our formulation in biventricular simulations using a coarse mesh size of ˜ 1 mm, and show that the activation wave pattern closely follows that obtained in fine-mesh simulations at a fraction of the computation time, thus improving the accuracy-efficiency trade-off of cardiac simulations.
Self-consistent clustering analysis: an efficient multiscale scheme for inelastic heterogeneous materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Z.; Bessa, M. A.; Liu, W.K.

A predictive computational theory is shown for modeling complex, hierarchical materials ranging from metal alloys to polymer nanocomposites. The theory can capture complex mechanisms such as plasticity and failure that span across multiple length scales. This general multiscale material modeling theory relies on sound principles of mathematics and mechanics, and a cutting-edge reduced order modeling method named self-consistent clustering analysis (SCA) [Zeliang Liu, M.A. Bessa, Wing Kam Liu, “Self-consistent clustering analysis: An efficient multi-scale scheme for inelastic heterogeneous materials,” Comput. Methods Appl. Mech. Engrg. 306 (2016) 319–341]. SCA reduces by several orders of magnitude the computational cost of micromechanical andmore » concurrent multiscale simulations, while retaining the microstructure information. This remarkable increase in efficiency is achieved with a data-driven clustering method. Computationally expensive operations are performed in the so-called offline stage, where degrees of freedom (DOFs) are agglomerated into clusters. The interaction tensor of these clusters is computed. In the online or predictive stage, the Lippmann-Schwinger integral equation is solved cluster-wise using a self-consistent scheme to ensure solution accuracy and avoid path dependence. To construct a concurrent multiscale model, this scheme is applied at each material point in a macroscale structure, replacing a conventional constitutive model with the average response computed from the microscale model using just the SCA online stage. A regularized damage theory is incorporated in the microscale that avoids the mesh and RVE size dependence that commonly plagues microscale damage calculations. The SCA method is illustrated with two cases: a carbon fiber reinforced polymer (CFRP) structure with the concurrent multiscale model and an application to fatigue prediction for additively manufactured metals. For the CFRP problem, a speed up estimated to be about 43,000 is achieved by using the SCA method, as opposed to FE2, enabling the solution of an otherwise computationally intractable problem. The second example uses a crystal plasticity constitutive law and computes the fatigue potency of extrinsic microscale features such as voids. This shows that local stress and strain are capture sufficiently well by SCA. This model has been incorporated in a process-structure-properties prediction framework for process design in additive manufacturing.« less
Fast local reconstruction by selective backprojection for low dose in dental computed tomography

NASA Astrophysics Data System (ADS)

Yan, Bin; Deng, Lin; Han, Yu; Zhang, Feng; Wang, Xian-Chao; Li, Lei

2014-10-01

The high radiation dose in computed tomography (CT) scans increases the lifetime risk of cancer, which becomes a major clinical concern. The backprojection-filtration (BPF) algorithm could reduce the radiation dose by reconstructing the images from truncated data in a short scan. In a dental CT, it could reduce the radiation dose for the teeth by using the projection acquired in a short scan, and could avoid irradiation to the other part by using truncated projection. However, the limit of integration for backprojection varies per PI-line, resulting in low calculation efficiency and poor parallel performance. Recently, a tent BPF has been proposed to improve the calculation efficiency by rearranging the projection. However, the memory-consuming data rebinning process is included. Accordingly, the selective BPF (S-BPF) algorithm is proposed in this paper. In this algorithm, the derivative of the projection is backprojected to the points whose x coordinate is less than that of the source focal spot to obtain the differentiated backprojection. The finite Hilbert inverse is then applied to each PI-line segment. S-BPF avoids the influence of the variable limit of integration by selective backprojection without additional time cost or memory cost. The simulation experiment and the real experiment demonstrated the higher reconstruction efficiency of S-BPF.
An Efficient Neural-Network-Based Microseismic Monitoring Platform for Hydraulic Fracture on an Edge Computing Architecture.

PubMed

Zhang, Xiaopu; Lin, Jun; Chen, Zubin; Sun, Feng; Zhu, Xi; Fang, Gengfa

2018-06-05

Microseismic monitoring is one of the most critical technologies for hydraulic fracturing in oil and gas production. To detect events in an accurate and efficient way, there are two major challenges. One challenge is how to achieve high accuracy due to a poor signal-to-noise ratio (SNR). The other one is concerned with real-time data transmission. Taking these challenges into consideration, an edge-computing-based platform, namely Edge-to-Center LearnReduce, is presented in this work. The platform consists of a data center with many edge components. At the data center, a neural network model combined with convolutional neural network (CNN) and long short-term memory (LSTM) is designed and this model is trained by using previously obtained data. Once the model is fully trained, it is sent to edge components for events detection and data reduction. At each edge component, a probabilistic inference is added to the neural network model to improve its accuracy. Finally, the reduced data is delivered to the data center. Based on experiment results, a high detection accuracy (over 96%) with less transmitted data (about 90%) was achieved by using the proposed approach on a microseismic monitoring system. These results show that the platform can simultaneously improve the accuracy and efficiency of microseismic monitoring.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Goodrich, John W.; Dyson, Rodger W.

1999-01-01

The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.
NASA HPCC Technology for Aerospace Analysis and Design

NASA Technical Reports Server (NTRS)

Schulbach, Catherine H.

1999-01-01

The Computational Aerosciences (CAS) Project is part of NASA's High Performance Computing and Communications Program. Its primary goal is to accelerate the availability of high-performance computing technology to the US aerospace community-thus providing the US aerospace community with key tools necessary to reduce design cycle times and increase fidelity in order to improve safety, efficiency and capability of future aerospace vehicles. A complementary goal is to hasten the emergence of a viable commercial market within the aerospace community for the advantage of the domestic computer hardware and software industry. The CAS Project selects representative aerospace problems (especially design) and uses them to focus efforts on advancing aerospace algorithms and applications, systems software, and computing machinery to demonstrate vast improvements in system performance and capability over the life of the program. Recent demonstrations have served to assess the benefits of possible performance improvements while reducing the risk of adopting high-performance computing technology. This talk will discuss past accomplishments in providing technology to the aerospace community, present efforts, and future goals. For example, the times to do full combustor and compressor simulations (of aircraft engines) have been reduced by factors of 320:1 and 400:1 respectively. While this has enabled new capabilities in engine simulation, the goal of an overnight, dynamic, multi-disciplinary, 3-dimensional simulation of an aircraft engine is still years away and will require new generations of high-end technology.
Lithography - Green and Getting Greener

NASA Astrophysics Data System (ADS)

Levinson, Harry J.

2011-06-01

Today, many energy-saving technologies and practices are enabled or made more effective through the use of nano-electronics. Such technologies include hybrid and all-electric cars, as well as controllers to increase the efficiency of photovoltaic panels. Telecommuting, which enables people to work without traveling from their homes, has been made possible by personal computers and the internet. Reducing the costs of nano-electronics will make possible increased opportunities for the use of products that reduce energy consumption. The most effective way to reduce costs is to improve efficiency. Increased efficiency also provides the benefit of reducing energy and material consumption in the manufacturing of nano-electronics. For example, reducing photochemical usage decreases costs but also reduces material consumption and the need for disposal. Reduction of scrap and rework are direct improvements in efficiency. Cycle time reduction enables greater responsiveness to demand, reducing the amount of material started in processing but never completed. Good process control reduces scrap and rework during manufacturing and results in circuits that have high performance, yet lower power consumption, when used. There are ready opportunities for making the most of the natural tendencies of businesses to innovate and improve efficiency. The semiconductor industry has historically adopted process improvements that have increased worker safety and reduced the consumption of hazardous materials. An early example was the transition from solvent to aqueous photoresist developers. Today, all types of development can be conducted in safer equipment that minimizes the release of hazardous chemicals to the air and water. Non-toxic solvents, such as ethyl lactate, have been widely adopted. There are many opportunities for further improvement. For example, over 90% of resist goes down the drain using conventional spin-coating process, so there is an opportunity for greatly improved efficiency in that operation. A lot of water is used to reduce defects when using chemically amplified resists, and the amount of water needed could be reduced by improved design of resists and substrate coatings. Thinking further into the future, directed self-assembly has the promise of a patterning technology that can be applied simply and with energy-efficiency. Once the fundamental challenges of creating high output extreme ultraviolet (EUV) light sources are overcome, there will be great opportunities for reducing electricity consumption.
An adaptive grid to improve the efficiency and accuracy of modelling underwater noise from shipping

NASA Astrophysics Data System (ADS)

Trigg, Leah; Chen, Feng; Shapiro, Georgy; Ingram, Simon; Embling, Clare

2017-04-01

Underwater noise from shipping is becoming a significant concern and has been listed as a pollutant under Descriptor 11 of the Marine Strategy Framework Directive. Underwater noise models are an essential tool to assess and predict noise levels for regulatory procedures such as environmental impact assessments and ship noise monitoring. There are generally two approaches to noise modelling. The first is based on simplified energy flux models, assuming either spherical or cylindrical propagation of sound energy. These models are very quick but they ignore important water column and seabed properties, and produce significant errors in the areas subject to temperature stratification (Shapiro et al., 2014). The second type of model (e.g. ray-tracing and parabolic equation) is based on an advanced physical representation of sound propagation. However, these acoustic propagation models are computationally expensive to execute. Shipping noise modelling requires spatial discretization in order to group noise sources together using a grid. A uniform grid size is often selected to achieve either the greatest efficiency (i.e. speed of computations) or the greatest accuracy. In contrast, this work aims to produce efficient and accurate noise level predictions by presenting an adaptive grid where cell size varies with distance from the receiver. The spatial range over which a certain cell size is suitable was determined by calculating the distance from the receiver at which propagation loss becomes uniform across a grid cell. The computational efficiency and accuracy of the resulting adaptive grid was tested by comparing it to uniform 1 km and 5 km grids. These represent an accurate and computationally efficient grid respectively. For a case study of the Celtic Sea, an application of the adaptive grid over an area of 160×160 km reduced the number of model executions required from 25600 for a 1 km grid to 5356 in December and to between 5056 and 13132 in August, which represents a 2 to 5-fold increase in efficiency. The 5 km grid reduces the number of model executions further to 1024. However, over the first 25 km the 5 km grid produces errors of up to 13.8 dB when compared to the highly accurate but inefficient 1 km grid. The newly developed adaptive grid generates much smaller errors of less than 0.5 dB while demonstrating high computational efficiency. Our results show that the adaptive grid provides the ability to retain the accuracy of noise level predictions and improve the efficiency of the modelling process. This can help safeguard sensitive marine ecosystems from noise pollution by improving the underwater noise predictions that inform management activities. References Shapiro, G., Chen, F., Thain, R., 2014. The Effect of Ocean Fronts on Acoustic Wave Propagation in a Shallow Sea, Journal of Marine System, 139: 217 - 226. http://dx.doi.org/10.1016/j.jmarsys.2014.06.007.
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal

NASA Astrophysics Data System (ADS)

Satheeskumaran, S.; Sabrigiriraj, M.

2016-06-01

Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case

PubMed Central

Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.

PubMed

Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.
Application of Local Discretization Methods in the NASA Finite-Volume General Circulation Model

NASA Technical Reports Server (NTRS)

Yeh, Kao-San; Lin, Shian-Jiann; Rood, Richard B.

2002-01-01

We present the basic ideas of the dynamics system of the finite-volume General Circulation Model developed at NASA Goddard Space Flight Center for climate simulations and other applications in meteorology. The dynamics of this model is designed with emphases on conservative and monotonic transport, where the property of Lagrangian conservation is used to maintain the physical consistency of the computational fluid for long-term simulations. As the model benefits from the noise-free solutions of monotonic finite-volume transport schemes, the property of Lagrangian conservation also partly compensates the accuracy of transport for the diffusion effects due to the treatment of monotonicity. By faithfully maintaining the fundamental laws of physics during the computation, this model is able to achieve sufficient accuracy for the global consistency of climate processes. Because the computing algorithms are based on local memory, this model has the advantage of efficiency in parallel computation with distributed memory. Further research is yet desirable to reduce the diffusion effects of monotonic transport for better accuracy, and to mitigate the limitation due to fast-moving gravity waves for better efficiency.
A one-model approach based on relaxed combinations of inputs for evaluating input congestion in DEA

NASA Astrophysics Data System (ADS)

Khodabakhshi, Mohammad

2009-08-01

This paper provides a one-model approach of input congestion based on input relaxation model developed in data envelopment analysis (e.g. [G.R. Jahanshahloo, M. Khodabakhshi, Suitable combination of inputs for improving outputs in DEA with determining input congestion -- Considering textile industry of China, Applied Mathematics and Computation (1) (2004) 263-273; G.R. Jahanshahloo, M. Khodabakhshi, Determining assurance interval for non-Archimedean ele improving outputs model in DEA, Applied Mathematics and Computation 151 (2) (2004) 501-506; M. Khodabakhshi, A super-efficiency model based on improved outputs in data envelopment analysis, Applied Mathematics and Computation 184 (2) (2007) 695-703; M. Khodabakhshi, M. Asgharian, An input relaxation measure of efficiency in stochastic data analysis, Applied Mathematical Modelling 33 (2009) 2010-2023]. This approach reduces solving three problems with the two-model approach introduced in the first of the above-mentioned reference to two problems which is certainly important from computational point of view. The model is applied to a set of data extracted from ISI database to estimate input congestion of 12 Canadian business schools.
Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo

DOE PAGES

Krogel, Jaron T.; Reboredo, Fernando A.

2018-01-25

Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this paper, we explore alternatives to reduce the memory usage of splined orbitalsmore » without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% is possible for select transition metal oxide systems. Finally, for production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.« less
Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krogel, Jaron T.; Reboredo, Fernando A.

Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this paper, we explore alternatives to reduce the memory usage of splined orbitalsmore » without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% is possible for select transition metal oxide systems. Finally, for production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.« less
Solving the hypersingular boundary integral equation in three-dimensional acoustics using a regularization relationship.

PubMed

Yan, Zai You; Hung, Kin Chew; Zheng, Hui

2003-05-01

Regularization of the hypersingular integral in the normal derivative of the conventional Helmholtz integral equation through a double surface integral method or regularization relationship has been studied. By introducing the new concept of discretized operator matrix, evaluation of the double surface integrals is reduced to calculate the product of two discretized operator matrices. Such a treatment greatly improves the computational efficiency. As the number of frequencies to be computed increases, the computational cost of solving the composite Helmholtz integral equation is comparable to that of solving the conventional Helmholtz integral equation. In this paper, the detailed formulation of the proposed regularization method is presented. The computational efficiency and accuracy of the regularization method are demonstrated for a general class of acoustic radiation and scattering problems. The radiation of a pulsating sphere, an oscillating sphere, and a rigid sphere insonified by a plane acoustic wave are solved using the new method with curvilinear quadrilateral isoparametric elements. It is found that the numerical results rapidly converge to the corresponding analytical solutions as finer meshes are applied.

Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo

NASA Astrophysics Data System (ADS)

Krogel, Jaron T.; Reboredo, Fernando A.

2018-01-01

Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this work, we explore alternatives to reduce the memory usage of splined orbitals without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% is possible for select transition metal oxide systems. For production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.
A-VCI: A flexible method to efficiently compute vibrational spectra

NASA Astrophysics Data System (ADS)

Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2017-06-01

The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm-1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm-1 is the most accurate computation that exists today on such systems.
A-VCI: A flexible method to efficiently compute vibrational spectra.

PubMed

Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2017-06-07

The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm -1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm -1 is the most accurate computation that exists today on such systems.
Clinical brain MR imaging prescriptions in Talairach space: technologist- and computer-driven methods.

PubMed

Weiss, Kenneth L; Pan, Hai; Storrs, Judd; Strub, William; Weiss, Jane L; Jia, Li; Eldevik, O Petter

2003-05-01

Variability in patient head positioning may yield substantial interstudy image variance in the clinical setting. We describe and test three-step technologist and computer-automated algorithms designed to image the brain in a standard reference system and reduce variance. Triple oblique axial images obtained parallel to the Talairach anterior commissure (AC)-posterior commissure (PC) plane were reviewed in a prospective analysis of 126 consecutive patients. Requisite roll, yaw, and pitch correction, as three authors determined independently and subsequently by consensus, were compared with the technologists' actual graphical prescriptions and those generated by a novel computer automated three-step (CATS) program. Automated pitch determinations generated with Statistical Parametric Mapping '99 (SPM'99) were also compared. Requisite pitch correction (15.2 degrees +/- 10.2 degrees ) far exceeded that for roll (-0.6 degrees +/- 3.7 degrees ) and yaw (-0.9 degrees +/- 4.7 degrees ) in terms of magnitude and variance (P <.001). Technologist and computer-generated prescriptions substantially reduced interpatient image variance with regard to roll (3.4 degrees and 3.9 degrees vs 13.5 degrees ), yaw (0.6 degrees and 2.5 degrees vs 22.3 degrees ), and pitch (28.6 degrees, 18.5 degrees with CATS, and 59.3 degrees with SPM'99 vs 104 degrees ). CATS performed worse than the technologists in yaw prescription, and it was equivalent in roll and pitch prescriptions. Talairach prescriptions better approximated standard CT canthomeatal angulations (9 degrees vs 24 degrees ) and provided more efficient brain coverage than that of routine axial imaging. Brain MR prescriptions corrected for direct roll, yaw, and Talairach AC-PC pitch can be readily achieved by trained technologists or automated computer algorithms. This ability will substantially reduce interpatient variance, allow better approximation of standard CT angulation, and yield more efficient brain coverage than that of routine clinical axial imaging.
REGENERATIVE TRANSISTOR AMPLIFIER

DOEpatents

Kabell, L.J.

1958-11-25

Electrical circults for use in computers and the like are described. particularly a regenerative bistable transistor amplifler which is iurned on by a clock signal when an information signal permits and is turned off by the clock signal. The amplifier porforms the above function with reduced power requirements for the clock signal and circuit operation. The power requirements are reduced in one way by employing transformer coupling which increases the collector circuit efficiency by eliminating the loss of power in the collector load resistor.
A Discrete Global Grid System Programming Language Using MapReduce

NASA Astrophysics Data System (ADS)

Peterson, P.; Shatz, I.

2016-12-01

A discrete global grid system (DGGS) is a powerful mechanism for storing and integrating geospatial information. As a "pixelization" of the Earth, many image processing techniques lend themselves to the transformation of data values referenced to the DGGS cells. It has been shown that image algebra, as an example, and advanced algebra, like Fast Fourier Transformation, can be used on the DGGS tiling structure for geoprocessing and spatial analysis. MapReduce has been shown to provide advantages for processing and generating large data sets within distributed and parallel computing. The DGGS structure is ideally suited for big distributed Earth data. We proposed that basic expressions could be created to form the atoms of a generalized DGGS language using the MapReduce programming model. We created three very efficient expressions: Selectors (aka filter) - A selection function that generate a set of cells, cell collections, or geometries; Calculators (aka map) - A computational function (including quantization of raw measurements and data sources) that generate values in a DGGS cell; and Aggregators (aka reduce) - A function that generate spatial statistics from cell values within a cell. We found that these three basic MapReduce operations along with a forth function, the Iterator, for horizontal and vertical traversing of any DGGS structure, provided simple building block resulting in very efficient operations and processes that could be used with any DGGS. We provide examples and a demonstration of their effectiveness using the ISEA3H DGGS on the PYXIS Studio.
An efficient and reliable geographic routing protocol based on partial network coding for underwater sensor networks.

PubMed

Hao, Kun; Jin, Zhigang; Shen, Haifeng; Wang, Ying

2015-05-28

Efficient routing protocols for data packet delivery are crucial to underwater sensor networks (UWSNs). However, communication in UWSNs is a challenging task because of the characteristics of the acoustic channel. Network coding is a promising technique for efficient data packet delivery thanks to the broadcast nature of acoustic channels and the relatively high computation capabilities of the sensor nodes. In this work, we present GPNC, a novel geographic routing protocol for UWSNs that incorporates partial network coding to encode data packets and uses sensor nodes' location information to greedily forward data packets to sink nodes. GPNC can effectively reduce network delays and retransmissions of redundant packets causing additional network energy consumption. Simulation results show that GPNC can significantly improve network throughput and packet delivery ratio, while reducing energy consumption and network latency when compared with other routing protocols.
Hybrid cloud and cluster computing paradigms for life science applications

PubMed Central

2010-01-01

Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.

PubMed

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

2010-12-21

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

PubMed

Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R

2016-06-01

Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
An effective and secure key-management scheme for hierarchical access control in E-medicine system.

PubMed

Odelu, Vanga; Das, Ashok Kumar; Goswami, Adrijit

2013-04-01

Recently several hierarchical access control schemes are proposed in the literature to provide security of e-medicine systems. However, most of them are either insecure against 'man-in-the-middle attack' or they require high storage and computational overheads. Wu and Chen proposed a key management method to solve dynamic access control problems in a user hierarchy based on hybrid cryptosystem. Though their scheme improves computational efficiency over Nikooghadam et al.'s approach, it suffers from large storage space for public parameters in public domain and computational inefficiency due to costly elliptic curve point multiplication. Recently, Nikooghadam and Zakerolhosseini showed that Wu-Chen's scheme is vulnerable to man-in-the-middle attack. In order to remedy this security weakness in Wu-Chen's scheme, they proposed a secure scheme which is again based on ECC (elliptic curve cryptography) and efficient one-way hash function. However, their scheme incurs huge computational cost for providing verification of public information in the public domain as their scheme uses ECC digital signature which is costly when compared to symmetric-key cryptosystem. In this paper, we propose an effective access control scheme in user hierarchy which is only based on symmetric-key cryptosystem and efficient one-way hash function. We show that our scheme reduces significantly the storage space for both public and private domains, and computational complexity when compared to Wu-Chen's scheme, Nikooghadam-Zakerolhosseini's scheme, and other related schemes. Through the informal and formal security analysis, we further show that our scheme is secure against different attacks and also man-in-the-middle attack. Moreover, dynamic access control problems in our scheme are also solved efficiently compared to other related schemes, making our scheme is much suitable for practical applications of e-medicine systems.
A Practical, Robust Methodology for Acquiring New Observation Data Using Computationally Expensive Groundwater Models

NASA Astrophysics Data System (ADS)

Siade, Adam J.; Hall, Joel; Karelse, Robert N.

2017-11-01

Regional groundwater flow models play an important role in decision making regarding water resources; however, the uncertainty embedded in model parameters and model assumptions can significantly hinder the reliability of model predictions. One way to reduce this uncertainty is to collect new observation data from the field. However, determining where and when to obtain such data is not straightforward. There exist a number of data-worth and experimental design strategies developed for this purpose. However, these studies often ignore issues related to real-world groundwater models such as computational expense, existing observation data, high-parameter dimension, etc. In this study, we propose a methodology, based on existing methods and software, to efficiently conduct such analyses for large-scale, complex regional groundwater flow systems for which there is a wealth of available observation data. The method utilizes the well-established d-optimality criterion, and the minimax criterion for robust sampling strategies. The so-called Null-Space Monte Carlo method is used to reduce the computational burden associated with uncertainty quantification. And, a heuristic methodology, based on the concept of the greedy algorithm, is proposed for developing robust designs with subsets of the posterior parameter samples. The proposed methodology is tested on a synthetic regional groundwater model, and subsequently applied to an existing, complex, regional groundwater system in the Perth region of Western Australia. The results indicate that robust designs can be obtained efficiently, within reasonable computational resources, for making regional decisions regarding groundwater level sampling.
Approximate Analysis for Interlaminar Stresses in Composite Structures with Thickness Discontinuities

NASA Technical Reports Server (NTRS)

Rose, Cheryl A.; Starnes, James H., Jr.

1996-01-01

An efficient, approximate analysis for calculating complete three-dimensional stress fields near regions of geometric discontinuities in laminated composite structures is presented. An approximate three-dimensional local analysis is used to determine the detailed local response due to far-field stresses obtained from a global two-dimensional analysis. The stress results from the global analysis are used as traction boundary conditions for the local analysis. A generalized plane deformation assumption is made in the local analysis to reduce the solution domain to two dimensions. This assumption allows out-of-plane deformation to occur. The local analysis is based on the principle of minimum complementary energy and uses statically admissible stress functions that have an assumed through-the-thickness distribution. Examples are presented to illustrate the accuracy and computational efficiency of the local analysis. Comparisons of the results of the present local analysis with the corresponding results obtained from a finite element analysis and from an elasticity solution are presented. These results indicate that the present local analysis predicts the stress field accurately. Computer execution-times are also presented. The demonstrated accuracy and computational efficiency of the analysis make it well suited for parametric and design studies.
Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi

PubMed Central

Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil

2012-01-01

The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
Experimental and Computational Sonic Boom Assessment of Lockheed-Martin N+2 Low Boom Models

NASA Technical Reports Server (NTRS)

Cliff, Susan E.; Durston, Donald A.; Elmiligui, Alaa A.; Walker, Eric L.; Carter, Melissa B.

2015-01-01

Flight at speeds greater than the speed of sound is not permitted over land, primarily because of the noise and structural damage caused by sonic boom pressure waves of supersonic aircraft. Mitigation of sonic boom is a key focus area of the High Speed Project under NASA's Fundamental Aeronautics Program. The project is focusing on technologies to enable future civilian aircraft to fly efficiently with reduced sonic boom, engine and aircraft noise, and emissions. A major objective of the project is to improve both computational and experimental capabilities for design of low-boom, high-efficiency aircraft. NASA and industry partners are developing improved wind tunnel testing techniques and new pressure instrumentation to measure the weak sonic boom pressure signatures of modern vehicle concepts. In parallel, computational methods are being developed to provide rapid design and analysis of supersonic aircraft with improved meshing techniques that provide efficient, robust, and accurate on- and off-body pressures at several body lengths from vehicles with very low sonic boom overpressures. The maturity of these critical parallel efforts is necessary before low-boom flight can be demonstrated and commercial supersonic flight can be realized.
Machine vision for real time orbital operations

NASA Technical Reports Server (NTRS)

Vinz, Frank L.

1988-01-01

Machine vision for automation and robotic operation of Space Station era systems has the potential for increasing the efficiency of orbital servicing, repair, assembly and docking tasks. A machine vision research project is described in which a TV camera is used for inputing visual data to a computer so that image processing may be achieved for real time control of these orbital operations. A technique has resulted from this research which reduces computer memory requirements and greatly increases typical computational speed such that it has the potential for development into a real time orbital machine vision system. This technique is called AI BOSS (Analysis of Images by Box Scan and Syntax).
Turbocharged Diesels

NASA Technical Reports Server (NTRS)

1984-01-01

In a number of feasibility studies of turbine rotor designs, engineers of Cummins Engine Company, Inc.'s turbocharger group have utilized a computer program from COSMIC. Part of Cummins research effort is aimed toward introduction of advanced turbocharged engines that deliver extra power with greater fuel efficiency. Company claims use of COSMIC program substantially reduced software development costs.
An Efficient Algorithm for TUCKALS3 on Data with Large Numbers of Observation Units.

ERIC Educational Resources Information Center

Kiers, Henk A. L.; And Others

1992-01-01

A modification of the TUCKALS3 algorithm is proposed that handles three-way arrays of order I x J x K for any I. The reduced work space needed for storing data and increased execution speed make the modified algorithm very suitable for use on personal computers. (SLD)
Cloud computing-based TagSNP selection algorithm for human genome data.

PubMed

Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

2015-01-05

Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
Cluster compression algorithm: A joint clustering/data compression concept

NASA Technical Reports Server (NTRS)

Hilbert, E. E.

1977-01-01

The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.

Water droplet impingement on airfoils and aircraft engine inlets for icing analysis

NASA Technical Reports Server (NTRS)

Papadakis, Michael; Elangovan, R.; Freund, George A., Jr.; Breer, Marlin D.

1991-01-01

This paper includes the results of a significant research program for verification of computer trajectory codes used in aircraft icing analysis. Experimental water droplet impingement data have been obtained in the NASA Lewis Research Center Icing Research Tunnel for a wide range of aircraft geometries and test conditions. The body whose impingement characteristics are required is covered at strategic locations by thin strips of moisture absorbing (blotter) paper and then exposed to an airstream containing a dyed-water spray cloud. Water droplet impingement data are extracted from the dyed blotter strips by measuring the optical reflectance of the dye deposit on the strips with an automated reflectometer. Impingement characteristics for all test geometries have also been calculated using two recently developed trajectory computer codes. Good agreement is obtained with experimental data. The experimental and analytical data show that maximum impingement efficiency and impingement limits increase with mean volumetric diameter for all geometries tested. For all inlet geometries tested, as the inlet mass flow is reduced, the maximum impingement efficiency is reduced and the location of the maximum impingement shifts toward the inlet inner cowl.
Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

PubMed Central

Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

2015-01-01

Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
A computationally efficient ductile damage model accounting for nucleation and micro-inertia at high triaxialities

DOE PAGES

Versino, Daniele; Bronkhorst, Curt Allan

2018-01-31

The computational formulation of a micro-mechanical material model for the dynamic failure of ductile metals is presented in this paper. The statistical nature of porosity initiation is accounted for by introducing an arbitrary probability density function which describes the pores nucleation pressures. Each micropore within the representative volume element is modeled as a thick spherical shell made of plastically incompressible material. The treatment of porosity by a distribution of thick-walled spheres also allows for the inclusion of micro-inertia effects under conditions of shock and dynamic loading. The second order ordinary differential equation governing the microscopic porosity evolution is solved withmore » a robust implicit procedure. A new Chebyshev collocation method is employed to approximate the porosity distribution and remapping is used to optimize memory usage. The adaptive approximation of the porosity distribution leads to a reduction of computational time and memory usage of up to two orders of magnitude. Moreover, the proposed model affords consistent performance: changing the nucleation pressure probability density function and/or the applied strain rate does not reduce accuracy or computational efficiency of the material model. The numerical performance of the model and algorithms presented is tested against three problems for high density tantalum: single void, one-dimensional uniaxial strain, and two-dimensional plate impact. Here, the results using the integration and algorithmic advances suggest a significant improvement in computational efficiency and accuracy over previous treatments for dynamic loading conditions.« less
Buffer management for sequential decoding. [block erasure probability reduction

NASA Technical Reports Server (NTRS)

Layland, J. W.

1974-01-01

Sequential decoding has been found to be an efficient means of communicating at low undetected error rates from deep space probes, but erasure or computational overflow remains a significant problem. Erasure of a block occurs when the decoder has not finished decoding that block at the time that it must be output. By drawing upon analogies in computer time sharing, this paper develops a buffer-management strategy which reduces the decoder idle time to a negligible level, and therefore improves the erasure probability of a sequential decoder. For a decoder with a speed advantage of ten and a buffer size of ten blocks, operating at an erasure rate of .01, use of this buffer-management strategy reduces the erasure rate to less than .0001.
A multilevel-skin neighbor list algorithm for molecular dynamics simulation

NASA Astrophysics Data System (ADS)

Zhang, Chenglong; Zhao, Mingcan; Hou, Chaofeng; Ge, Wei

2018-01-01

Searching of the interaction pairs and organization of the interaction processes are important steps in molecular dynamics (MD) algorithms and are critical to the overall efficiency of the simulation. Neighbor lists are widely used for these steps, where thicker skin can reduce the frequency of list updating but is discounted by more computation in distance check for the particle pairs. In this paper, we propose a new neighbor-list-based algorithm with a precisely designed multilevel skin which can reduce unnecessary computation on inter-particle distances. The performance advantages over traditional methods are then analyzed against the main simulation parameters on Intel CPUs and MICs (many integrated cores), and are clearly demonstrated. The algorithm can be generalized for various discrete simulations using neighbor lists.
Decomposed multidimensional control grid interpolation for common consumer electronic image processing applications

NASA Astrophysics Data System (ADS)

Zwart, Christine M.; Venkatesan, Ragav; Frakes, David H.

2012-10-01

Interpolation is an essential and broadly employed function of signal processing. Accordingly, considerable development has focused on advancing interpolation algorithms toward optimal accuracy. Such development has motivated a clear shift in the state-of-the art from classical interpolation to more intelligent and resourceful approaches, registration-based interpolation for example. As a natural result, many of the most accurate current algorithms are highly complex, specific, and computationally demanding. However, the diverse hardware destinations for interpolation algorithms present unique constraints that often preclude use of the most accurate available options. For example, while computationally demanding interpolators may be suitable for highly equipped image processing platforms (e.g., computer workstations and clusters), only more efficient interpolators may be practical for less well equipped platforms (e.g., smartphones and tablet computers). The latter examples of consumer electronics present a design tradeoff in this regard: high accuracy interpolation benefits the consumer experience but computing capabilities are limited. It follows that interpolators with favorable combinations of accuracy and efficiency are of great practical value to the consumer electronics industry. We address multidimensional interpolation-based image processing problems that are common to consumer electronic devices through a decomposition approach. The multidimensional problems are first broken down into multiple, independent, one-dimensional (1-D) interpolation steps that are then executed with a newly modified registration-based one-dimensional control grid interpolator. The proposed approach, decomposed multidimensional control grid interpolation (DMCGI), combines the accuracy of registration-based interpolation with the simplicity, flexibility, and computational efficiency of a 1-D interpolation framework. Results demonstrate that DMCGI provides improved interpolation accuracy (and other benefits) in image resizing, color sample demosaicing, and video deinterlacing applications, at a computational cost that is manageable or reduced in comparison to popular alternatives.
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Capture Efficiency of Biocompatible Magnetic Nanoparticles in Arterial Flow: A Computer Simulation for Magnetic Drug Targeting.

PubMed

Lunnoo, Thodsaphon; Puangmali, Theerapong

2015-12-01

The primary limitation of magnetic drug targeting (MDT) relates to the strength of an external magnetic field which decreases with increasing distance. Small nanoparticles (NPs) displaying superparamagnetic behaviour are also required in order to reduce embolization in the blood vessel. The small NPs, however, make it difficult to vector NPs and keep them in the desired location. The aims of this work were to investigate parameters influencing the capture efficiency of the drug carriers in mimicked arterial flow. In this work, we computationally modelled and evaluated capture efficiency in MDT with COMSOL Multiphysics 4.4. The studied parameters were (i) magnetic nanoparticle size, (ii) three classes of magnetic cores (Fe3O4, Fe2O3, and Fe), and (iii) the thickness of biocompatible coating materials (Au, SiO2, and PEG). It was found that the capture efficiency of small particles decreased with decreasing size and was less than 5 % for magnetic particles in the superparamagnetic regime. The thickness of non-magnetic coating materials did not significantly influence the capture efficiency of MDT. It was difficult to capture small drug carriers (D<200 nm) in the arterial flow. We suggest that the MDT with high-capture efficiency can be obtained in small vessels and low-blood velocities such as micro-capillary vessels.
An effective support system of emergency medical services with tablet computers.

PubMed

Yamada, Kosuke C; Inoue, Satoshi; Sakamoto, Yuichiro

2015-02-27

There were over 5,000,000 ambulance dispatches during 2010 in Japan, and the time for transportation has been increasing, it took over 37 minutes from dispatch to the hospitals. A way to reduce transportation time by ambulance is to shorten the time of searching for an appropriate facility/hospital during the prehospital phase. Although the information system of medical institutions and emergency medical service (EMS) was established in 2003 in Saga Prefecture, Japan, it has not been utilized efficiently. The Saga Prefectural Government renewed the previous system in an effort to make it the real-time support system that can efficiently manage emergency demand and acceptance for the first time in Japan in April 2011. The objective of this study was to evaluate if the new system promotes efficient emergency transportation for critically ill patients and provides valuable epidemiological data. The new system has provided both emergency personnel in the ambulance, or at the scene, and the medical staff in each hospital to be able to share up-to-date information about available hospitals by means of cloud computing. All 55 ambulances in Saga are equipped with tablet computers through third generation/long term evolution networks. When the emergency personnel arrive on the scene and discern the type of patient's illness, they can search for an appropriate facility/hospital with their tablet computer based on the patient's symptoms and available medical specialists. Data were collected prospectively over a three-year period from April 1, 2011 to March 31, 2013. The transportation time by ambulance in Saga was shortened for the first time since the statistics were first kept in 1999; the mean time was 34.3 minutes in 2010 (based on administrative statistics) and 33.9 minutes (95% CI 33.6-34.1) in 2011. The ratio of transportation to the tertiary care facilities in Saga has decreased by 3.12% from the year before, 32.7% in 2010 (regional average) and 29.58% (9085/30,709) in 2011. The system entry completion rate by the emergency personnel was 100.00% (93,110/93,110) and by the medical staff was 46.11% (14,159/30,709) to 47.57% (14,639/30,772) over a three-year period. Finally, the new system reduced the operational costs by 40,000,000 yen (about $400,000 US dollars) a year. The transportation time by ambulance was shorter following the implementation of the tablet computer in the current support system of EMS in Saga Prefecture, Japan. The cloud computing reduced the cost of the EMS system.
An Efficient Moving Target Detection Algorithm Based on Sparsity-Aware Spectrum Estimation

PubMed Central

Shen, Mingwei; Wang, Jie; Wu, Di; Zhu, Daiyin

2014-01-01

In this paper, an efficient direct data domain space-time adaptive processing (STAP) algorithm for moving targets detection is proposed, which is achieved based on the distinct spectrum features of clutter and target signals in the angle-Doppler domain. To reduce the computational complexity, the high-resolution angle-Doppler spectrum is obtained by finding the sparsest coefficients in the angle domain using the reduced-dimension data within each Doppler bin. Moreover, we will then present a knowledge-aided block-size detection algorithm that can discriminate between the moving targets and the clutter based on the extracted spectrum features. The feasibility and effectiveness of the proposed method are validated through both numerical simulations and raw data processing results. PMID:25222035
Adaptive Crack Modeling with Interface Solid Elements for Plain and Fiber Reinforced Concrete Structures.

PubMed

Zhan, Yijian; Meschke, Günther

2017-07-08

The effective analysis of the nonlinear behavior of cement-based engineering structures not only demands physically-reliable models, but also computationally-efficient algorithms. Based on a continuum interface element formulation that is suitable to capture complex cracking phenomena in concrete materials and structures, an adaptive mesh processing technique is proposed for computational simulations of plain and fiber-reinforced concrete structures to progressively disintegrate the initial finite element mesh and to add degenerated solid elements into the interfacial gaps. In comparison with the implementation where the entire mesh is processed prior to the computation, the proposed adaptive cracking model allows simulating the failure behavior of plain and fiber-reinforced concrete structures with remarkably reduced computational expense.
Adaptive Crack Modeling with Interface Solid Elements for Plain and Fiber Reinforced Concrete Structures

PubMed Central

Zhan, Yijian

2017-01-01

The effective analysis of the nonlinear behavior of cement-based engineering structures not only demands physically-reliable models, but also computationally-efficient algorithms. Based on a continuum interface element formulation that is suitable to capture complex cracking phenomena in concrete materials and structures, an adaptive mesh processing technique is proposed for computational simulations of plain and fiber-reinforced concrete structures to progressively disintegrate the initial finite element mesh and to add degenerated solid elements into the interfacial gaps. In comparison with the implementation where the entire mesh is processed prior to the computation, the proposed adaptive cracking model allows simulating the failure behavior of plain and fiber-reinforced concrete structures with remarkably reduced computational expense. PMID:28773130
Acoustic environmental accuracy requirements for response determination

NASA Technical Reports Server (NTRS)

Pettitt, M. R.

1983-01-01

A general purpose computer program was developed for the prediction of vehicle interior noise. This program, named VIN, has both modal and statistical energy analysis capabilities for structural/acoustic interaction analysis. The analytic models and their computer implementation were verified through simple test cases with well-defined experimental results. The model was also applied in a space shuttle payload bay launch acoustics prediction study. The computer program processes large and small problems with equal efficiency because all arrays are dynamically sized by program input variables at run time. A data base is built and easily accessed for design studies. The data base significantly reduces the computational costs of such studies by allowing the reuse of the still-valid calculated parameters of previous iterations.
Computing Galois Groups of Eisenstein Polynomials Over P-adic Fields

NASA Astrophysics Data System (ADS)

Milstead, Jonathan

The most efficient algorithms for computing Galois groups of polynomials over global fields are based on Stauduhar's relative resolvent method. These methods are not directly generalizable to the local field case, since they require a field that contains the global field in which all roots of the polynomial can be approximated. We present splitting field-independent methods for computing the Galois group of an Eisenstein polynomial over a p-adic field. Our approach is to combine information from different disciplines. We primarily, make use of the ramification polygon of the polynomial, which is the Newton polygon of a related polynomial. This allows us to quickly calculate several invariants that serve to reduce the number of possible Galois groups. Algorithms by Greve and Pauli very efficiently return the Galois group of polynomials where the ramification polygon consists of one segment as well as information about the subfields of the stem field. Second, we look at the factorization of linear absolute resolvents to further narrow the pool of possible groups.
Combined multi-spectrum and orthogonal Laplacianfaces for fast CB-XLCT imaging with single-view data

NASA Astrophysics Data System (ADS)

Zhang, Haibo; Geng, Guohua; Chen, Yanrong; Qu, Xuan; Zhao, Fengjun; Hou, Yuqing; Yi, Huangjian; He, Xiaowei

2017-12-01

Cone-beam X-ray luminescence computed tomography (CB-XLCT) is an attractive hybrid imaging modality, which has the potential of monitoring the metabolic processes of nanophosphors-based drugs in vivo. Single-view data reconstruction as a key issue of CB-XLCT imaging promotes the effective study of dynamic XLCT imaging. However, it suffers from serious ill-posedness in the inverse problem. In this paper, a multi-spectrum strategy is adopted to relieve the ill-posedness of reconstruction. The strategy is based on the third-order simplified spherical harmonic approximation model. Then, an orthogonal Laplacianfaces-based method is proposed to reduce the large computational burden without degrading the imaging quality. Both simulated data and in vivo experimental data were used to evaluate the efficiency and robustness of the proposed method. The results are satisfactory in terms of both location and quantitative recovering with computational efficiency, indicating that the proposed method is practical and promising for single-view CB-XLCT imaging.
Parallelization of the FLAPW method and comparison with the PPW method

NASA Astrophysics Data System (ADS)

Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

2000-03-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.
A diagonal algorithm for the method of pseudocompressibility. [for steady-state solution to incompressible Navier-Stokes equation

NASA Technical Reports Server (NTRS)

Rogers, S. E.; Kwak, D.; Chang, J. L. C.

1986-01-01

The method of pseudocompressibility has been shown to be an efficient method for obtaining a steady-state solution to the incompressible Navier-Stokes equations. Recent improvements to this method include the use of a diagonal scheme for the inversion of the equations at each iteration. The necessary transformations have been derived for the pseudocompressibility equations in generalized coordinates. The diagonal algorithm reduces the computing time necessary to obtain a steady-state solution by a factor of nearly three. Implicit viscous terms are maintained in the equations, and it has become possible to use fourth-order implicit dissipation. The steady-state solution is unchanged by the approximations resulting from the diagonalization of the equations. Computed results for flow over a two-dimensional backward-facing step and a three-dimensional cylinder mounted normal to a flat plate are presented for both the old and new algorithms. The accuracy and computing efficiency of these algorithms are compared.
Regional-scale calculation of the LS factor using parallel processing

NASA Astrophysics Data System (ADS)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard

Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less
A power-efficient ZF precoding scheme for multi-user indoor visible light communication systems

NASA Astrophysics Data System (ADS)

Zhao, Qiong; Fan, Yangyu; Deng, Lijun; Kang, Bochao

2017-02-01

In this study, we propose a power-efficient ZF precoding scheme for visible light communication (VLC) downlink multi-user multiple-input-single-output (MU-MISO) systems, which incorporates the zero-forcing (ZF) and the characteristics of VLC systems. The main idea of this scheme is that the channel matrix used to perform pseudoinverse comes from the set of optical Access Points (APs) shared by more than one user, instead of the set of all involved serving APs as the existing ZF precoding schemes often used. By doing this, the waste of power, which is caused by the transmission of one user's data in the un-serving APs, can be avoided. In addition, the size of the channel matrix needs to perform pseudoinverse becomes smaller, which helps to reduce the computation complexity. Simulation results in two scenarios show that the proposed ZF precoding scheme has higher power efficiency, better bit error rate (BER) performance and lower computation complexity compared with traditional ZF precoding schemes.

Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

DOE PAGES

Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard; ...

2017-06-06

Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less
Computing Bounds on Resource Levels for Flexible Plans

NASA Technical Reports Server (NTRS)

Muscvettola, Nicola; Rijsman, David

2009-01-01

A new algorithm efficiently computes the tightest exact bound on the levels of resources induced by a flexible activity plan (see figure). Tightness of bounds is extremely important for computations involved in planning because tight bounds can save potentially exponential amounts of search (through early backtracking and detection of solutions), relative to looser bounds. The bound computed by the new algorithm, denoted the resource-level envelope, constitutes the measure of maximum and minimum consumption of resources at any time for all fixed-time schedules in the flexible plan. At each time, the envelope guarantees that there are two fixed-time instantiations one that produces the minimum level and one that produces the maximum level. Therefore, the resource-level envelope is the tightest possible resource-level bound for a flexible plan because any tighter bound would exclude the contribution of at least one fixed-time schedule. If the resource- level envelope can be computed efficiently, one could substitute looser bounds that are currently used in the inner cores of constraint-posting scheduling algorithms, with the potential for great improvements in performance. What is needed to reduce the cost of computation is an algorithm, the measure of complexity of which is no greater than a low-degree polynomial in N (where N is the number of activities). The new algorithm satisfies this need. In this algorithm, the computation of resource-level envelopes is based on a novel combination of (1) the theory of shortest paths in the temporal-constraint network for the flexible plan and (2) the theory of maximum flows for a flow network derived from the temporal and resource constraints. The measure of asymptotic complexity of the algorithm is O(N O(maxflow(N)), where O(x) denotes an amount of computing time or a number of arithmetic operations proportional to a number of the order of x and O(maxflow(N)) is the measure of complexity (and thus of cost) of a maximumflow algorithm applied to an auxiliary flow network of 2N nodes. The algorithm is believed to be efficient in practice; experimental analysis shows the practical cost of maxflow to be as low as O(N1.5). The algorithm could be enhanced following at least two approaches. In the first approach, incremental subalgorithms for the computation of the envelope could be developed. By use of temporal scanning of the events in the temporal network, it may be possible to significantly reduce the size of the networks on which it is necessary to run the maximum-flow subalgorithm, thereby significantly reducing the time required for envelope calculation. In the second approach, the practical effectiveness of resource envelopes in the inner loops of search algorithms could be tested for multi-capacity resource scheduling. This testing would include inner-loop backtracking and termination tests and variable and value-ordering heuristics that exploit the properties of resource envelopes more directly.
Delivering better power: the role of simulation in reducing the environmental impact of aircraft engines.

PubMed

Menzies, Kevin

2014-08-13

The growth in simulation capability over the past 20 years has led to remarkable changes in the design process for gas turbines. The availability of relatively cheap computational power coupled to improvements in numerical methods and physical modelling in simulation codes have enabled the development of aircraft propulsion systems that are more powerful and yet more efficient than ever before. However, the design challenges are correspondingly greater, especially to reduce environmental impact. The simulation requirements to achieve a reduced environmental impact are described along with the implications of continued growth in available computational power. It is concluded that achieving the environmental goals will demand large-scale multi-disciplinary simulations requiring significantly increased computational power, to enable optimization of the airframe and propulsion system over the entire operational envelope. However even with massive parallelization, the limits imposed by communications latency will constrain the time required to achieve a solution, and therefore the position of such large-scale calculations in the industrial design process. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Estimation of Sonic Fatigue by Reduced-Order Finite Element Based Analyses

NASA Technical Reports Server (NTRS)

Rizzi, Stephen A.; Przekop, Adam

2006-01-01

A computationally efficient, reduced-order method is presented for prediction of sonic fatigue of structures exhibiting geometrically nonlinear response. A procedure to determine the nonlinear modal stiffness using commercial finite element codes allows the coupled nonlinear equations of motion in physical degrees of freedom to be transformed to a smaller coupled system of equations in modal coordinates. The nonlinear modal system is first solved using a computationally light equivalent linearization solution to determine if the structure responds to the applied loading in a nonlinear fashion. If so, a higher fidelity numerical simulation in modal coordinates is undertaken to more accurately determine the nonlinear response. Comparisons of displacement and stress response obtained from the reduced-order analyses are made with results obtained from numerical simulation in physical degrees-of-freedom. Fatigue life predictions from nonlinear modal and physical simulations are made using the rainflow cycle counting method in a linear cumulative damage analysis. Results computed for a simple beam structure under a random acoustic loading demonstrate the effectiveness of the approach and compare favorably with results obtained from the solution in physical degrees-of-freedom.
DOE/ NREL Build One of the World's Most Energy Efficient Office Spaces

ScienceCinema

Radocy, Rachel; Livingston, Brian; von Luhrte, Rich

2018-05-18

Technology â from sophisticated computer modeling to advanced windows that actually open â will help the newest building at the U.S. Department of Energy's (DOE) National Renewable Energy Laboratory (NREL) be one of the world's most energy efficient offices. Scheduled to open this summer, the 222,000 square-foot RSF will house more than 800 staff and an energy efficient information technology data center. Because 19 percent of the country's energy is used by commercial buildings, DOE plans to make this facility a showcase for energy efficiency. DOE hopes the design of the RSF will be replicated by the building industry and help reduce the nation's energy consumption by changing the way commercial buildings are designed and built.
Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

PubMed

Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

2014-01-01

A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

PubMed Central

Yazar, Seyhan; Gooden, George E. C.; Mackey, David A.; Hewitt, Alex W.

2014-01-01

A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E.coli and 53.5% (95% CI: 34.4–72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298
An integrated system for land resources supervision based on the IoT and cloud computing

NASA Astrophysics Data System (ADS)

Fang, Shifeng; Zhu, Yunqiang; Xu, Lida; Zhang, Jinqu; Zhou, Peiji; Luo, Kan; Yang, Jie

2017-01-01

Integrated information systems are important safeguards for the utilisation and development of land resources. Information technologies, including the Internet of Things (IoT) and cloud computing, are inevitable requirements for the quality and efficiency of land resources supervision tasks. In this study, an economical and highly efficient supervision system for land resources has been established based on IoT and cloud computing technologies; a novel online and offline integrated system with synchronised internal and field data that includes the entire process of 'discovering breaches, analysing problems, verifying fieldwork and investigating cases' was constructed. The system integrates key technologies, such as the automatic extraction of high-precision information based on remote sensing, semantic ontology-based technology to excavate and discriminate public sentiment on the Internet that is related to illegal incidents, high-performance parallel computing based on MapReduce, uniform storing and compressing (bitwise) technology, global positioning system data communication and data synchronisation mode, intelligent recognition and four-level ('device, transfer, system and data') safety control technology. The integrated system based on a 'One Map' platform has been officially implemented by the Department of Land and Resources of Guizhou Province, China, and was found to significantly increase the efficiency and level of land resources supervision. The system promoted the overall development of informatisation in fields related to land resource management.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases

NASA Astrophysics Data System (ADS)

Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel

2017-07-01

We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) [13], we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Efficient Decoding With Steady-State Kalman Filter in Neural Interface Systems

PubMed Central

Malik, Wasim Q.; Truccolo, Wilson; Brown, Emery N.; Hochberg, Leigh R.

2011-01-01

The Kalman filter is commonly used in neural interface systems to decode neural activity and estimate the desired movement kinematics. We analyze a low-complexity Kalman filter implementation in which the filter gain is approximated by its steady-state form, computed offline before real-time decoding commences. We evaluate its performance using human motor cortical spike train data obtained from an intracortical recording array as part of an ongoing pilot clinical trial. We demonstrate that the standard Kalman filter gain converges to within 95% of the steady-state filter gain in 1.5 ± 0.5 s (mean ± s.d.). The difference in the intended movement velocity decoded by the two filters vanishes within 5 s, with a correlation coefficient of 0.99 between the two decoded velocities over the session length. We also find that the steady-state Kalman filter reduces the computational load (algorithm execution time) for decoding the firing rates of 25 ± 3 single units by a factor of 7.0 ± 0.9. We expect that the gain in computational efficiency will be much higher in systems with larger neural ensembles. The steady-state filter can thus provide substantial runtime efficiency at little cost in terms of estimation accuracy. This far more efficient neural decoding approach will facilitate the practical implementation of future large-dimensional, multisignal neural interface systems. PMID:21078582
Combined Numerical/Analytical Perturbation Solutions of the Navier-Stokes Equations for Aerodynamic Ejector/Mixer Nozzle Flows

NASA Technical Reports Server (NTRS)

DeChant, Lawrence Justin

1998-01-01

In spite of rapid advances in both scalar and parallel computational tools, the large number of variables involved in both design and inverse problems make the use of sophisticated fluid flow models impractical, With this restriction, it is concluded that an important family of methods for mathematical/computational development are reduced or approximate fluid flow models. In this study a combined perturbation/numerical modeling methodology is developed which provides a rigorously derived family of solutions. The mathematical model is computationally more efficient than classical boundary layer but provides important two-dimensional information not available using quasi-1-d approaches. An additional strength of the current methodology is its ability to locally predict static pressure fields in a manner analogous to more sophisticated parabolized Navier Stokes (PNS) formulations. To resolve singular behavior, the model utilizes classical analytical solution techniques. Hence, analytical methods have been combined with efficient numerical methods to yield an efficient hybrid fluid flow model. In particular, the main objective of this research has been to develop a system of analytical and numerical ejector/mixer nozzle models, which require minimal empirical input. A computer code, DREA Differential Reduced Ejector/mixer Analysis has been developed with the ability to run sufficiently fast so that it may be used either as a subroutine or called by an design optimization routine. Models are of direct use to the High Speed Civil Transport Program (a joint government/industry project seeking to develop an economically.viable U.S. commercial supersonic transport vehicle) and are currently being adopted by both NASA and industry. Experimental validation of these models is provided by comparison to results obtained from open literature and Limited Exclusive Right Distribution (LERD) sources, as well as dedicated experiments performed at Texas A&M. These experiments have been performed using a hydraulic/gas flow analog. Results of comparisons of DREA computations with experimental data, which include entrainment, thrust, and local profile information, are overall good. Computational time studies indicate that DREA provides considerably more information at a lower computational cost than contemporary ejector nozzle design models. Finally. physical limitations of the method, deviations from experimental data, potential improvements and alternative formulations are described. This report represents closure to the NASA Graduate Researchers Program. Versions of the DREA code and a user's guide may be obtained from the NASA Lewis Research Center.
A Parallel First-Order Linear Recurrence Solver.

DTIC Science & Technology

1986-09-01

1og2 -M)steps, but p M did not discuss any specific parallel implementation. Gajski [GAJ81] improved upon this result by performing the SIMD computation...solves a series of reduced recurrences of size p 2. However, when N = p 2, our approach reduces to that of I’- [GAJ81], except that Gajski presents the...existing SIMD algorithms to solve R<N,1>, the SIMD algo- rithm presented by Gajski [GAJ81] can be most efficiently mapped to a uni- directional ring
Reducing and Analyzing the PHAT Survey with the Cloud

NASA Astrophysics Data System (ADS)

Williams, Benjamin F.; Olsen, Knut; Khan, Rubab; Pirone, Daniel; Rosema, Keith

2018-05-01

We discuss the technical challenges we faced and the techniques we used to overcome them when reducing the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set on the Amazon Elastic Compute Cloud (EC2). We first describe the architecture of our photometry pipeline, which we found particularly efficient for reducing the data in multiple ways for different purposes. We then describe the features of EC2 that make this architecture both efficient to use and challenging to implement. We describe the techniques we adopted to process our data, and suggest ways these techniques may be improved for those interested in trying such reductions in the future. Finally, we summarize the output photometry data products, which are now hosted publicly in two places in two formats. They are in simple fits tables in the high-level science products on MAST, and on a queryable database available through the NOAO Data Lab.
Multispectral x-ray CT: multivariate statistical analysis for efficient reconstruction

NASA Astrophysics Data System (ADS)

Kheirabadi, Mina; Mustafa, Wail; Lyksborg, Mark; Lund Olsen, Ulrik; Bjorholm Dahl, Anders

2017-10-01

Recent developments in multispectral X-ray detectors allow for an efficient identification of materials based on their chemical composition. This has a range of applications including security inspection, which is our motivation. In this paper, we analyze data from a tomographic setup employing the MultiX detector, that records projection data in 128 energy bins covering the range from 20 to 160 keV. Obtaining all information from this data requires reconstructing 128 tomograms, which is computationally expensive. Instead, we propose to reduce the dimensionality of projection data prior to reconstruction and reconstruct from the reduced data. We analyze three linear methods for dimensionality reduction using a dataset with 37 equally-spaced projection angles. Four bottles with different materials are recorded for which we are able to obtain similar discrimination of their content using a very reduced subset of tomograms compared to the 128 tomograms that would otherwise be needed without dimensionality reduction.
Reduced Design Load Basis for Ultimate Blade Loads Estimation in Multidisciplinary Design Optimization Frameworks

NASA Astrophysics Data System (ADS)

Pavese, Christian; Tibaldi, Carlo; Larsen, Torben J.; Kim, Taeseong; Thomsen, Kenneth

2016-09-01

The aim is to provide a fast and reliable approach to estimate ultimate blade loads for a multidisciplinary design optimization (MDO) framework. For blade design purposes, the standards require a large amount of computationally expensive simulations, which cannot be efficiently run each cost function evaluation of an MDO process. This work describes a method that allows integrating the calculation of the blade load envelopes inside an MDO loop. Ultimate blade load envelopes are calculated for a baseline design and a design obtained after an iteration of an MDO. These envelopes are computed for a full standard design load basis (DLB) and a deterministic reduced DLB. Ultimate loads extracted from the two DLBs with the two blade designs each are compared and analyzed. Although the reduced DLB supplies ultimate loads of different magnitude, the shape of the estimated envelopes are similar to the one computed using the full DLB. This observation is used to propose a scheme that is computationally cheap, and that can be integrated inside an MDO framework, providing a sufficiently reliable estimation of the blade ultimate loading. The latter aspect is of key importance when design variables implementing passive control methodologies are included in the formulation of the optimization problem. An MDO of a 10 MW wind turbine blade is presented as an applied case study to show the efficacy of the reduced DLB concept.
Data compression strategies for ptychographic diffraction imaging

NASA Astrophysics Data System (ADS)

Loetgering, Lars; Rose, Max; Treffer, David; Vartanyants, Ivan A.; Rosenhahn, Axel; Wilhein, Thomas

2017-12-01

Ptychography is a computational imaging method for solving inverse scattering problems. To date, the high amount of redundancy present in ptychographic data sets requires computer memory that is orders of magnitude larger than the retrieved information. Here, we propose and compare data compression strategies that significantly reduce the amount of data required for wavefield inversion. Information metrics are used to measure the amount of data redundancy present in ptychographic data. Experimental results demonstrate the technique to be memory efficient and stable in the presence of systematic errors such as partial coherence and noise.
Virtualization in education: Information Security lab in your hands

NASA Astrophysics Data System (ADS)

Karlov, A. A.

2016-09-01

The growing demand for qualified specialists in advanced information technologies poses serious challenges to the education and training of young personnel for science, industry and social problems. Virtualization as a way to isolate the user from the physical characteristics of computing resources (processors, servers, operating systems, networks, applications, etc.), has, in particular, an enormous influence in the field of education, increasing its efficiency, reducing the cost, making it more widely and readily available. The study of Information Security of computer systems is considered as an example of use of virtualization in education.
New double-byte error-correcting codes for memory systems

NASA Technical Reports Server (NTRS)

Feng, Gui-Liang; Wu, Xinen; Rao, T. R. N.

1996-01-01

Error-correcting or error-detecting codes have been used in the computer industry to increase reliability, reduce service costs, and maintain data integrity. The single-byte error-correcting and double-byte error-detecting (SbEC-DbED) codes have been successfully used in computer memory subsystems. There are many methods to construct double-byte error-correcting (DBEC) codes. In the present paper we construct a class of double-byte error-correcting codes, which are more efficient than those known to be optimum, and a decoding procedure for our codes is also considered.
An efficient computer based wavelets approximation method to solve Fuzzy boundary value differential equations

NASA Astrophysics Data System (ADS)

Alam Khan, Najeeb; Razzaq, Oyoon Abdul

2016-03-01

In the present work a wavelets approximation method is employed to solve fuzzy boundary value differential equations (FBVDEs). Essentially, a truncated Legendre wavelets series together with the Legendre wavelets operational matrix of derivative are utilized to convert FB- VDE into a simple computational problem by reducing it into a system of fuzzy algebraic linear equations. The capability of scheme is investigated on second order FB- VDE considered under generalized H-differentiability. Solutions are represented graphically showing competency and accuracy of this method.
Dual-scale topology optoelectronic processor.

PubMed

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.

Prediction of sound radiated from different practical jet engine inlets

NASA Technical Reports Server (NTRS)

Zinn, B. T.; Meyer, W. L.

1980-01-01

Existing computer codes for calculating the far field radiation patterns surrounding various practical jet engine inlet configurations under different excitation conditions were upgraded. The computer codes were refined and expanded so that they are now more efficient computationally by a factor of about three and they are now capable of producing accurate results up to nondimensional wave numbers of twenty. Computer programs were also developed to help generate accurate geometrical representations of the inlets to be investigated. This data is required as input for the computer programs which calculate the sound fields. This new geometry generating computer program considerably reduces the time required to generate the input data which was one of the most time consuming steps in the process. The results of sample runs using the NASA-Lewis QCSEE inlet are presented and comparison of run times and accuracy are made between the old and upgraded computer codes. The overall accuracy of the computations is determined by comparison of the results of the computations with simple source solutions.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

PubMed

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-08-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

PubMed Central

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-01-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Robust Low-dose CT Perfusion Deconvolution via Tensor Total-Variation Regularization

PubMed Central

Zhang, Shaoting; Chen, Tsuhan; Sanelli, Pina C.

2016-01-01

Acute brain diseases such as acute strokes and transit ischemic attacks are the leading causes of mortality and morbidity worldwide, responsible for 9% of total death every year. ‘Time is brain’ is a widely accepted concept in acute cerebrovascular disease treatment. Efficient and accurate computational framework for hemodynamic parameters estimation can save critical time for thrombolytic therapy. Meanwhile the high level of accumulated radiation dosage due to continuous image acquisition in CT perfusion (CTP) raised concerns on patient safety and public health. However, low-radiation leads to increased noise and artifacts which require more sophisticated and time-consuming algorithms for robust estimation. In this paper, we focus on developing a robust and efficient framework to accurately estimate the perfusion parameters at low radiation dosage. Specifically, we present a tensor total-variation (TTV) technique which fuses the spatial correlation of the vascular structure and the temporal continuation of the blood signal flow. An efficient algorithm is proposed to find the solution with fast convergence and reduced computational complexity. Extensive evaluations are carried out in terms of sensitivity to noise levels, estimation accuracy, contrast preservation, and performed on digital perfusion phantom estimation, as well as in-vivo clinical subjects. Our framework reduces the necessary radiation dose to only 8% of the original level and outperforms the state-of-art algorithms with peak signal-to-noise ratio improved by 32%. It reduces the oscillation in the residue functions, corrects over-estimation of cerebral blood flow (CBF) and under-estimation of mean transit time (MTT), and maintains the distinction between the deficit and normal regions. PMID:25706579
Efficient Solution of Three-Dimensional Problems of Acoustic and Electromagnetic Scattering by Open Surfaces

NASA Technical Reports Server (NTRS)

Turc, Catalin; Anand, Akash; Bruno, Oscar; Chaubell, Julian

2011-01-01

We present a computational methodology (a novel Nystrom approach based on use of a non-overlapping patch technique and Chebyshev discretizations) for efficient solution of problems of acoustic and electromagnetic scattering by open surfaces. Our integral equation formulations (1) Incorporate, as ansatz, the singular nature of open-surface integral-equation solutions, and (2) For the Electric Field Integral Equation (EFIE), use analytical regularizes that effectively reduce the number of iterations required by iterative linear-algebra solution based on Krylov-subspace iterative solvers.
An efficient graph theory based method to identify every minimal reaction set in a metabolic network

PubMed Central

2014-01-01

Background Development of cells with minimal metabolic functionality is gaining importance due to their efficiency in producing chemicals and fuels. Existing computational methods to identify minimal reaction sets in metabolic networks are computationally expensive. Further, they identify only one of the several possible minimal reaction sets. Results In this paper, we propose an efficient graph theory based recursive optimization approach to identify all minimal reaction sets. Graph theoretical insights offer systematic methods to not only reduce the number of variables in math programming and increase its computational efficiency, but also provide efficient ways to find multiple optimal solutions. The efficacy of the proposed approach is demonstrated using case studies from Escherichia coli and Saccharomyces cerevisiae. In case study 1, the proposed method identified three minimal reaction sets each containing 38 reactions in Escherichia coli central metabolic network with 77 reactions. Analysis of these three minimal reaction sets revealed that one of them is more suitable for developing minimal metabolism cell compared to other two due to practically achievable internal flux distribution. In case study 2, the proposed method identified 256 minimal reaction sets from the Saccharomyces cerevisiae genome scale metabolic network with 620 reactions. The proposed method required only 4.5 hours to identify all the 256 minimal reaction sets and has shown a significant reduction (approximately 80%) in the solution time when compared to the existing methods for finding minimal reaction set. Conclusions Identification of all minimal reactions sets in metabolic networks is essential since different minimal reaction sets have different properties that effect the bioprocess development. The proposed method correctly identified all minimal reaction sets in a both the case studies. The proposed method is computationally efficient compared to other methods for finding minimal reaction sets and useful to employ with genome-scale metabolic networks. PMID:24594118
Efficient protein structure search using indexing methods

PubMed Central

2013-01-01

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.

PubMed

Kim, Sungchul; Sael, Lee; Yu, Hwanjo

2013-01-01

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
Efficient Proximity Computation Techniques Using ZIP Code Data for Smart Cities †

PubMed Central

Murdani, Muhammad Harist; Hong, Bonghee

2018-01-01

In this paper, we are interested in computing ZIP code proximity from two perspectives, proximity between two ZIP codes (Ad-Hoc) and neighborhood proximity (Top-K). Such a computation can be used for ZIP code-based target marketing as one of the smart city applications. A naïve approach to this computation is the usage of the distance between ZIP codes. We redefine a distance metric combining the centroid distance with the intersecting road network between ZIP codes by using a weighted sum method. Furthermore, we prove that the results of our combined approach conform to the characteristics of distance measurement. We have proposed a general and heuristic approach for computing Ad-Hoc proximity, while for computing Top-K proximity, we have proposed a general approach only. Our experimental results indicate that our approaches are verifiable and effective in reducing the execution time and search space. PMID:29587366
From cosmos to connectomes: the evolution of data-intensive science.

PubMed

Burns, Randal; Vogelstein, Joshua T; Szalay, Alexander S

2014-09-17

The analysis of data requires computation: originally by hand and more recently by computers. Different models of computing are designed and optimized for different kinds of data. In data-intensive science, the scale and complexity of data exceeds the comfort zone of local data stores on scientific workstations. Thus, cloud computing emerges as the preeminent model, utilizing data centers and high-performance clusters, enabling remote users to access and query subsets of the data efficiently. We examine how data-intensive computational systems originally built for cosmology, the Sloan Digital Sky Survey (SDSS), are now being used in connectomics, at the Open Connectome Project. We list lessons learned and outline the top challenges we expect to face. Success in computational connectomics would drastically reduce the time between idea and discovery, as SDSS did in cosmology. Copyright © 2014 Elsevier Inc. All rights reserved.
Efficient Proximity Computation Techniques Using ZIP Code Data for Smart Cities †.

PubMed

Murdani, Muhammad Harist; Kwon, Joonho; Choi, Yoon-Ho; Hong, Bonghee

2018-03-24

In this paper, we are interested in computing ZIP code proximity from two perspectives, proximity between two ZIP codes ( Ad-Hoc ) and neighborhood proximity ( Top-K ). Such a computation can be used for ZIP code-based target marketing as one of the smart city applications. A naïve approach to this computation is the usage of the distance between ZIP codes. We redefine a distance metric combining the centroid distance with the intersecting road network between ZIP codes by using a weighted sum method. Furthermore, we prove that the results of our combined approach conform to the characteristics of distance measurement. We have proposed a general and heuristic approach for computing Ad-Hoc proximity, while for computing Top-K proximity, we have proposed a general approach only. Our experimental results indicate that our approaches are verifiable and effective in reducing the execution time and search space.
Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

PubMed Central

2014-01-01

The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
Structural-Vibration-Response Data Analysis

NASA Technical Reports Server (NTRS)

Smith, W. R.; Hechenlaible, R. N.; Perez, R. C.

1983-01-01

Computer program developed as structural-vibration-response data analysis tool for use in dynamic testing of Space Shuttle. Program provides fast and efficient time-domain least-squares curve-fitting procedure for reducing transient response data to obtain structural model frequencies and dampings from free-decay records. Procedure simultaneously identifies frequencies, damping values, and participation factors for noisy multiple-response records.
An Investigation of the Determinants of Employees' Decisions to Use Organizational Computing Resources for Non-Work Purposes

ERIC Educational Resources Information Center

Campbell, Stephen Matthew

2010-01-01

Internet access in the workplace has become ubiquitous in many organizations. Often, employees need this access to perform their duties. However, many studies report a large percentage of employees use their work Internet access for non-work-related activities. These activities can result in reduced efficiency, increased vulnerability to cyber…
Predicting Aircraft Spray Patterns on Crops

NASA Technical Reports Server (NTRS)

Teske, M. E.; Bilanin, A. J.

1986-01-01

Agricultural Dispersion Prediction (AGDISP) system developed to predict deposition of agricultural material released from rotary- and fixed-wing aircraft. AGDISP computes ensemble average mean motion resulting from turbulent fluid fluctuations. Used to examine ways of making dispersal process more efficient by insuring uniformity, reducing waste, and saving money. Programs in AGDISP system written in FORTRAN IV for interactive execution.
A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.

PubMed

De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc

2010-09-01

In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several giga voxels), this computational burden prevents their actual breakthrough. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.
MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.

PubMed

Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed

2017-01-20

Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.
An adaptive multi-level simulation algorithm for stochastic biological systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lester, C., E-mail: lesterc@maths.ox.ac.uk; Giles, M. B.; Baker, R. E.

2015-01-14

Discrete-state, continuous-time Markov models are widely used in the modeling of biochemical reaction networks. Their complexity often precludes analytic solution, and we rely on stochastic simulation algorithms (SSA) to estimate system statistics. The Gillespie algorithm is exact, but computationally costly as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Potentially computationally more efficient, the system statistics generated suffer from significant bias unless tau is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method [Anderson and Higham, “Multi-level Montemore » Carlo for continuous time Markov chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul. 10(1), 146–179 (2012)] tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths where one path of each pair is generated at a higher accuracy compared to the other (and so more expensive). By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This renders the multi-level method very efficient as only a relatively small number of paired paths are required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of τ. This approach can result in poor performance when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach where τ is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.« less
Multiple Concentric Cylinder Model (MCCM) user's guide

NASA Technical Reports Server (NTRS)

Williams, Todd O.; Pindera, Marek-Jerzy

1994-01-01

A user's guide for the computer program mccm.f is presented. The program is based on a recently developed solution methodology for the inelastic response of an arbitrarily layered, concentric cylinder assemblage under thermomechanical loading which is used to model the axisymmetric behavior of unidirectional metal matrix composites in the presence of various microstructural details. These details include the layered morphology of certain types of ceramic fibers, as well as multiple fiber/matrix interfacial layers recently proposed as a means of reducing fabrication-induced, and in-service, residual stress. The computer code allows efficient characterization and evaluation of new fibers and/or new coating systems on existing fibers with a minimum of effort, taking into account inelastic and temperature-dependent properties and different morphologies of the fiber and the interfacial region. It also facilitates efficient design of engineered interfaces for unidirectional metal matrix composites.
An Analysis of Performance Enhancement Techniques for Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

2002-01-01

The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.

Evaluation of Preduster in Cement Industry Based on Computational Fluid Dynamic

NASA Astrophysics Data System (ADS)

Septiani, E. L.; Widiyastuti, W.; Djafaar, A.; Ghozali, I.; Pribadi, H. M.

2017-10-01

Ash-laden hot air from clinker in cement industry is being used to reduce water contain in coal, however it may contain large amount of ash even though it was treated by a preduster. This study investigated preduster performance as a cyclone separator in the cement industry by Computational Fluid Dynamic method. In general, the best performance of cyclone is it have relatively high efficiency with the low pressure drop. The most accurate and simple turbulence model, Reynold Average Navier Stokes (RANS), standard k-ε, and combination with Lagrangian model as particles tracking model were used to solve the problem. The measurement in simulation result are flow pattern in the cyclone, pressure outlet and collection efficiency of preduster. The applied model well predicted by comparing with the most accurate empirical model and pressure outlet in experimental measurement.
Performance advantages of CPML over UPML absorbing boundary conditions in FDTD algorithm

NASA Astrophysics Data System (ADS)

Gvozdic, Branko D.; Djurdjevic, Dusan Z.

2017-01-01

Implementation of absorbing boundary condition (ABC) has a very important role in simulation performance and accuracy in finite difference time domain (FDTD) method. The perfectly matched layer (PML) is the most efficient type of ABC. The aim of this paper is to give detailed insight in and discussion of boundary conditions and hence to simplify the choice of PML used for termination of computational domain in FDTD method. In particular, we demonstrate that using the convolutional PML (CPML) has significant advantages in terms of implementation in FDTD method and reducing computer resources than using uniaxial PML (UPML). An extensive number of numerical experiments has been performed and results have shown that CPML is more efficient in electromagnetic waves absorption. Numerical code is prepared, several problems are analyzed and relative error is calculated and presented.
Computationally efficient multibody simulations

NASA Technical Reports Server (NTRS)

Ramakrishnan, Jayant; Kumar, Manoj

1994-01-01

Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.
Design of a portable artificial heart drive system based on efficiency analysis.

PubMed

Kitamura, T

1986-11-01

This paper discusses a computer simulation of a pneumatic portable piston-type artificial heart drive system with a linear d-c-motor. The purpose of the design is to obtain an artificial heart drive system with high efficiency and small dimensions to enhance portability. The design employs two factors contributing the total efficiency of the drive system. First, the dimensions of the pneumatic actuator were optimized under a cost function of the total efficiency. Second, the motor performance was studied in terms of efficiency. More than 50 percent of the input energy of the actuator with practical loads is consumed in the armature circuit in all linear d-c-motors with brushes. An optimal design is: the piston cross-sectional area of 10.5 cm2 cylinder longitudinal length of 10 cm. The total efficiency could be up to 25 percent by improving the gasket to reduce the frictional force.
Speedup computation of HD-sEMG signals using a motor unit-specific electrical source model.

PubMed

Carriou, Vincent; Boudaoud, Sofiane; Laforet, Jeremy

2018-01-23

Nowadays, bio-reliable modeling of muscle contraction is becoming more accurate and complex. This increasing complexity induces a significant increase in computation time which prevents the possibility of using this model in certain applications and studies. Accordingly, the aim of this work is to significantly reduce the computation time of high-density surface electromyogram (HD-sEMG) generation. This will be done through a new model of motor unit (MU)-specific electrical source based on the fibers composing the MU. In order to assess the efficiency of this approach, we computed the normalized root mean square error (NRMSE) between several simulations on single generated MU action potential (MUAP) using the usual fiber electrical sources and the MU-specific electrical source. This NRMSE was computed for five different simulation sets wherein hundreds of MUAPs are generated and summed into HD-sEMG signals. The obtained results display less than 2% error on the generated signals compared to the same signals generated with fiber electrical sources. Moreover, the computation time of the HD-sEMG signal generation model is reduced to about 90% compared to the fiber electrical source model. Using this model with MU electrical sources, we can simulate HD-sEMG signals of a physiological muscle (hundreds of MU) in less than an hour on a classical workstation. Graphical Abstract Overview of the simulation of HD-sEMG signals using the fiber scale and the MU scale. Upscaling the electrical source to the MU scale reduces the computation time by 90% inducing only small deviation of the same simulated HD-sEMG signals.
paraGSEA: a scalable approach for large-scale gene expression profiling

PubMed Central

Peng, Shaoliang; Yang, Shunyun

2017-01-01

Abstract More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the estimation of significance level step and multiple hypothesis testing step, the computation scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale transcriptome data analysis. By optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which contributes more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters in an efficient manner with high scalability and performance on large-scale datasets. The analysis time of whole LINCS phase I dataset (GSE92742) was reduced to nearly half hour on a 1000 node cluster on Tianhe-2, or within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. PMID:28973463
Machining fixture layout optimization using particle swarm optimization algorithm

NASA Astrophysics Data System (ADS)

Dou, Jianping; Wang, Xingsong; Wang, Lei

2011-05-01

Optimization of fixture layout (locator and clamp locations) is critical to reduce geometric error of the workpiece during machining process. In this paper, the application of particle swarm optimization (PSO) algorithm is presented to minimize the workpiece deformation in the machining region. A PSO based approach is developed to optimize fixture layout through integrating ANSYS parametric design language (APDL) of finite element analysis to compute the objective function for a given fixture layout. Particle library approach is used to decrease the total computation time. The computational experiment of 2D case shows that the numbers of function evaluations are decreased about 96%. Case study illustrates the effectiveness and efficiency of the PSO based optimization approach.
Hybrid-dual-fourier tomographic algorithm for a fast three-dimensionial optical image reconstruction in turbid media

NASA Technical Reports Server (NTRS)

Alfano, Robert R. (Inventor); Cai, Wei (Inventor)

2007-01-01

A reconstruction technique for reducing computation burden in the 3D image processes, wherein the reconstruction procedure comprises an inverse and a forward model. The inverse model uses a hybrid dual Fourier algorithm that combines a 2D Fourier inversion with a 1D matrix inversion to thereby provide high-speed inverse computations. The inverse algorithm uses a hybrid transfer to provide fast Fourier inversion for data of multiple sources and multiple detectors. The forward model is based on an analytical cumulant solution of a radiative transfer equation. The accurate analytical form of the solution to the radiative transfer equation provides an efficient formalism for fast computation of the forward model.
A Worst-Case Approach for On-Line Flutter Prediction

NASA Technical Reports Server (NTRS)

Lind, Rick C.; Brenner, Martin J.

1998-01-01

Worst-case flutter margins may be computed for a linear model with respect to a set of uncertainty operators using the structured singular value. This paper considers an on-line implementation to compute these robust margins in a flight test program. Uncertainty descriptions are updated at test points to account for unmodeled time-varying dynamics of the airplane by ensuring the robust model is not invalidated by measured flight data. Robust margins computed with respect to this uncertainty remain conservative to the changing dynamics throughout the flight. A simulation clearly demonstrates this method can improve the efficiency of flight testing by accurately predicting the flutter margin to improve safety while reducing the necessary flight time.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

PubMed Central

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
The use of self-organising maps for anomalous behaviour detection in a digital investigation.

PubMed

Fei, B K L; Eloff, J H P; Olivier, M S; Venter, H S

2006-10-16

The dramatic increase in crime relating to the Internet and computers has caused a growing need for digital forensics. Digital forensic tools have been developed to assist investigators in conducting a proper investigation into digital crimes. In general, the bulk of the digital forensic tools available on the market permit investigators to analyse data that has been gathered from a computer system. However, current state-of-the-art digital forensic tools simply cannot handle large volumes of data in an efficient manner. With the advent of the Internet, many employees have been given access to new and more interesting possibilities via their desktop. Consequently, excessive Internet usage for non-job purposes and even blatant misuse of the Internet have become a problem in many organisations. Since storage media are steadily growing in size, the process of analysing multiple computer systems during a digital investigation can easily consume an enormous amount of time. Identifying a single suspicious computer from a set of candidates can therefore reduce human processing time and monetary costs involved in gathering evidence. The focus of this paper is to demonstrate how, in a digital investigation, digital forensic tools and the self-organising map (SOM)--an unsupervised neural network model--can aid investigators to determine anomalous behaviours (or activities) among employees (or computer systems) in a far more efficient manner. By analysing the different SOMs (one for each computer system), anomalous behaviours are identified and investigators are assisted to conduct the analysis more efficiently. The paper will demonstrate how the easy visualisation of the SOM enhances the ability of the investigators to interpret and explore the data generated by digital forensic tools so as to determine anomalous behaviours.
Improvement of correlation-based centroiding methods for point source Shack-Hartmann wavefront sensor

NASA Astrophysics Data System (ADS)

Li, Xuxu; Li, Xinyang; wang, Caixia

2018-03-01

This paper proposes an efficient approach to decrease the computational costs of correlation-based centroiding methods used for point source Shack-Hartmann wavefront sensors. Four typical similarity functions have been compared, i.e. the absolute difference function (ADF), ADF square (ADF2), square difference function (SDF), and cross-correlation function (CCF) using the Gaussian spot model. By combining them with fast search algorithms, such as three-step search (TSS), two-dimensional logarithmic search (TDL), cross search (CS), and orthogonal search (OS), computational costs can be reduced drastically without affecting the accuracy of centroid detection. Specifically, OS reduces calculation consumption by 90%. A comprehensive simulation indicates that CCF exhibits a better performance than other functions under various light-level conditions. Besides, the effectiveness of fast search algorithms has been verified.
Extreme learning machine for reduced order modeling of turbulent geophysical flows.

PubMed

San, Omer; Maulik, Romit

2018-04-01

We investigate the application of artificial neural networks to stabilize proper orthogonal decomposition-based reduced order models for quasistationary geophysical turbulent flows. An extreme learning machine concept is introduced for computing an eddy-viscosity closure dynamically to incorporate the effects of the truncated modes. We consider a four-gyre wind-driven ocean circulation problem as our prototype setting to assess the performance of the proposed data-driven approach. Our framework provides a significant reduction in computational time and effectively retains the dynamics of the full-order model during the forward simulation period beyond the training data set. Furthermore, we show that the method is robust for larger choices of time steps and can be used as an efficient and reliable tool for long time integration of general circulation models.
Extreme learning machine for reduced order modeling of turbulent geophysical flows

NASA Astrophysics Data System (ADS)

San, Omer; Maulik, Romit

2018-04-01

We investigate the application of artificial neural networks to stabilize proper orthogonal decomposition-based reduced order models for quasistationary geophysical turbulent flows. An extreme learning machine concept is introduced for computing an eddy-viscosity closure dynamically to incorporate the effects of the truncated modes. We consider a four-gyre wind-driven ocean circulation problem as our prototype setting to assess the performance of the proposed data-driven approach. Our framework provides a significant reduction in computational time and effectively retains the dynamics of the full-order model during the forward simulation period beyond the training data set. Furthermore, we show that the method is robust for larger choices of time steps and can be used as an efficient and reliable tool for long time integration of general circulation models.
A service-based BLAST command tool supported by cloud infrastructures.

PubMed

Carrión, Abel; Blanquer, Ignacio; Hernández, Vicente

2012-01-01

Notwithstanding the benefits of distributed-computing infrastructures for empowering bioinformatics analysis tools with the needed computing and storage capability, the actual use of these infrastructures is still low. Learning curves and deployment difficulties have reduced the impact on the wide research community. This article presents a porting strategy of BLAST based on a multiplatform client and a service that provides the same interface as sequential BLAST, thus reducing learning curve and with minimal impact on their integration on existing workflows. The porting has been done using the execution and data access components from the EC project Venus-C and the Windows Azure infrastructure provided in this project. The results obtained demonstrate a low overhead on the global execution framework and reasonable speed-up and cost-efficiency with respect to a sequential version.
Muons in the CMS High Level Trigger System

NASA Astrophysics Data System (ADS)

Verwilligen, Piet; CMS Collaboration

2016-04-01

The trigger systems of LHC detectors play a fundamental role in defining the physics capabilities of the experiments. A reduction of several orders of magnitude in the rate of collected events, with respect to the proton-proton bunch crossing rate generated by the LHC, is mandatory to cope with the limits imposed by the readout and storage system. An accurate and efficient online selection mechanism is thus required to fulfill the task keeping maximal the acceptance to physics signals. The CMS experiment operates using a two-level trigger system. Firstly a Level-1 Trigger (L1T) system, implemented using custom-designed electronics, is designed to reduce the event rate to a limit compatible to the CMS Data Acquisition (DAQ) capabilities. A High Level Trigger System (HLT) follows, aimed at further reducing the rate of collected events finally stored for analysis purposes. The latter consists of a streamlined version of the CMS offline reconstruction software and operates on a computer farm. It runs algorithms optimized to make a trade-off between computational complexity, rate reduction and high selection efficiency. With the computing power available in 2012 the maximum reconstruction time at HLT was about 200 ms per event, at the nominal L1T rate of 100 kHz. An efficient selection of muons at HLT, as well as an accurate measurement of their properties, such as transverse momentum and isolation, is fundamental for the CMS physics programme. The performance of the muon HLT for single and double muon triggers achieved in Run I will be presented. Results from new developments, aimed at improving the performance of the algorithms for the harsher scenarios of collisions per event (pile-up) and luminosity expected for Run II will also be discussed.
Reliability based design optimization: Formulations and methodologies

NASA Astrophysics Data System (ADS)

Agarwal, Harish

Modern products ranging from simple components to complex systems should be designed to be optimal and reliable. The challenge of modern engineering is to ensure that manufacturing costs are reduced and design cycle times are minimized while achieving requirements for performance and reliability. If the market for the product is competitive, improved quality and reliability can generate very strong competitive advantages. Simulation based design plays an important role in designing almost any kind of automotive, aerospace, and consumer products under these competitive conditions. Single discipline simulations used for analysis are being coupled together to create complex coupled simulation tools. This investigation focuses on the development of efficient and robust methodologies for reliability based design optimization in a simulation based design environment. Original contributions of this research are the development of a novel efficient and robust unilevel methodology for reliability based design optimization, the development of an innovative decoupled reliability based design optimization methodology, the application of homotopy techniques in unilevel reliability based design optimization methodology, and the development of a new framework for reliability based design optimization under epistemic uncertainty. The unilevel methodology for reliability based design optimization is shown to be mathematically equivalent to the traditional nested formulation. Numerical test problems show that the unilevel methodology can reduce computational cost by at least 50% as compared to the nested approach. The decoupled reliability based design optimization methodology is an approximate technique to obtain consistent reliable designs at lesser computational expense. Test problems show that the methodology is computationally efficient compared to the nested approach. A framework for performing reliability based design optimization under epistemic uncertainty is also developed. A trust region managed sequential approximate optimization methodology is employed for this purpose. Results from numerical test studies indicate that the methodology can be used for performing design optimization under severe uncertainty.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

NASA Astrophysics Data System (ADS)

Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.

2017-01-01

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).
Efficiency analysis of numerical integrations for finite element substructure in real-time hybrid simulation

NASA Astrophysics Data System (ADS)

Wang, Jinting; Lu, Liqiao; Zhu, Fei

2018-01-01

Finite element (FE) is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integrations in solving FE numerical substructure in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of FE numerical substructure. In this way, the task execution time (TET) decreases such that the scale of the numerical substructure model increases. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influences of time delay on the displacement response become obvious with the mass ratio increasing, and delay compensation methods may reduce the relative error of the displacement peak value to less than 5% even under the large time-step and large time delay.
P-HS-SFM: a parallel harmony search algorithm for the reproduction of experimental data in the continuous microscopic crowd dynamic models

NASA Astrophysics Data System (ADS)

Jaber, Khalid Mohammad; Alia, Osama Moh'd.; Shuaib, Mohammed Mahmod

2018-03-01

Finding the optimal parameters that can reproduce experimental data (such as the velocity-density relation and the specific flow rate) is a very important component of the validation and calibration of microscopic crowd dynamic models. Heavy computational demand during parameter search is a known limitation that exists in a previously developed model known as the Harmony Search-Based Social Force Model (HS-SFM). In this paper, a parallel-based mechanism is proposed to reduce the computational time and memory resource utilisation required to find these parameters. More specifically, two MATLAB-based multicore techniques (parfor and create independent jobs) using shared memory are developed by taking advantage of the multithreading capabilities of parallel computing, resulting in a new framework called the Parallel Harmony Search-Based Social Force Model (P-HS-SFM). The experimental results show that the parfor-based P-HS-SFM achieved a better computational time of about 26 h, an efficiency improvement of ? 54% and a speedup factor of 2.196 times in comparison with the HS-SFM sequential processor. The performance of the P-HS-SFM using the create independent jobs approach is also comparable to parfor with a computational time of 26.8 h, an efficiency improvement of about 30% and a speedup of 2.137 times.

Simulating coupled dynamics of a rigid-flexible multibody system and compressible fluid

NASA Astrophysics Data System (ADS)

Hu, Wei; Tian, Qiang; Hu, HaiYan

2018-04-01

As a subsequent work of previous studies of authors, a new parallel computation approach is proposed to simulate the coupled dynamics of a rigid-flexible multibody system and compressible fluid. In this approach, the smoothed particle hydrodynamics (SPH) method is used to model the compressible fluid, the natural coordinate formulation (NCF) and absolute nodal coordinate formulation (ANCF) are used to model the rigid and flexible bodies, respectively. In order to model the compressible fluid properly and efficiently via SPH method, three measures are taken as follows. The first is to use the Riemann solver to cope with the fluid compressibility, the second is to define virtual particles of SPH to model the dynamic interaction between the fluid and the multibody system, and the third is to impose the boundary conditions of periodical inflow and outflow to reduce the number of SPH particles involved in the computation process. Afterwards, a parallel computation strategy is proposed based on the graphics processing unit (GPU) to detect the neighboring SPH particles and to solve the dynamic equations of SPH particles in order to improve the computation efficiency. Meanwhile, the generalized-alpha algorithm is used to solve the dynamic equations of the multibody system. Finally, four case studies are given to validate the proposed parallel computation approach.
Optimization of Angular-Momentum Biases of Reaction Wheels

NASA Technical Reports Server (NTRS)

Lee, Clifford; Lee, Allan

2008-01-01

RBOT [RWA Bias Optimization Tool (wherein RWA signifies Reaction Wheel Assembly )] is a computer program designed for computing angular momentum biases for reaction wheels used for providing spacecraft pointing in various directions as required for scientific observations. RBOT is currently deployed to support the Cassini mission to prevent operation of reaction wheels at unsafely high speeds while minimizing time in undesirable low-speed range, where elasto-hydrodynamic lubrication films in bearings become ineffective, leading to premature bearing failure. The problem is formulated as a constrained optimization problem in which maximum wheel speed limit is a hard constraint and a cost functional that increases as speed decreases below a low-speed threshold. The optimization problem is solved using a parametric search routine known as the Nelder-Mead simplex algorithm. To increase computational efficiency for extended operation involving large quantity of data, the algorithm is designed to (1) use large time increments during intervals when spacecraft attitudes or rates of rotation are nearly stationary, (2) use sinusoidal-approximation sampling to model repeated long periods of Earth-point rolling maneuvers to reduce computational loads, and (3) utilize an efficient equation to obtain wheel-rate profiles as functions of initial wheel biases based on conservation of angular momentum (in an inertial frame) using pre-computed terms.
Dynamic displacement measurement of large-scale structures based on the Lucas-Kanade template tracking algorithm

NASA Astrophysics Data System (ADS)

Guo, Jie; Zhu, Chang`an

2016-01-01

The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks.

PubMed

Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong

2015-08-07

Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
Accurate and efficient calculation of excitation energies with the active-space particle-particle random phase approximation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Du; Yang, Weitao

An efficient method for calculating excitation energies based on the particle-particle random phase approximation (ppRPA) is presented. Neglecting the contributions from the high-lying virtual states and the low-lying core states leads to the significantly smaller active-space ppRPA matrix while keeping the error to within 0.05 eV from the corresponding full ppRPA excitation energies. The resulting computational cost is significantly reduced and becomes less than the construction of the non-local Fock exchange potential matrix in the self-consistent-field (SCF) procedure. With only a modest number of active orbitals, the original ppRPA singlet-triplet (ST) gaps as well as the low-lying single and doublemore » excitation energies can be accurately reproduced at much reduced computational costs, up to 100 times faster than the iterative Davidson diagonalization of the original full ppRPA matrix. For high-lying Rydberg excitations where the Davidson algorithm fails, the computational savings of active-space ppRPA with respect to the direct diagonalization is even more dramatic. The virtues of the underlying full ppRPA combined with the significantly lower computational cost of the active-space approach will significantly expand the applicability of the ppRPA method to calculate excitation energies at a cost of O(K^{4}), with a prefactor much smaller than a single SCF Hartree-Fock (HF)/hybrid functional calculation, thus opening up new possibilities for the quantum mechanical study of excited state electronic structure of large systems.« less
Accurate and efficient calculation of excitation energies with the active-space particle-particle random phase approximation

DOE PAGES

Zhang, Du; Yang, Weitao

2016-10-13

An efficient method for calculating excitation energies based on the particle-particle random phase approximation (ppRPA) is presented. Neglecting the contributions from the high-lying virtual states and the low-lying core states leads to the significantly smaller active-space ppRPA matrix while keeping the error to within 0.05 eV from the corresponding full ppRPA excitation energies. The resulting computational cost is significantly reduced and becomes less than the construction of the non-local Fock exchange potential matrix in the self-consistent-field (SCF) procedure. With only a modest number of active orbitals, the original ppRPA singlet-triplet (ST) gaps as well as the low-lying single and doublemore » excitation energies can be accurately reproduced at much reduced computational costs, up to 100 times faster than the iterative Davidson diagonalization of the original full ppRPA matrix. For high-lying Rydberg excitations where the Davidson algorithm fails, the computational savings of active-space ppRPA with respect to the direct diagonalization is even more dramatic. The virtues of the underlying full ppRPA combined with the significantly lower computational cost of the active-space approach will significantly expand the applicability of the ppRPA method to calculate excitation energies at a cost of O(K^{4}), with a prefactor much smaller than a single SCF Hartree-Fock (HF)/hybrid functional calculation, thus opening up new possibilities for the quantum mechanical study of excited state electronic structure of large systems.« less
An intersecting chord method for minimum circumscribed sphere and maximum inscribed sphere evaluations of sphericity error

NASA Astrophysics Data System (ADS)

Liu, Fei; Xu, Guanghua; Zhang, Qing; Liang, Lin; Liu, Dan

2015-11-01

As one of the Geometrical Product Specifications that are widely applied in industrial manufacturing and measurement, sphericity error can synthetically scale a 3D structure and reflects the machining quality of a spherical workpiece. Following increasing demands in the high motion performance of spherical parts, sphericity error is becoming an indispensable component in the evaluation of form error. However, the evaluation of sphericity error is still considered to be a complex mathematical issue, and the related research studies on the development of available models are lacking. In this paper, an intersecting chord method is first proposed to solve the minimum circumscribed sphere and maximum inscribed sphere evaluations of sphericity error. This new modelling method leverages chord relationships to replace the characteristic points, thereby significantly reducing the computational complexity and improving the computational efficiency. Using the intersecting chords to generate a virtual centre, the reference sphere in two concentric spheres is simplified as a space intersecting structure. The position of the virtual centre on the space intersecting structure is determined by characteristic chords, which may reduce the deviation between the virtual centre and the centre of the reference sphere. In addition,two experiments are used to verify the effectiveness of the proposed method with real datasets from the Cartesian coordinates. The results indicate that the estimated errors are in perfect agreement with those of the published methods. Meanwhile, the computational efficiency is improved. For the evaluation of the sphericity error, the use of high performance computing is a remarkable change.
A novel semi-transductive learning framework for efficient atypicality detection in chest radiographs

NASA Astrophysics Data System (ADS)

Alzubaidi, Mohammad; Balasubramanian, Vineeth; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.

2012-03-01

Inductive learning refers to machine learning algorithms that learn a model from a set of training data instances. Any test instance is then classified by comparing it to the learned model. When the set of training instances lend themselves well to modeling, the use of a model substantially reduces the computation cost of classification. However, some training data sets are complex, and do not lend themselves well to modeling. Transductive learning refers to machine learning algorithms that classify test instances by comparing them to all of the training instances, without creating an explicit model. This can produce better classification performance, but at a much higher computational cost. Medical images vary greatly across human populations, constituting a data set that does not lend itself well to modeling. Our previous work showed that the wide variations seen across training sets of "normal" chest radiographs make it difficult to successfully classify test radiographs with an inductive (modeling) approach, and that a transductive approach leads to much better performance in detecting atypical regions. The problem with the transductive approach is its high computational cost. This paper develops and demonstrates a novel semi-transductive framework that can address the unique challenges of atypicality detection in chest radiographs. The proposed framework combines the superior performance of transductive methods with the reduced computational cost of inductive methods. Our results show that the proposed semitransductive approach provides both effective and efficient detection of atypical regions within a set of chest radiographs previously labeled by Mayo Clinic expert thoracic radiologists.
Efficient Data-Worth Analysis Using a Multilevel Monte Carlo Method Applied in Oil Reservoir Simulations

NASA Astrophysics Data System (ADS)

Lu, D.; Ricciuto, D. M.; Evans, K. J.

2017-12-01

Data-worth analysis plays an essential role in improving the understanding of the subsurface system, in developing and refining subsurface models, and in supporting rational water resources management. However, data-worth analysis is computationally expensive as it requires quantifying parameter uncertainty, prediction uncertainty, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface simulations using standard Monte Carlo (MC) sampling or advanced surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose efficient Bayesian analysis of data-worth using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC that requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce the computational cost with the use of multifidelity approximations. As the data-worth analysis involves a great deal of expectation estimations, the cost savings from MLMC in the assessment can be very outstanding. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it to a highly heterogeneous oil reservoir simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data, and consistent with the estimation obtained from the standard MC. But compared to the standard MC, the MLMC greatly reduces the computational costs in the uncertainty reduction estimation, with up to 600 days cost savings when one processor is used.
Total variation-based neutron computed tomography

NASA Astrophysics Data System (ADS)

Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick

2018-05-01

We perform the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We present the effectiveness of the algorithm in the significantly low-angular sampling case using synthetic test problems as well as data obtained from a high flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles are used.
Accelerated solution of discrete ordinates approximation to the Boltzmann transport equation via model reduction

DOE PAGES

Tencer, John; Carlberg, Kevin; Larsen, Marvin; ...

2017-06-17

Radiation heat transfer is an important phenomenon in many physical systems of practical interest. When participating media is important, the radiative transfer equation (RTE) must be solved for the radiative intensity as a function of location, time, direction, and wavelength. In many heat-transfer applications, a quasi-steady assumption is valid, thereby removing time dependence. The dependence on wavelength is often treated through a weighted sum of gray gases (WSGG) approach. The discrete ordinates method (DOM) is one of the most common methods for approximating the angular (i.e., directional) dependence. The DOM exactly solves for the radiative intensity for a finite numbermore » of discrete ordinate directions and computes approximations to integrals over the angular space using a quadrature rule; the chosen ordinate directions correspond to the nodes of this quadrature rule. This paper applies a projection-based model-reduction approach to make high-order quadrature computationally feasible for the DOM for purely absorbing applications. First, the proposed approach constructs a reduced basis from (high-fidelity) solutions of the radiative intensity computed at a relatively small number of ordinate directions. Then, the method computes inexpensive approximations of the radiative intensity at the (remaining) quadrature points of a high-order quadrature using a reduced-order model constructed from the reduced basis. Finally, this results in a much more accurate solution than might have been achieved using only the ordinate directions used to compute the reduced basis. One- and three-dimensional test problems highlight the efficiency of the proposed method.« less
Accelerated solution of discrete ordinates approximation to the Boltzmann transport equation via model reduction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tencer, John; Carlberg, Kevin; Larsen, Marvin

Radiation heat transfer is an important phenomenon in many physical systems of practical interest. When participating media is important, the radiative transfer equation (RTE) must be solved for the radiative intensity as a function of location, time, direction, and wavelength. In many heat-transfer applications, a quasi-steady assumption is valid, thereby removing time dependence. The dependence on wavelength is often treated through a weighted sum of gray gases (WSGG) approach. The discrete ordinates method (DOM) is one of the most common methods for approximating the angular (i.e., directional) dependence. The DOM exactly solves for the radiative intensity for a finite numbermore » of discrete ordinate directions and computes approximations to integrals over the angular space using a quadrature rule; the chosen ordinate directions correspond to the nodes of this quadrature rule. This paper applies a projection-based model-reduction approach to make high-order quadrature computationally feasible for the DOM for purely absorbing applications. First, the proposed approach constructs a reduced basis from (high-fidelity) solutions of the radiative intensity computed at a relatively small number of ordinate directions. Then, the method computes inexpensive approximations of the radiative intensity at the (remaining) quadrature points of a high-order quadrature using a reduced-order model constructed from the reduced basis. Finally, this results in a much more accurate solution than might have been achieved using only the ordinate directions used to compute the reduced basis. One- and three-dimensional test problems highlight the efficiency of the proposed method.« less
Efficient model reduction of parametrized systems by matrix discrete empirical interpolation

NASA Astrophysics Data System (ADS)

Negri, Federico; Manzoni, Andrea; Amsallem, David

2015-12-01

In this work, we apply a Matrix version of the so-called Discrete Empirical Interpolation (MDEIM) for the efficient reduction of nonaffine parametrized systems arising from the discretization of linear partial differential equations. Dealing with affinely parametrized operators is crucial in order to enhance the online solution of reduced-order models (ROMs). However, in many cases such an affine decomposition is not readily available, and must be recovered through (often) intrusive procedures, such as the empirical interpolation method (EIM) and its discrete variant DEIM. In this paper we show that MDEIM represents a very efficient approach to deal with complex physical and geometrical parametrizations in a non-intrusive, efficient and purely algebraic way. We propose different strategies to combine MDEIM with a state approximation resulting either from a reduced basis greedy approach or Proper Orthogonal Decomposition. A posteriori error estimates accounting for the MDEIM error are also developed in the case of parametrized elliptic and parabolic equations. Finally, the capability of MDEIM to generate accurate and efficient ROMs is demonstrated on the solution of two computationally-intensive classes of problems occurring in engineering contexts, namely PDE-constrained shape optimization and parametrized coupled problems.
Reduced hemispheric asymmetry of brain anatomical networks in attention deficit hyperactivity disorder.

PubMed

Li, Dandan; Li, Ting; Niu, Yan; Xiang, Jie; Cao, Rui; Liu, Bo; Zhang, Hui; Wang, Bin

2018-05-11

Despite many studies reporting a variety of alterations in brain networks in patients with attention deficit hyperactivity disorder (ADHD), alterations in hemispheric anatomical networks are still unclear. In this study, we investigated topology alterations in hemispheric white matter in patients with ADHD and the relationship between these alterations and clinical features of the illness. Weighted hemispheric brain anatomical networks were first constructed for each of 40 right-handed patients with ADHD and 53 matched normal controls. Then, graph theoretical approaches were utilized to compute hemispheric topological properties. The small-world property was preserved in the hemispheric network. Furthermore, a significant group-by-hemisphere interaction was revealed in global efficiency, local efficiency and characteristic path length, attributed to the significantly reduced hemispheric asymmetry of global and local integration in patients with ADHD compared with normal controls. Specifically, reduced asymmetric regional efficiency was found in three regions. Finally, we found that the abnormal asymmetry of hemispheric brain anatomical network topology and regional efficiency were both associated with clinical features (the Adult ADHD Self-Report Scale and Wechsler Adult Intelligence Scale) in patients. Our findings provide new insights into the lateralized nature of hemispheric dysconnectivity and highlight the potential for using brain network measures of hemispheric asymmetry as neural biomarkers for ADHD and its clinical features.
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasifiction combined sycle (IGCC) power plant with CO2 capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasification combined cycle (IGCC) power plant with CO{sub 2} capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hameed, Abdul; Khoshkbarforoushha, Alireza; Ranjan, Rajiv

In a cloud computing paradigm, energy efficient allocation of different virtualized ICT resources (servers, storage disks, and networks, and the like) is a complex problem due to the presence of heterogeneous application (e.g., content delivery networks, MapReduce, web applications, and the like) workloads having contentious allocation requirements in terms of ICT resource capacities (e.g., network bandwidth, processing speed, response time, etc.). Several recent papers have tried to address the issue of improving energy efficiency in allocating cloud resources to applications with varying degree of success. However, to the best of our knowledge there is no published literature on this subjectmore » that clearly articulates the research problem and provides research taxonomy for succinct classification of existing techniques. Hence, the main aim of this paper is to identify open challenges associated with energy efficient resource allocation. In this regard, the study, first, outlines the problem and existing hardware and software-based techniques available for this purpose. Furthermore, available techniques already presented in the literature are summarized based on the energy-efficient research dimension taxonomy. The advantages and disadvantages of the existing techniques are comprehensively analyzed against the proposed research dimension taxonomy namely: resource adaption policy, objective function, allocation method, allocation operation, and interoperability.« less
Extending Moore's Law via Computationally Error Tolerant Computing.

DOE PAGES

Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.; ...

2018-03-01

Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less
Ship Trim Optimization: Assessment of Influence of Trim on Resistance of MOERI Container Ship

PubMed Central

Duan, Wenyang

2014-01-01

Environmental issues and rising fuel prices necessitate better energy efficiency in all sectors. Shipping industry is a stakeholder in environmental issues. Shipping industry is responsible for approximately 3% of global CO2 emissions, 14-15% of global NOX emissions, and 16% of global SOX emissions. Ship trim optimization has gained enormous momentum in recent years being an effective operational measure for better energy efficiency to reduce emissions. Ship trim optimization analysis has traditionally been done through tow-tank testing for a specific hullform. Computational techniques are increasingly popular in ship hydrodynamics applications. The purpose of this study is to present MOERI container ship (KCS) hull trim optimization by employing computational methods. KCS hull total resistances and trim and sinkage computed values, in even keel condition, are compared with experimental values and found in reasonable agreement. The agreement validates that mesh, boundary conditions, and solution techniques are correct. The same mesh, boundary conditions, and solution techniques are used to obtain resistance values in different trim conditions at Fn = 0.2274. Based on attained results, optimum trim is suggested. This research serves as foundation for employing computational techniques for ship trim optimization. PMID:24578649
Extending Moore's Law via Computationally Error Tolerant Computing.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.

Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less

Towards a Low-Cost Remote Memory Attestation for the Smart Grid

PubMed Central

Yang, Xinyu; He, Xiaofei; Yu, Wei; Lin, Jie; Li, Rui; Yang, Qingyu; Song, Houbing

2015-01-01

In the smart grid, measurement devices may be compromised by adversaries, and their operations could be disrupted by attacks. A number of schemes to efficiently and accurately detect these compromised devices remotely have been proposed. Nonetheless, most of the existing schemes detecting compromised devices depend on the incremental response time in the attestation process, which are sensitive to data transmission delay and lead to high computation and network overhead. To address the issue, in this paper, we propose a low-cost remote memory attestation scheme (LRMA), which can efficiently and accurately detect compromised smart meters considering real-time network delay and achieve low computation and network overhead. In LRMA, the impact of real-time network delay on detecting compromised nodes can be eliminated via investigating the time differences reported from relay nodes. Furthermore, the attestation frequency in LRMA is dynamically adjusted with the compromised probability of each node, and then, the total number of attestations could be reduced while low computation and network overhead can be achieved. Through a combination of extensive theoretical analysis and evaluations, our data demonstrate that our proposed scheme can achieve better detection capacity and lower computation and network overhead in comparison to existing schemes. PMID:26307998
Towards a Low-Cost Remote Memory Attestation for the Smart Grid.

PubMed

Yang, Xinyu; He, Xiaofei; Yu, Wei; Lin, Jie; Li, Rui; Yang, Qingyu; Song, Houbing

2015-08-21

In the smart grid, measurement devices may be compromised by adversaries, and their operations could be disrupted by attacks. A number of schemes to efficiently and accurately detect these compromised devices remotely have been proposed. Nonetheless, most of the existing schemes detecting compromised devices depend on the incremental response time in the attestation process, which are sensitive to data transmission delay and lead to high computation and network overhead. To address the issue, in this paper, we propose a low-cost remote memory attestation scheme (LRMA), which can efficiently and accurately detect compromised smart meters considering real-time network delay and achieve low computation and network overhead. In LRMA, the impact of real-time network delay on detecting compromised nodes can be eliminated via investigating the time differences reported from relay nodes. Furthermore, the attestation frequency in LRMA is dynamically adjusted with the compromised probability of each node, and then, the total number of attestations could be reduced while low computation and network overhead can be achieved. Through a combination of extensive theoretical analysis and evaluations, our data demonstrate that our proposed scheme can achieve better detection capacity and lower computation and network overhead in comparison to existing schemes.
Aerodynamic design optimization using sensitivity analysis and computational fluid dynamics

NASA Technical Reports Server (NTRS)

Baysal, Oktay; Eleshaky, Mohamed E.

1991-01-01

A new and efficient method is presented for aerodynamic design optimization, which is based on a computational fluid dynamics (CFD)-sensitivity analysis algorithm. The method is applied to design a scramjet-afterbody configuration for an optimized axial thrust. The Euler equations are solved for the inviscid analysis of the flow, which in turn provides the objective function and the constraints. The CFD analysis is then coupled with the optimization procedure that uses a constrained minimization method. The sensitivity coefficients, i.e. gradients of the objective function and the constraints, needed for the optimization are obtained using a quasi-analytical method rather than the traditional brute force method of finite difference approximations. During the one-dimensional search of the optimization procedure, an approximate flow analysis (predicted flow) based on a first-order Taylor series expansion is used to reduce the computational cost. Finally, the sensitivity of the optimum objective function to various design parameters, which are kept constant during the optimization, is computed to predict new optimum solutions. The flow analysis of the demonstrative example are compared with the experimental data. It is shown that the method is more efficient than the traditional methods.
Gaussian Radial Basis Function for Efficient Computation of Forest Indirect Illumination

NASA Astrophysics Data System (ADS)

Abbas, Fayçal; Babahenini, Mohamed Chaouki

2018-06-01

Global illumination of natural scenes in real time like forests is one of the most complex problems to solve, because the multiple inter-reflections between the light and material of the objects composing the scene. The major problem that arises is the problem of visibility computation. In fact, the computing of visibility is carried out for all the set of leaves visible from the center of a given leaf, given the enormous number of leaves present in a tree, this computation performed for each leaf of the tree which also reduces performance. We describe a new approach that approximates visibility queries, which precede in two steps. The first step is to generate point cloud representing the foliage. We assume that the point cloud is composed of two classes (visible, not-visible) non-linearly separable. The second step is to perform a point cloud classification by applying the Gaussian radial basis function, which measures the similarity in term of distance between each leaf and a landmark leaf. It allows approximating the visibility requests to extract the leaves that will be used to calculate the amount of indirect illumination exchanged between neighbor leaves. Our approach allows efficiently treat the light exchanges in the scene of a forest, it allows a fast computation and produces images of good visual quality, all this takes advantage of the immense power of computation of the GPU.
Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

PubMed Central

Teodoro, George; Pan, Tony; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

2013-01-01

We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50× and 85× with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively. PMID:23908562
A parallel finite element procedure for contact-impact problems using edge-based smooth triangular element and GPU

NASA Astrophysics Data System (ADS)

Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang

2018-04-01

The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Efficient Geometric Sound Propagation Using Visibility Culling

NASA Astrophysics Data System (ADS)

Chandak, Anish

2011-07-01

Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
Cloud-based large-scale air traffic flow optimization

NASA Astrophysics Data System (ADS)

Cao, Yi

The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predicative capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestions with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem such that the subproblems are solvable. However, the whole problem is still computational expensive to solve since each subproblem is an smaller integer programming problem that pursues integer solutions. Solving an integer programing problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to linear runtime increase when the problem size increases. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreading programming. The multithreaded version is compared with its monolithic version to show decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
Secure Data Access Control for Fog Computing Based on Multi-Authority Attribute-Based Signcryption with Computation Outsourcing and Attribute Revocation.

PubMed

Xu, Qian; Tan, Chengxiang; Fan, Zhijie; Zhu, Wenye; Xiao, Ya; Cheng, Fujia

2018-05-17

Nowadays, fog computing provides computation, storage, and application services to end users in the Internet of Things. One of the major concerns in fog computing systems is how fine-grained access control can be imposed. As a logical combination of attribute-based encryption and attribute-based signature, Attribute-based Signcryption (ABSC) can provide confidentiality and anonymous authentication for sensitive data and is more efficient than traditional "encrypt-then-sign" or "sign-then-encrypt" strategy. Thus, ABSC is suitable for fine-grained access control in a semi-trusted cloud environment and is gaining more and more attention recently. However, in many existing ABSC systems, the computation cost required for the end users in signcryption and designcryption is linear with the complexity of signing and encryption access policy. Moreover, only a single authority that is responsible for attribute management and key generation exists in the previous proposed ABSC schemes, whereas in reality, mostly, different authorities monitor different attributes of the user. In this paper, we propose OMDAC-ABSC, a novel data access control scheme based on Ciphertext-Policy ABSC, to provide data confidentiality, fine-grained control, and anonymous authentication in a multi-authority fog computing system. The signcryption and designcryption overhead for the user is significantly reduced by outsourcing the undesirable computation operations to fog nodes. The proposed scheme is proven to be secure in the standard model and can provide attribute revocation and public verifiability. The security analysis, asymptotic complexity comparison, and implementation results indicate that our construction can balance the security goals with practical efficiency in computation.
Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

PubMed

Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher

2014-01-01

The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. THE MAPREDUCE PROGRAMMING FRAMEWORK USES TWO TASKS COMMON IN FUNCTIONAL PROGRAMMING: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
A case study of tuning MapReduce for efficient Bioinformatics in the cloud

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shi, Lizhen; Wang, Zhong; Yu, Weikuan

The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics datamore » analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.« less
Speech recognition for embedded automatic positioner for laparoscope

NASA Astrophysics Data System (ADS)

Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

2014-07-01

In this paper a novel speech recognition methodology based on Hidden Markov Model (HMM) is proposed for embedded Automatic Positioner for Laparoscope (APL), which includes a fixed point ARM processor as the core. The APL system is designed to assist the doctor in laparoscopic surgery, by implementing the specific doctor's vocal control to the laparoscope. Real-time respond to the voice commands asks for more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, depending on arithmetic optimizations most, a fixed point frontend for speech feature analysis is built according to the ARM processor's character. Then the fast likelihood computation algorithm is used to reduce computational complexity of the HMM-based recognition algorithm. The experimental results show that, the method shortens the recognition time within 0.5s, while the accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control to the APL.
An integral-factorized implementation of the driven similarity renormalization group second-order multireference perturbation theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hannon, Kevin P.; Li, Chenyang; Evangelista, Francesco A., E-mail: francesco.evangelista@emory.edu

2016-05-28

We report an efficient implementation of a second-order multireference perturbation theory based on the driven similarity renormalization group (DSRG-MRPT2) [C. Li and F. A. Evangelista, J. Chem. Theory Comput. 11, 2097 (2015)]. Our implementation employs factorized two-electron integrals to avoid storage of large four-index intermediates. It also exploits the block structure of the reference density matrices to reduce the computational cost to that of second-order Møller–Plesset perturbation theory. Our new DSRG-MRPT2 implementation is benchmarked on ten naphthyne isomers using basis sets up to quintuple-ζ quality. We find that the singlet-triplet splittings (Δ{sub ST}) of the naphthyne isomers strongly depend onmore » the equilibrium structures. For a consistent set of geometries, the Δ{sub ST} values predicted by the DSRG-MRPT2 are in good agreements with those computed by the reduced multireference coupled cluster theory with singles, doubles, and perturbative triples.« less
Design of k-Space Channel Combination Kernels and Integration with Parallel Imaging

PubMed Central

Beatty, Philip J.; Chang, Shaorong; Holmes, James H.; Wang, Kang; Brau, Anja C. S.; Reeder, Scott B.; Brittain, Jean H.

2014-01-01

Purpose In this work, a new method is described for producing local k-space channel combination kernels using a small amount of low-resolution multichannel calibration data. Additionally, this work describes how these channel combination kernels can be combined with local k-space unaliasing kernels produced by the calibration phase of parallel imaging methods such as GRAPPA, PARS and ARC. Methods Experiments were conducted to evaluate both the image quality and computational efficiency of the proposed method compared to a channel-by-channel parallel imaging approach with image-space sum-of-squares channel combination. Results Results indicate comparable image quality overall, with some very minor differences seen in reduced field-of-view imaging. It was demonstrated that this method enables a speed up in computation time on the order of 3–16X for 32-channel data sets. Conclusion The proposed method enables high quality channel combination to occur earlier in the reconstruction pipeline, reducing computational and memory requirements for image reconstruction. PMID:23943602
Scatter correction for cone-beam computed tomography using self-adaptive scatter kernel superposition

NASA Astrophysics Data System (ADS)

Xie, Shi-Peng; Luo, Li-Min

2012-06-01

The authors propose a combined scatter reduction and correction method to improve image quality in cone beam computed tomography (CBCT). The scatter kernel superposition (SKS) method has been used occasionally in previous studies. However, this method differs in that a scatter detecting blocker (SDB) was used between the X-ray source and the tested object to model the self-adaptive scatter kernel. This study first evaluates the scatter kernel parameters using the SDB, and then isolates the scatter distribution based on the SKS. The quality of image can be improved by removing the scatter distribution. The results show that the method can effectively reduce the scatter artifacts, and increase the image quality. Our approach increases the image contrast and reduces the magnitude of cupping. The accuracy of the SKS technique can be significantly improved in our method by using a self-adaptive scatter kernel. This method is computationally efficient, easy to implement, and provides scatter correction using a single scan acquisition.
Combining neural networks and signed particles to simulate quantum systems more efficiently

NASA Astrophysics Data System (ADS)

Sellier, Jean Michel

2018-04-01

Recently a new formulation of quantum mechanics has been suggested which describes systems by means of ensembles of classical particles provided with a sign. This novel approach mainly consists of two steps: the computation of the Wigner kernel, a multi-dimensional function describing the effects of the potential over the system, and the field-less evolution of the particles which eventually create new signed particles in the process. Although this method has proved to be extremely advantageous in terms of computational resources - as a matter of fact it is able to simulate in a time-dependent fashion many-body systems on relatively small machines - the Wigner kernel can represent the bottleneck of simulations of certain systems. Moreover, storing the kernel can be another issue as the amount of memory needed is cursed by the dimensionality of the system. In this work, we introduce a new technique which drastically reduces the computation time and memory requirement to simulate time-dependent quantum systems which is based on the use of an appropriately tailored neural network combined with the signed particle formalism. In particular, the suggested neural network is able to compute efficiently and reliably the Wigner kernel without any training as its entire set of weights and biases is specified by analytical formulas. As a consequence, the amount of memory for quantum simulations radically drops since the kernel does not need to be stored anymore as it is now computed by the neural network itself, only on the cells of the (discretized) phase-space which are occupied by particles. As its is clearly shown in the final part of this paper, not only this novel approach drastically reduces the computational time, it also remains accurate. The author believes this work opens the way towards effective design of quantum devices, with incredible practical implications.
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

PubMed

Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

2018-06-01

Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.
An efficient pseudomedian filter for tiling microrrays.

PubMed

Royce, Thomas E; Carriero, Nicholas J; Gerstein, Mark B

2007-06-07

Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, a O(n2logn) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(nlogn) from O(n2logn). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.
An efficient pseudomedian filter for tiling microrrays

PubMed Central

Royce, Thomas E; Carriero, Nicholas J; Gerstein, Mark B

2007-01-01

Background Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, a O(n2logn) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. Results We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(nlogn) from O(n2logn). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. Conclusion Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at . PMID:17555595
Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

PubMed

Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

2016-04-01

Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time reduced runtime to 2.9min on the GPU version, compared to 12.8min on twelve-threaded CPU version and 112.5min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

Benchmark Lisp And Ada Programs

NASA Technical Reports Server (NTRS)

Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.

1992-01-01

Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing efficiency of computer processing via Lisp vs. Ada; comparing efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests efficiency which computer executes routines in each language. Available for computer equipped with validated Ada compiler and/or Common Lisp system.
An enhanced mobile-healthcare emergency system based on extended chaotic maps.

PubMed

Lee, Cheng-Chi; Hsu, Che-Wei; Lai, Yan-Ming; Vasilakos, Athanasios

2013-10-01

Mobile Healthcare (m-Healthcare) systems, namely smartphone applications of pervasive computing that utilize wireless body sensor networks (BSNs), have recently been proposed to provide smartphone users with health monitoring services and received great attentions. An m-Healthcare system with flaws, however, may leak out the smartphone user's personal information and cause security, privacy preservation, or user anonymity problems. In 2012, Lu et al. proposed a secure and privacy-preserving opportunistic computing (SPOC) framework for mobile-Healthcare emergency. The brilliant SPOC framework can opportunistically gather resources on the smartphone such as computing power and energy to process the computing-intensive personal health information (PHI) in case of an m-Healthcare emergency with minimal privacy disclosure. To balance between the hazard of PHI privacy disclosure and the necessity of PHI processing and transmission in m-Healthcare emergency, in their SPOC framework, Lu et al. introduced an efficient user-centric privacy access control system which they built on the basis of an attribute-based access control mechanism and a new privacy-preserving scalar product computation (PPSPC) technique. However, we found out that Lu et al.'s protocol still has some secure flaws such as user anonymity and mutual authentication. To fix those problems and further enhance the computation efficiency of Lu et al.'s protocol, in this article, the authors will present an improved mobile-Healthcare emergency system based on extended chaotic maps. The new system is capable of not only providing flawless user anonymity and mutual authentication but also reducing the computation cost.
A Systems Approach to High Performance Buildings: A Computational Systems Engineering R&D Program to Increase DoD Energy Efficiency

DTIC Science & Technology

2012-02-01

for Low Energy Building Ventilation and Space Conditioning Systems...Building Energy Models ................... 162 APPENDIX D: Reduced-Order Modeling and Control Design for Low Energy Building Systems .... 172 D.1...Design for Low Energy Building Ventilation and Space Conditioning Systems This section focuses on the modeling and control of airflow in buildings
A Practical Strategy for Teaching a Child with Autism to Attend to and Imitate a Portable Video Model

ERIC Educational Resources Information Center

Plavnick, Joshua B.

2012-01-01

Video modeling is an effective and efficient methodology for teaching new skills to individuals with autism. New technology may enhance video modeling as smartphones or tablet computers allow for portable video displays. However, the reduced screen size may decrease the likelihood of attending to the video model for some children. The present…
Finite-frequency structural sensitivities of short-period compressional body waves

NASA Astrophysics Data System (ADS)

Fuji, Nobuaki; Chevrot, Sébastien; Zhao, Li; Geller, Robert J.; Kawai, Kenji

2012-07-01

We present an extension of the method recently introduced by Zhao & Chevrot for calculating Fréchet kernels from a precomputed database of strain Green's tensors by normal mode summation. The extension involves two aspects: (1) we compute the strain Green's tensors using the Direct Solution Method, which allows us to go up to frequencies as high as 1 Hz; and (2) we develop a spatial interpolation scheme so that the Green's tensors can be computed with a relatively coarse grid, thus improving the efficiency in the computation of the sensitivity kernels. The only requirement is that the Green's tensors be computed with a fine enough spatial sampling rate to avoid spatial aliasing. The Green's tensors can then be interpolated to any location inside the Earth, avoiding the need to store and retrieve strain Green's tensors for a fine sampling grid. The interpolation scheme not only significantly reduces the CPU time required to calculate the Green's tensor database and the disk space to store it, but also enhances the efficiency in computing the kernels by reducing the number of I/O operations needed to retrieve the Green's tensors. Our new implementation allows us to calculate sensitivity kernels for high-frequency teleseismic body waves with very modest computational resources such as a laptop. We illustrate the potential of our approach for seismic tomography by computing traveltime and amplitude sensitivity kernels for high frequency P, PKP and Pdiff phases. A comparison of our PKP kernels with those computed by asymptotic ray theory clearly shows the limits of the latter. With ray theory, it is not possible to model waves diffracted by internal discontinuities such as the core-mantle boundary, and it is also difficult to compute amplitudes for paths close to the B-caustic of the PKP phase. We also compute waveform partial derivatives for different parts of the seismic wavefield, a key ingredient for high resolution imaging by waveform inversion. Our computations of partial derivatives in the time window where PcP precursors are commonly observed show that the distribution of sensitivity is complex and counter-intuitive, with a large contribution from the mid-mantle region. This clearly emphasizes the need to use accurate and complete partial derivatives in waveform inversion.
Quantum rendering

NASA Astrophysics Data System (ADS)

Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

2003-08-01

In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
Control mechanism of double-rotator-structure ternary optical computer

NASA Astrophysics Data System (ADS)

Kai, SONG; Liping, YAN

2017-03-01

Double-rotator-structure ternary optical processor (DRSTOP) has two characteristics, namely, giant data-bits parallel computing and reconfigurable processor, which can handle thousands of data bits in parallel, and can run much faster than computers and other optical computer systems so far. In order to put DRSTOP into practical application, this paper established a series of methods, namely, task classification method, data-bits allocation method, control information generation method, control information formatting and sending method, and decoded results obtaining method and so on. These methods form the control mechanism of DRSTOP. This control mechanism makes DRSTOP become an automated computing platform. Compared with the traditional calculation tools, DRSTOP computing platform can ease the contradiction between high energy consumption and big data computing due to greatly reducing the cost of communications and I/O. Finally, the paper designed a set of experiments for DRSTOP control mechanism to verify its feasibility and correctness. Experimental results showed that the control mechanism is correct, feasible and efficient.
A heuristic for efficient data distribution management in distributed simulation

NASA Astrophysics Data System (ADS)

Gupta, Pankaj; Guha, Ratan K.

2005-05-01

In this paper, we propose an algorithm for reducing the complexity of region matching and efficient multicasting in data distribution management component of High Level Architecture (HLA) Run Time Infrastructure (RTI). The current data distribution management (DDM) techniques rely on computing the intersection between the subscription and update regions. When a subscription region and an update region of different federates overlap, RTI establishes communication between the publisher and the subscriber. It subsequently routes the updates from the publisher to the subscriber. The proposed algorithm computes the update/subscription regions matching for dynamic allocation of multicast group. It provides new multicast routines that exploit the connectivity of federation by communicating updates regarding interactions and routes information only to those federates that require them. The region-matching problem in DDM reduces to clique-covering problem using the connections graph abstraction where the federations represent the vertices and the update/subscribe relations represent the edges. We develop an abstract model based on connection graph for data distribution management. Using this abstract model, we propose a heuristic for solving the region-matching problem of DDM. We also provide complexity analysis of the proposed heuristics.
Adaptive vibrational configuration interaction (A-VCI): A posteriori error estimation to efficiently compute anharmonic IR spectra

NASA Astrophysics Data System (ADS)

Garnier, Romain; Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2016-05-01

A new variational algorithm called adaptive vibrational configuration interaction (A-VCI) intended for the resolution of the vibrational Schrödinger equation was developed. The main advantage of this approach is to efficiently reduce the dimension of the active space generated into the configuration interaction (CI) process. Here, we assume that the Hamiltonian writes as a sum of products of operators. This adaptive algorithm was developed with the use of three correlated conditions, i.e., a suitable starting space, a criterion for convergence, and a procedure to expand the approximate space. The velocity of the algorithm was increased with the use of a posteriori error estimator (residue) to select the most relevant direction to increase the space. Two examples have been selected for benchmark. In the case of H2CO, we mainly study the performance of A-VCI algorithm: comparison with the variation-perturbation method, choice of the initial space, and residual contributions. For CH3CN, we compare the A-VCI results with a computed reference spectrum using the same potential energy surface and for an active space reduced by about 90%.
Adaptive vibrational configuration interaction (A-VCI): A posteriori error estimation to efficiently compute anharmonic IR spectra.

PubMed

Garnier, Romain; Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2016-05-28

A new variational algorithm called adaptive vibrational configuration interaction (A-VCI) intended for the resolution of the vibrational Schrödinger equation was developed. The main advantage of this approach is to efficiently reduce the dimension of the active space generated into the configuration interaction (CI) process. Here, we assume that the Hamiltonian writes as a sum of products of operators. This adaptive algorithm was developed with the use of three correlated conditions, i.e., a suitable starting space, a criterion for convergence, and a procedure to expand the approximate space. The velocity of the algorithm was increased with the use of a posteriori error estimator (residue) to select the most relevant direction to increase the space. Two examples have been selected for benchmark. In the case of H2CO, we mainly study the performance of A-VCI algorithm: comparison with the variation-perturbation method, choice of the initial space, and residual contributions. For CH3CN, we compare the A-VCI results with a computed reference spectrum using the same potential energy surface and for an active space reduced by about 90%.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3

NASA Technical Reports Server (NTRS)

Lin, Shu

1998-01-01

Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
Seismic waveform tomography with shot-encoding using a restarted L-BFGS algorithm.

PubMed

Rao, Ying; Wang, Yanghua

2017-08-17

In seismic waveform tomography, or full-waveform inversion (FWI), one effective strategy used to reduce the computational cost is shot-encoding, which encodes all shots randomly and sums them into one super shot to significantly reduce the number of wavefield simulations in the inversion. However, this process will induce instability in the iterative inversion regardless of whether it uses a robust limited-memory BFGS (L-BFGS) algorithm. The restarted L-BFGS algorithm proposed here is both stable and efficient. This breakthrough ensures, for the first time, the applicability of advanced FWI methods to three-dimensional seismic field data. In a standard L-BFGS algorithm, if the shot-encoding remains unchanged, it will generate a crosstalk effect between different shots. This crosstalk effect can only be suppressed by employing sufficient randomness in the shot-encoding. Therefore, the implementation of the L-BFGS algorithm is restarted at every segment. Each segment consists of a number of iterations; the first few iterations use an invariant encoding, while the remainder use random re-coding. This restarted L-BFGS algorithm balances the computational efficiency of shot-encoding, the convergence stability of the L-BFGS algorithm, and the inversion quality characteristic of random encoding in FWI.
Improving Engine Efficiency Through Core Developments

NASA Technical Reports Server (NTRS)

Heidmann, James D.

2011-01-01

The NASA Environmentally Responsible Aviation (ERA) Project and Fundamental Aeronautics Projects are supporting compressor and turbine research with the goal of reducing aircraft engine fuel burn and greenhouse gas emissions. The primary goals of this work are to increase aircraft propulsion system fuel efficiency for a given mission by increasing the overall pressure ratio (OPR) of the engine while maintaining or improving aerodynamic efficiency of these components. An additional area of work involves reducing the amount of cooling air required to cool the turbine blades while increasing the turbine inlet temperature. This is complicated by the fact that the cooling air is becoming hotter due to the increases in OPR. Various methods are being investigated to achieve these goals, ranging from improved compressor three-dimensional blade designs to improved turbine cooling hole shapes and methods. Finally, a complementary effort in improving the accuracy, range, and speed of computational fluid mechanics (CFD) methods is proceeding to better capture the physical mechanisms underlying all these problems, for the purpose of improving understanding and future designs.
Coding efficiency of AVS 2.0 for CBAC and CABAC engines

NASA Astrophysics Data System (ADS)

Cui, Jing; Choi, Youngkyu; Chae, Soo-Ik

2015-12-01

In this paper we compare the coding efficiency of AVS 2.0[1] for engines of the Context-based Binary Arithmetic Coding (CBAC)[2] in the AVS 2.0 and the Context-Adaptive Binary Arithmetic Coder (CABAC)[3] in the HEVC[4]. For fair comparison, the CABAC is embedded in the reference code RD10.1 because the CBAC is in the HEVC in our previous work[5]. The rate estimation table is employed only for RDOQ in the RD code. To reduce the computation complexity of the video encoder, therefore we modified the RD code so that the rate estimation table is employed for all RDO decision. Furthermore, we also simplify the complexity of rate estimation table by reducing the bit depth of its fractional part to 2 from 8. The simulation result shows that the CABAC has the BD-rate loss of about 0.7% compared to the CBAC. It seems that the CBAC is a little more efficient than that the CABAC in the AVS 2.0.
Optimization of the working process of the axial compressor according to the criterion of efficiency

NASA Astrophysics Data System (ADS)

Baturin, O. V.; Popov, G. M.; Goryachkin, E. S.; Novikova, Yu D.

2017-01-01

The paper shows search results of the optimal shape of low pressure compressor blades of the industrial gas turbine plant using methods of computational fluid dynamics and multicriteria methods of mathematical optimization. The essence of the methods is that an increase in compressor efficiency should be achieved by increasing the degree of compression up to 2%, and reducing the air flow to 8% relative to basic engine parameters. However, the compressor design elements should be retained as maximally unchanged as possible. During the work, the calculation model of the workflow in the test compressor has been developed and verified in the NUMECA software package, the automated algorithm of the blades shape change has been also developed using a small number of variables, while maintaining its stress-strain state. It allows reducing the number of changeable variables more than twofold. As the result of this study, the option of compressor performance was found, which can increase its efficiency by 1.3% (abs.).
Analysis of multigrid methods on massively parallel computers: Architectural implications

NASA Technical Reports Server (NTRS)

Matheson, Lesley R.; Tarjan, Robert E.

1993-01-01

We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

DTIC Science & Technology

2017-10-04

Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Machine learning based Intelligent cognitive network using fog computing

NASA Astrophysics Data System (ADS)

Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

2017-05-01

In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.
Computing Finite-Time Lyapunov Exponents with Optimally Time Dependent Reduction

NASA Astrophysics Data System (ADS)

Babaee, Hessam; Farazmand, Mohammad; Sapsis, Themis; Haller, George

2016-11-01

We present a method to compute Finite-Time Lyapunov Exponents (FTLE) of a dynamical system using Optimally Time-Dependent (OTD) reduction recently introduced by H. Babaee and T. P. Sapsis. The OTD modes are a set of finite-dimensional, time-dependent, orthonormal basis {ui (x , t) } |i=1N that capture the directions associated with transient instabilities. The evolution equation of the OTD modes is derived from a minimization principle that optimally approximates the most unstable directions over finite times. To compute the FTLE, we evolve a single OTD mode along with the nonlinear dynamics. We approximate the FTLE from the reduced system obtained from projecting the instantaneous linearized dynamics onto the OTD mode. This results in a significant reduction in the computational cost compared to conventional methods for computing FTLE. We demonstrate the efficiency of our method for double Gyre and ABC flows. ARO project 66710-EG-YIP.
Continuous-variable quantum Gaussian process regression and quantum singular value decomposition of nonsparse low-rank matrices

NASA Astrophysics Data System (ADS)

Das, Siddhartha; Siopsis, George; Weedbrook, Christian

2018-02-01

With the significant advancement in quantum computation during the past couple of decades, the exploration of machine-learning subroutines using quantum strategies has become increasingly popular. Gaussian process regression is a widely used technique in supervised classical machine learning. Here we introduce an algorithm for Gaussian process regression using continuous-variable quantum systems that can be realized with technology based on photonic quantum computers under certain assumptions regarding distribution of data and availability of efficient quantum access. Our algorithm shows that by using a continuous-variable quantum computer a dramatic speedup in computing Gaussian process regression can be achieved, i.e., the possibility of exponentially reducing the time to compute. Furthermore, our results also include a continuous-variable quantum-assisted singular value decomposition method of nonsparse low rank matrices and forms an important subroutine in our Gaussian process regression algorithm.

An Efficient Local Correlation Matrix Decomposition Approach for the Localization Implementation of Ensemble-Based Assimilation Methods

NASA Astrophysics Data System (ADS)

Zhang, Hongqin; Tian, Xiangjun

2018-04-01

Ensemble-based data assimilation methods often use the so-called localization scheme to improve the representation of the ensemble background error covariance (Be). Extensive research has been undertaken to reduce the computational cost of these methods by using the localized ensemble samples to localize Be by means of a direct decomposition of the local correlation matrix C. However, the computational costs of the direct decomposition of the local correlation matrix C are still extremely high due to its high dimension. In this paper, we propose an efficient local correlation matrix decomposition approach based on the concept of alternating directions. This approach is intended to avoid direct decomposition of the correlation matrix. Instead, we first decompose the correlation matrix into 1-D correlation matrices in the three coordinate directions, then construct their empirical orthogonal function decomposition at low resolution. This procedure is followed by the 1-D spline interpolation process to transform the above decompositions to the high-resolution grid. Finally, an efficient correlation matrix decomposition is achieved by computing the very similar Kronecker product. We conducted a series of comparison experiments to illustrate the validity and accuracy of the proposed local correlation matrix decomposition approach. The effectiveness of the proposed correlation matrix decomposition approach and its efficient localization implementation of the nonlinear least-squares four-dimensional variational assimilation are further demonstrated by several groups of numerical experiments based on the Advanced Research Weather Research and Forecasting model.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schaerer, Roman Pascal, E-mail: schaerer@mathcces.rwth-aachen.de; Bansal, Pratyuksh; Torrilhon, Manuel

We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) , we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropymore » distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.« less
Developing an eLearning tool formalizing in YAWL the guidelines used in a transfusion medicine service.

PubMed

Russo, Paola; Piazza, Miriam; Leonardi, Giorgio; Roncoroni, Layla; Russo, Carlo; Spadaro, Salvatore; Quaglini, Silvana

2012-01-01

The blood transfusion is a complex activity subject to a high risk of eventually fatal errors. The development and application of computer-based systems could help reducing the error rate, playing a fundamental role in the improvement of the quality of care. This poster presents an under development eLearning tool formalizing the guidelines of the transfusion process. This system, implemented in YAWL (Yet Another Workflow Language), will be used to train the personnel in order to improve the efficiency of care and to reduce errors.
Ion flux through membrane channels--an enhanced algorithm for the Poisson-Nernst-Planck model.

PubMed

Dyrka, Witold; Augousti, Andy T; Kotulska, Malgorzata

2008-09-01

A novel algorithmic scheme for numerical solution of the 3D Poisson-Nernst-Planck model is proposed. The algorithmic improvements are universal and independent of the detailed physical model. They include three major steps: an adjustable gradient-based step value, an adjustable relaxation coefficient, and an optimized segmentation of the modeled space. The enhanced algorithm significantly accelerates the speed of computation and reduces the computational demands. The theoretical model was tested on a regular artificial channel and validated on a real protein channel-alpha-hemolysin, proving its efficiency. (c) 2008 Wiley Periodicals, Inc.
Clinical applications of cone beam computed tomography in endodontics: A comprehensive review.

PubMed

Cohenca, Nestor; Shemesh, Hagay

2015-06-01

Cone beam computed tomography (CBCT) is a new technology that produces three-dimensional (3D) digital imaging at reduced cost and less radiation for the patient than traditional CT scans. It also delivers faster and easier image acquisition. By providing a 3D representation of the maxillofacial tissues in a cost- and dose-efficient manner, a better preoperative assessment can be obtained for diagnosis and treatment. This comprehensive review presents current applications of CBCT in endodontics. Specific case examples illustrate the difference in treatment planning with traditional periapical radiography versus CBCT technology.
Gradient Learning Algorithms for Ontology Computing

PubMed Central

Gao, Wei; Zhu, Linli

2014-01-01

The gradient learning model has been raising great attention in view of its promising perspectives for applications in statistics, data dimensionality reducing, and other specific fields. In this paper, we raise a new gradient learning model for ontology similarity measuring and ontology mapping in multidividing setting. The sample error in this setting is given by virtue of the hypothesis space and the trick of ontology dividing operator. Finally, two experiments presented on plant and humanoid robotics field verify the efficiency of the new computation model for ontology similarity measure and ontology mapping applications in multidividing setting. PMID:25530752
Forward Period Analysis Method of the Periodic Hamiltonian System.

PubMed

Wang, Pengfei

2016-01-01

Using the forward period analysis (FPA), we obtain the period of a Morse oscillator and mathematical pendulum system, with the accuracy of 100 significant digits. From these results, the long-term [0, 1060] (time unit) solutions, ranging from the Planck time to the age of the universe, are computed reliably and quickly with a parallel multiple-precision Taylor series (PMT) scheme. The application of FPA to periodic systems can greatly reduce the computation time of long-term reliable simulations. This scheme provides an efficient way to generate reference solutions, against which long-term simulations using other schemes can be tested.
A method for brain 3D surface reconstruction from MR images

NASA Astrophysics Data System (ADS)

Zhao, De-xin

2014-09-01

Due to the encephalic tissues are highly irregular, three-dimensional (3D) modeling of brain always leads to complicated computing. In this paper, we explore an efficient method for brain surface reconstruction from magnetic resonance (MR) images of head, which is helpful to surgery planning and tumor localization. A heuristic algorithm is proposed for surface triangle mesh generation with preserved features, and the diagonal length is regarded as the heuristic information to optimize the shape of triangle. The experimental results show that our approach not only reduces the computational complexity, but also completes 3D visualization with good quality.
A Proposal of a Fast Computation Method for Thermal Capacity and Voltage ATC by Means of Homotopy Functions

NASA Astrophysics Data System (ADS)

Zoka, Yoshifumi; Yorino, Naoto; Kawano, Koki; Suenari, Hiroyasu

This paper proposes a fast computation method for Available Transfer Capability (ATC) with respect to thermal and voltage magnitude limits. In the paper, ATC is formulated as an optimization problem. In order to obtain the efficiency for the N-1 outage contingency calculations, linear sensitivity methods are applied for screening and ranking all contingency selections with respect to the thermal and voltage magnitude limits margin to identify the severest case. In addition, homotopy functions are used for the generator QV constrains to reduce the maximum error of the linear estimation. Then, the Primal-Dual Interior Point Method (PDIPM) is used to solve the optimization problem for the severest case only, in which the solutions of ATC can be obtained efficiently. The effectiveness of the proposed method is demonstrated through IEEE 30, 57, 118-bus systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Carpentier, J.L.; Di Bono, P.J.; Tournebise, P.J.

The efficient bounding method for DC contingency analysis is improved using reciprocity properties. Knowing the consequences of the outage of a branch, these properties provide the consequences on that branch of various kinds of outages. This is used in order to reduce computation times and to get rid of some difficulties, such as those occurring when a branch flow is close to its limit before outage. Compensation, sparse vector, sparse inverse and bounding techniques are also used. A program has been implemented for single branch outages and tested on actual French EHV 650 bus network. Computation times are 60% ofmore » the Efficient Bounding method. The relevant algorithm is described in detail in the first part of this paper. In the second part, reciprocity properties and bounding formulas are extended for multiple branch outages and for multiple generator or load outages. An algorithm is proposed in order to handle all these cases simultaneously.« less
Inner Space Perturbation Theory in Matrix Product States: Replacing Expensive Iterative Diagonalization.

PubMed

Ren, Jiajun; Yi, Yuanping; Shuai, Zhigang

2016-10-11

We propose an inner space perturbation theory (isPT) to replace the expensive iterative diagonalization in the standard density matrix renormalization group theory (DMRG). The retained reduced density matrix eigenstates are partitioned into the active and secondary space. The first-order wave function and the second- and third-order energies are easily computed by using one step Davidson iteration. Our formulation has several advantages including (i) keeping a balance between the efficiency and accuracy, (ii) capturing more entanglement with the same amount of computational time, (iii) recovery of the standard DMRG when all the basis states belong to the active space. Numerical examples for the polyacenes and periacene show that the efficiency gain is considerable and the accuracy loss due to the perturbation treatment is very small, when half of the total basis states belong to the active space. Moreover, the perturbation calculations converge in all our numerical examples.
Fitting neuron models to spike trains.

PubMed

Rossant, Cyrille; Goodman, Dan F M; Fontaine, Bertrand; Platkiewicz, Jonathan; Magnusson, Anna K; Brette, Romain

2011-01-01

Computational modeling is increasingly used to understand the function of neural circuits in systems neuroscience. These studies require models of individual neurons with realistic input-output properties. Recently, it was found that spiking models can accurately predict the precisely timed spike trains produced by cortical neurons in response to somatically injected currents, if properly fitted. This requires fitting techniques that are efficient and flexible enough to easily test different candidate models. We present a generic solution, based on the Brian simulator (a neural network simulator in Python), which allows the user to define and fit arbitrary neuron models to electrophysiological recordings. It relies on vectorization and parallel computing techniques to achieve efficiency. We demonstrate its use on neural recordings in the barrel cortex and in the auditory brainstem, and confirm that simple adaptive spiking models can accurately predict the response of cortical neurons. Finally, we show how a complex multicompartmental model can be reduced to a simple effective spiking model.
Speed-up of the volumetric method of moments for the approximate RCS of large arbitrary-shaped dielectric targets

NASA Astrophysics Data System (ADS)

Moreno, Javier; Somolinos, Álvaro; Romero, Gustavo; González, Iván; Cátedra, Felipe

2017-08-01

A method for the rigorous computation of the electromagnetic scattering of large dielectric volumes is presented. One goal is to simplify the analysis of large dielectric targets with translational symmetries taken advantage of their Toeplitz symmetry. Then, the matrix-fill stage of the Method of Moments is efficiently obtained because the number of coupling terms to compute is reduced. The Multilevel Fast Multipole Method is applied to solve the problem. Structured meshes are obtained efficiently to approximate the dielectric volumes. The regular mesh grid is achieved by using parallelepipeds whose centres have been identified as internal to the target. The ray casting algorithm is used to classify the parallelepiped centres. It may become a bottleneck when too many points are evaluated in volumes defined by parametric surfaces, so a hierarchical algorithm is proposed to minimize the number of evaluations. Measurements and analytical results are included for validation purposes.
ElGamal cryptosystem with embedded compression-crypto technique

NASA Astrophysics Data System (ADS)

Mandangan, Arif; Yin, Lee Souk; Hung, Chang Ee; Hussin, Che Haziqah Che

2014-12-01

Key distribution problem in symmetric cryptography has been solved by the emergence of asymmetric cryptosystem. Due to its mathematical complexity, computation efficiency becomes a major problem in the real life application of asymmetric cryptosystem. This scenario encourage various researches regarding the enhancement of computation efficiency of asymmetric cryptosystems. ElGamal cryptosystem is one of the most established asymmetric cryptosystem. By using proper parameters, ElGamal cryptosystem is able to provide a good level of information security. On the other hand, Compression-Crypto technique is a technique used to reduce the number of plaintext to be encrypted from k∈ Z+, k > 2 plaintext become only 2 plaintext. Instead of encrypting k plaintext, we only need to encrypt these 2 plaintext. In this paper, we embed the Compression-Crypto technique into the ElGamal cryptosystem. To show that the embedded ElGamal cryptosystem works, we provide proofs on the decryption processes to recover the encrypted plaintext.
Algorithms and software for nonlinear structural dynamics

NASA Technical Reports Server (NTRS)

Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.

1989-01-01

The objective of this research is to develop efficient methods for explicit time integration in nonlinear structural dynamics for computers which utilize both concurrency and vectorization. As a framework for these studies, the program WHAMS, which is described in Explicit Algorithms for the Nonlinear Dynamics of Shells (T. Belytschko, J. I. Lin, and C.-S. Tsay, Computer Methods in Applied Mechanics and Engineering, Vol. 42, 1984, pp 225 to 251), is used. There are two factors which make the development of efficient concurrent explicit time integration programs a challenge in a structural dynamics program: (1) the need for a variety of element types, which complicates the scheduling-allocation problem; and (2) the need for different time steps in different parts of the mesh, which is here called mixed delta t integration, so that a few stiff elements do not reduce the time steps throughout the mesh.
Multilayer shallow water models with locally variable number of layers and semi-implicit time discretization

NASA Astrophysics Data System (ADS)

Bonaventura, Luca; Fernández-Nieto, Enrique D.; Garres-Díaz, José; Narbona-Reina, Gladys

2018-07-01

We propose an extension of the discretization approaches for multilayer shallow water models, aimed at making them more flexible and efficient for realistic applications to coastal flows. A novel discretization approach is proposed, in which the number of vertical layers and their distribution are allowed to change in different regions of the computational domain. Furthermore, semi-implicit schemes are employed for the time discretization, leading to a significant efficiency improvement for subcritical regimes. We show that, in the typical regimes in which the application of multilayer shallow water models is justified, the resulting discretization does not introduce any major spurious feature and allows again to reduce substantially the computational cost in areas with complex bathymetry. As an example of the potential of the proposed technique, an application to a sediment transport problem is presented, showing a remarkable improvement with respect to standard discretization approaches.
Contour integral method for obtaining the self-energy matrices of electrodes in electron transport calculations

NASA Astrophysics Data System (ADS)

Iwase, Shigeru; Futamura, Yasunori; Imakura, Akira; Sakurai, Tetsuya; Tsukamoto, Shigeru; Ono, Tomoya

2018-05-01

We propose an efficient computational method for evaluating the self-energy matrices of electrodes to study ballistic electron transport properties in nanoscale systems. To reduce the high computational cost incurred in large systems, a contour integral eigensolver based on the Sakurai-Sugiura method combined with the shifted biconjugate gradient method is developed to solve an exponential-type eigenvalue problem for complex wave vectors. A remarkable feature of the proposed algorithm is that the numerical procedure is very similar to that of conventional band structure calculations. We implement the developed method in the framework of the real-space higher-order finite-difference scheme with nonlocal pseudopotentials. Numerical tests for a wide variety of materials validate the robustness, accuracy, and efficiency of the proposed method. As an illustration of the method, we present the electron transport property of the freestanding silicene with the line defect originating from the reversed buckled phases.
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak

2003-01-01

The overset grid methodology has significantly reduced time-to-solution of highfidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machinc. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Comprehensive Anti-error Study on Power Grid Dispatching Based on Regional Regulation and Integration

NASA Astrophysics Data System (ADS)

Zhang, Yunju; Chen, Zhongyi; Guo, Ming; Lin, Shunsheng; Yan, Yinyang

2018-01-01

With the large capacity of the power system, the development trend of the large unit and the high voltage, the scheduling operation is becoming more frequent and complicated, and the probability of operation error increases. This paper aims at the problem of the lack of anti-error function, single scheduling function and low working efficiency for technical support system in regional regulation and integration, the integrated construction of the error prevention of the integrated architecture of the system of dispatching anti - error of dispatching anti - error of power network based on cloud computing has been proposed. Integrated system of error prevention of Energy Management System, EMS, and Operation Management System, OMS have been constructed either. The system architecture has good scalability and adaptability, which can improve the computational efficiency, reduce the cost of system operation and maintenance, enhance the ability of regional regulation and anti-error checking with broad development prospects.
The Stochastic Parcel Model: A deterministic parameterization of stochastically entraining convection

DOE PAGES

Romps, David M.

2016-03-01

Convective entrainment is a process that is poorly represented in existing convective parameterizations. By many estimates, convective entrainment is the leading source of error in global climate models. As a potential remedy, an Eulerian implementation of the Stochastic Parcel Model (SPM) is presented here as a convective parameterization that treats entrainment in a physically realistic and computationally efficient way. Drawing on evidence that convecting clouds comprise air parcels subject to Poisson-process entrainment events, the SPM calculates the deterministic limit of an infinite number of such parcels. For computational efficiency, the SPM groups parcels at each height by their purity, whichmore » is a measure of their total entrainment up to that height. This reduces the calculation of convective fluxes to a sequence of matrix multiplications. The SPM is implemented in a single-column model and compared with a large-eddy simulation of deep convection.« less

One-dimensional nonlinear theory for rectangular helix traveling-wave tube

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fu, Chengfang, E-mail: fchffchf@126.com; Zhao, Bo; Yang, Yudong

A 1-D nonlinear theory of a rectangular helix traveling-wave tube (TWT) interacting with a ribbon beam is presented in this paper. The RF field is modeled by a transmission line equivalent circuit, the ribbon beam is divided into a sequence of thin rectangular electron discs with the same cross section as the beam, and the charges are assumed to be uniformly distributed over these discs. Then a method of computing the space-charge field by solving Green's Function in the Cartesian Coordinate-system is fully described. Nonlinear partial differential equations for field amplitudes and Lorentz force equations for particles are solved numericallymore » using the fourth-order Runge-Kutta technique. The tube's gain, output power, and efficiency of the above TWT are computed. The results show that increasing the cross section of the ribbon beam will improve a rectangular helix TWT's efficiency and reduce the saturated length.« less
Development of an Efficient Binaural Simulation for the Analysis of Structural Acoustic Data

NASA Technical Reports Server (NTRS)

Johnson, Marty E.; Lalime, Aimee L.; Grosveld, Ferdinand W.; Rizzi, Stephen A.; Sullivan, Brenda M.

2003-01-01

Applying binaural simulation techniques to structural acoustic data can be very computationally intensive as the number of discrete noise sources can be very large. Typically, Head Related Transfer Functions (HRTFs) are used to individually filter the signals from each of the sources in the acoustic field. Therefore, creating a binaural simulation implies the use of potentially hundreds of real time filters. This paper details two methods of reducing the number of real-time computations required by: (i) using the singular value decomposition (SVD) to reduce the complexity of the HRTFs by breaking them into dominant singular values and vectors and (ii) by using equivalent source reduction (ESR) to reduce the number of sources to be analyzed in real-time by replacing sources on the scale of a structural wavelength with sources on the scale of an acoustic wavelength. The ESR and SVD reduction methods can be combined to provide an estimated computation time reduction of 99.4% for the structural acoustic data tested. In addition, preliminary tests have shown that there is a 97% correlation between the results of the combined reduction methods and the results found with the current binaural simulation techniques
Efficient provisioning for multi-core applications with LSF

NASA Astrophysics Data System (ADS)

Dal Pra, Stefano

2015-12-01

Tier-1 sites providing computing power for HEP experiments are usually tightly designed for high throughput performances. This is pursued by reducing the variety of supported use cases and tuning for performances those ones, the most important of which have been that of singlecore jobs. Moreover, the usual workload is saturation: each available core in the farm is in use and there are queued jobs waiting for their turn to run. Enabling multi-core jobs thus requires dedicating a number of hosts where to run, and waiting for them to free the needed number of cores. This drain-time introduces a loss of computing power driven by the number of unusable empty cores. As an increasing demand for multi-core capable resources have emerged, a Task Force have been constituted in WLCG, with the goal to define a simple and efficient multi-core resource provisioning model. This paper details the work done at the INFN Tier-1 to enable multi-core support for the LSF batch system, with the intent of reducing to the minimum the average number of unused cores. The adopted strategy has been that of dedicating to multi-core a dynamic set of nodes, whose dimension is mainly driven by the number of pending multi-core requests and fair-share priority of the submitting user. The node status transition, from single to multi core et vice versa, is driven by a finite state machine which is implemented in a custom multi-core director script, running in the cluster. After describing and motivating both the implementation and the details specific to the LSF batch system, results about performance are reported. Factors having positive and negative impact on the overall efficiency are discussed and solutions to reduce at most the negative ones are proposed.
Efficient fuzzy C-means architecture for image segmentation.

PubMed

Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen

2011-01-01

This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process for accelerating the computational speed. Experimental results show that the the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate.
Efficient 3D movement-based kernel density estimator and application to wildlife ecology

USGS Publications Warehouse

Tracey-PR, Jeff; Sheppard, James K.; Lockwood, Glenn K.; Chourasia, Amit; Tatineni, Mahidhar; Fisher, Robert N.; Sinkovits, Robert S.

2014-01-01

We describe an efficient implementation of a 3D movement-based kernel density estimator for determining animal space use from discrete GPS measurements. This new method provides more accurate results, particularly for species that make large excursions in the vertical dimension. The downside of this approach is that it is much more computationally expensive than simpler, lower-dimensional models. Through a combination of code restructuring, parallelization and performance optimization, we were able to reduce the time to solution by up to a factor of 1000x, thereby greatly improving the applicability of the method.
Hierarchical Poly Tree Configurations for the Solution of Dynamically Refined Finte Element Models

NASA Technical Reports Server (NTRS)

Gute, G. D.; Padovan, J.

1993-01-01

This paper demonstrates how a multilevel substructuring technique, called the Hierarchical Poly Tree (HPT), can be used to integrate a localized mesh refinement into the original finite element model more efficiently. The optimal HPT configurations for solving isoparametrically square h-, p-, and hp-extensions on single and multiprocessor computers is derived. In addition, the reduced number of stiffness matrix elements that must be stored when employing this type of solution strategy is quantified. Moreover, the HPT inherently provides localize 'error-trapping' and a logical, efficient means with which to isolate physically anomalous and analytically singular behavior.
Holograms for laser diode: Single mode optical fiber coupling

NASA Technical Reports Server (NTRS)

Fuhr, P. L.

1982-01-01

The low coupling efficiency of semiconductor laser emissions into a single mode optical fibers place a severe restriction on their use. Associated with these conventional optical coupling techniques are stringent alignment sensitivities. Using holographic elements, the coupling efficiency may be increased and the alignment sensitivity greatly reduced. Both conventional and computer methods used in the generation of the holographic couplers are described and diagrammed. The reconstruction geometries used are shown to be somewhat restrictive but substantially less rigid than their conventional optical counterparts. Single and double hologram techniques are examined concerning their respective ease of fabrication and relative merits.
The Energy Efficiency Potential of Cloud-Based Software: A U.S. Case Study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Masanet, Eric; Shehabi, Arman; Liang, Jiaqi

The energy use of data centers is a topic that has received much attention, given that data centers currently account for 1-2% of global electricity use. However, cloud computing holds great potential to reduce data center energy demand moving forward, due to both large reductions in total servers through consolidation and large increases in facility efficiencies compared to traditional local data centers. However, analyzing the net energy implications of shifts to the cloud can be very difficult, because data center services can affect many different components of society’s economic and energy systems.
Reliability and energy efficiency of zero energy homes (Conference Presentation)

NASA Astrophysics Data System (ADS)

Dhere, Neelkanth G.

2016-09-01

Photovoltaic (PV) modules and systems are being installed increasingly on residential homes to increase the proportion of renewable energy in the energy mix. The ultimate goal is to attain sustainability without subsidy. The prices of PV modules and systems have declined substantially during the recent years. They will be reduced further to reach grid parity. Additionally the total consumed energy must be reduced by making the homes more energy efficient. FSEC/UCF Researchers have carried out research on development of PV cells and systems and on reducing the energy consumption in homes and by small businesses. Additionally, they have provided guidance on PV module and system installation and to make the homes energy efficient. The produced energy is fed into the utility grid and the consumed energy is obtained from the utility grid, thus the grid is assisting in the storage. Currently the State of Florida permits net metering leading to equal charge for the produced and consumed electricity. This paper describes the installation of 5.29 KW crystalline silicon PV system on a south-facing tilt at approximately latitude tilt on a single-story, three-bedroom house. It also describes the computer program on Building Energy Efficiency and the processes that were employed for reducing the energy consumption of the house by improving the insulation, air circulation and windows, etc. Finally it describes actual consumption and production of electricity and the installation of additional crystalline silicon PV modules and balance of system to make it a zero energy home.
Data-driven Applications for the Sun-Earth System

NASA Astrophysics Data System (ADS)

Kondrashov, D. A.

2016-12-01

Advances in observational and data mining techniques allow extracting information from the large volume of Sun-Earth observational data that can be assimilated into first principles physical models. However, equations governing Sun-Earth phenomena are typically nonlinear, complex, and high-dimensional. The high computational demand of solving the full governing equations over a large range of scales precludes the use of a variety of useful assimilative tools that rely on applied mathematical and statistical techniques for quantifying uncertainty and predictability. Effective use of such tools requires the development of computationally efficient methods to facilitate fusion of data with models. This presentation will provide an overview of various existing as well as newly developed data-driven techniques adopted from atmospheric and oceanic sciences that proved to be useful for space physics applications, such as computationally efficient implementation of Kalman Filter in radiation belts modeling, solar wind gap-filling by Singular Spectrum Analysis, and low-rank procedure for assimilation of low-altitude ionospheric magnetic perturbations into the Lyon-Fedder-Mobarry (LFM) global magnetospheric model. Reduced-order non-Markovian inverse modeling and novel data-adaptive decompositions of Sun-Earth datasets will be also demonstrated.
An installed nacelle design code using a multiblock Euler solver. Volume 1: Theory document

NASA Technical Reports Server (NTRS)

Chen, H. C.

1992-01-01

An efficient multiblock Euler design code was developed for designing a nacelle installed on geometrically complex airplane configurations. This approach employed a design driver based on a direct iterative surface curvature method developed at LaRC. A general multiblock Euler flow solver was used for computing flow around complex geometries. The flow solver used a finite-volume formulation with explicit time-stepping to solve the Euler Equations. It used a multiblock version of the multigrid method to accelerate the convergence of the calculations. The design driver successively updated the surface geometry to reduce the difference between the computed and target pressure distributions. In the flow solver, the change in surface geometry was simulated by applying surface transpiration boundary conditions to avoid repeated grid generation during design iterations. Smoothness of the designed surface was ensured by alternate application of streamwise and circumferential smoothings. The capability and efficiency of the code was demonstrated through the design of both an isolated nacelle and an installed nacelle at various flow conditions. Information on the execution of the computer program is provided in volume 2.
Modeling of the WSTF frictional heating apparatus in high pressure systems

NASA Technical Reports Server (NTRS)

Skowlund, Christopher T.

1992-01-01

In order to develop a computer program able to model the frictional heating of metals in high pressure oxygen or nitrogen a number of additions have been made to the frictional heating model originally developed for tests in low pressure helium. These additions include: (1) a physical property package for the gases to account for departures from the ideal gas state; (2) two methods for spatial discretization (finite differences with quadratic interpolation or orthogonal collocation on finite elements) which substantially reduce the computer time required to solve the transient heat balance; (3) more efficient programs for the integration of the ordinary differential equations resulting from the discretization of the partial differential equations; and (4) two methods for determining the best-fit parameters via minimization of the mean square error (either a direct search multivariable simplex method or a modified Levenburg-Marquardt algorithm). The resulting computer program has been shown to be accurate, efficient and robust for determining the heat flux or friction coefficient vs. time at the interface of the stationary and rotating samples.
Spectral-based propagation schemes for time-dependent quantum systems with application to carbon nanotubes

NASA Astrophysics Data System (ADS)

Chen, Zuojing; Polizzi, Eric

2010-11-01

Effective modeling and numerical spectral-based propagation schemes are proposed for addressing the challenges in time-dependent quantum simulations of systems ranging from atoms, molecules, and nanostructures to emerging nanoelectronic devices. While time-dependent Hamiltonian problems can be formally solved by propagating the solutions along tiny simulation time steps, a direct numerical treatment is often considered too computationally demanding. In this paper, however, we propose to go beyond these limitations by introducing high-performance numerical propagation schemes to compute the solution of the time-ordered evolution operator. In addition to the direct Hamiltonian diagonalizations that can be efficiently performed using the new eigenvalue solver FEAST, we have designed a Gaussian propagation scheme and a basis-transformed propagation scheme (BTPS) which allow to reduce considerably the simulation times needed by time intervals. It is outlined that BTPS offers the best computational efficiency allowing new perspectives in time-dependent simulations. Finally, these numerical schemes are applied to study the ac response of a (5,5) carbon nanotube within a three-dimensional real-space mesh framework.
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing

PubMed Central

Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant

2016-01-01

Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital(reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90 and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
Electrochemical Ionization and Analyte Charging in the Array of Micromachined UltraSonic Electrospray (AMUSE) Ion Source

PubMed Central

Forbes, Thomas P.; Degertekin, F. Levent; Fedorov, Andrei G.

2010-01-01

Electrochemistry and ion transport in a planar array of mechanically-driven, droplet-based ion sources are investigated using an approximate time scale analysis and in-depth computational simulations. The ion source is modeled as a controlled-current electrolytic cell, in which the piezoelectric transducer electrode, which mechanically drives the charged droplet generation using ultrasonic atomization, also acts as the oxidizing/corroding anode (positive mode). The interplay between advective and diffusive ion transport of electrochemically generated ions is analyzed as a function of the transducer duty cycle and electrode location. A time scale analysis of the relative importance of advective vs. diffusive ion transport provides valuable insight into optimality, from the ionization prospective, of alternative design and operation modes of the ion source operation. A computational model based on the solution of time-averaged, quasi-steady advection-diffusion equations for electroactive species transport is used to substantiate the conclusions of the time scale analysis. The results show that electrochemical ion generation at the piezoelectric transducer electrodes located at the back-side of the ion source reservoir results in poor ionization efficiency due to insufficient time for the charged analyte to diffuse away from the electrode surface to the ejection location, especially at near 100% duty cycle operation. Reducing the duty cycle of droplet/analyte ejection increases the analyte residence time and, in turn, improves ionization efficiency, but at an expense of the reduced device throughput. For applications where this is undesirable, i.e., multiplexed and disposable device configurations, an alternative electrode location is incorporated. By moving the charging electrode to the nozzle surface, the diffusion length scale is greatly reduced, drastically improving ionization efficiency. The ionization efficiency of all operating conditions considered is expressed as a function of the dimensionless Peclet number, which defines the relative effect of advection as compared to diffusion. This analysis is general enough to elucidate an important role of electrochemistry in ionization efficiency of any arrayed ion sources, be they mechanically-driven or electrosprays, and is vital for determining optimal design and operation conditions. PMID:20607111
Efficient implementation of one- and two-component analytical energy gradients in exact two-component theory

NASA Astrophysics Data System (ADS)

Franzke, Yannick J.; Middendorf, Nils; Weigend, Florian

2018-03-01

We present an efficient algorithm for one- and two-component analytical energy gradients with respect to nuclear displacements in the exact two-component decoupling approach to the one-electron Dirac equation (X2C). Our approach is a generalization of the spin-free ansatz by Cheng and Gauss [J. Chem. Phys. 135, 084114 (2011)], where the perturbed one-electron Hamiltonian is calculated by solving a first-order response equation. Computational costs are drastically reduced by applying the diagonal local approximation to the unitary decoupling transformation (DLU) [D. Peng and M. Reiher, J. Chem. Phys. 136, 244108 (2012)] to the X2C Hamiltonian. The introduced error is found to be almost negligible as the mean absolute error of the optimized structures amounts to only 0.01 pm. Our implementation in TURBOMOLE is also available within the finite nucleus model based on a Gaussian charge distribution. For a X2C/DLU gradient calculation, computational effort scales cubically with the molecular size, while storage increases quadratically. The efficiency is demonstrated in calculations of large silver clusters and organometallic iridium complexes.
Improved Discrete Ordinate Solutions in the Presence of an Anisotropically Reflecting Lower Boundary: Upgrades of the DISORT Computational Tool

NASA Technical Reports Server (NTRS)

Lin, Z.; Stamnes, S.; Jin, Z.; Laszlo, I.; Tsay, S. C.; Wiscombe, W. J.; Stamnes, K.

2015-01-01

A successor version 3 of DISORT (DISORT3) is presented with important upgrades that improve the accuracy, efficiency, and stability of the algorithm. Compared with version 2 (DISORT2 released in 2000) these upgrades include (a) a redesigned BRDF computation that improves both speed and accuracy, (b) a revised treatment of the single scattering correction, and (c) additional efficiency and stability upgrades for beam sources. In DISORT3 the BRDF computation is improved in the following three ways: (i) the Fourier decomposition is prepared "off-line", thus avoiding the repeated internal computations done in DISORT2; (ii) a large enough number of terms in the Fourier expansion of the BRDF is employed to guarantee accurate values of the expansion coefficients (default is 200 instead of 50 in DISORT2); (iii) in the post processing step the reflection of the direct attenuated beam from the lower boundary is included resulting in a more accurate single scattering correction. These improvements in the treatment of the BRDF have led to improved accuracy and a several-fold increase in speed. In addition, the stability of beam sources has been improved by removing a singularity occurring when the cosine of the incident beam angle is too close to the reciprocal of any of the eigenvalues. The efficiency for beam sources has been further improved from reducing by a factor of 2 (compared to DISORT2) the dimension of the linear system of equations that must be solved to obtain the particular solutions, and by replacing the LINPAK routines used in DISORT2 by LAPACK 3.5 in DISORT3. These beam source stability and efficiency upgrades bring enhanced stability and an additional 5-7% improvement in speed. Numerical results are provided to demonstrate and quantify the improvements in accuracy and efficiency of DISORT3 compared to DISORT2.
Using Nested Contractions and a Hierarchical Tensor Format To Compute Vibrational Spectra of Molecules with Seven Atoms.

PubMed

Thomas, Phillip S; Carrington, Tucker

2015-12-31

We propose a method for solving the vibrational Schrödinger equation with which one can compute hundreds of energy levels of seven-atom molecules using at most a few gigabytes of memory. It uses nested contractions in conjunction with the reduced-rank block power method (RRBPM) described in J. Chem. Phys. 2014, 140, 174111. Successive basis contractions are organized into a tree, the nodes of which are associated with eigenfunctions of reduced-dimension Hamiltonians. The RRBPM is used recursively to compute eigenfunctions of nodes in bases of products of reduced-dimension eigenfunctions of nodes with fewer coordinates. The corresponding vectors are tensors in what is called CP-format. The final wave functions are therefore represented in a hierarchical CP-format. Computational efficiency and accuracy are significantly improved by representing the Hamiltonian in the same hierarchical format as the wave function. We demonstrate that with this hierarchical RRBPM it is possible to compute energy levels of a 64-D coupled-oscillator model Hamiltonian and also of acetonitrile (CH3CN) and ethylene oxide (C2H4O), for which we use quartic potentials. The most accurate acetonitrile calculation uses 139 MB of memory and takes 3.2 h on a single processor. The most accurate ethylene oxide calculation uses 6.1 GB of memory and takes 14 d on 63 processors. The hierarchical RRBPM shatters the memory barrier that impedes the calculation of vibrational spectra.
PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)

NASA Astrophysics Data System (ADS)

Vincenti, Henri

2016-03-01

The advent of exascale computers will enable 3D simulations of a new laser-plasma interaction regimes that were previously out of reach of current Petasale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potentialities of these new computing architectures. Indeed, achieving Exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy consuming part of an algorithm future computers will tend to increase memory locality at the hardware level and reduce energy consumption related to data movement by using more and more cores on each compute nodes (''fat nodes'') that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, CPU machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPU's also have a reduced clock speed per core and can process Multiple Instructions on Multiple Datas (MIMD). At the software level Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for Multicore/Manycore CPU) to fully take advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high performance skeleton PIC code PICSAR to both achieve good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the Pseudo-sepctral quasi-cylindrical code FBPIC on GPUs using the Numba python compiler.
Sparse-grid, reduced-basis Bayesian inversion: Nonaffine-parametric nonlinear equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Peng, E-mail: peng@ices.utexas.edu; Schwab, Christoph, E-mail: christoph.schwab@sam.math.ethz.ch

2016-07-01

We extend the reduced basis (RB) accelerated Bayesian inversion methods for affine-parametric, linear operator equations which are considered in [16,17] to non-affine, nonlinear parametric operator equations. We generalize the analysis of sparsity of parametric forward solution maps in [20] and of Bayesian inversion in [48,49] to the fully discrete setting, including Petrov–Galerkin high-fidelity (“HiFi”) discretization of the forward maps. We develop adaptive, stochastic collocation based reduction methods for the efficient computation of reduced bases on the parametric solution manifold. The nonaffinity and nonlinearity with respect to (w.r.t.) the distributed, uncertain parameters and the unknown solution is collocated; specifically, by themore » so-called Empirical Interpolation Method (EIM). For the corresponding Bayesian inversion problems, computational efficiency is enhanced in two ways: first, expectations w.r.t. the posterior are computed by adaptive quadratures with dimension-independent convergence rates proposed in [49]; the present work generalizes [49] to account for the impact of the PG discretization in the forward maps on the convergence rates of the Quantities of Interest (QoI for short). Second, we propose to perform the Bayesian estimation only w.r.t. a parsimonious, RB approximation of the posterior density. Based on the approximation results in [49], the infinite-dimensional parametric, deterministic forward map and operator admit N-term RB and EIM approximations which converge at rates which depend only on the sparsity of the parametric forward map. In several numerical experiments, the proposed algorithms exhibit dimension-independent convergence rates which equal, at least, the currently known rate estimates for N-term approximation. We propose to accelerate Bayesian estimation by first offline construction of reduced basis surrogates of the Bayesian posterior density. The parsimonious surrogates can then be employed for online data assimilation and for Bayesian estimation. They also open a perspective for optimal experimental design.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.