Sample records for highly computationally efficient

  1. High Performance Computing Meets Energy Efficiency - Continuum Magazine | NREL

    Science.gov Websites

    Caption: simulation of wind turbines by Patrick J. Moriarty and Matthew J. Churchfield, NREL. The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data…

  2. Experimental Realization of High-Efficiency Counterfactual Computation.

    PubMed

    Kong, Fei; Ju, Chenyong; Huang, Pu; Wang, Pengfei; Kong, Xi; Shi, Fazhan; Jiang, Liang; Du, Jiangfeng

    2015-08-21

    Counterfactual computation (CFC) exemplifies the fascinating quantum process by which the result of a computation may be learned without actually running the computer. In previous experimental studies, the counterfactual efficiency was limited to below 50%. Here we report an experimental realization of the generalized CFC protocol, in which the counterfactual efficiency can break the 50% limit and even approach unity in principle. The experiment is performed with the spins of a negatively charged nitrogen-vacancy color center in diamond. Taking advantage of the quantum Zeno effect, the computer can remain in the not-running subspace due to frequent projection by the environment, while the computation result can be revealed by final detection. A counterfactual efficiency of up to 85% has been demonstrated in our experiment, which opens the possibility of many exciting applications of CFC, such as high-efficiency quantum integration and imaging.

  3. Experimental Realization of High-Efficiency Counterfactual Computation

    NASA Astrophysics Data System (ADS)

    Kong, Fei; Ju, Chenyong; Huang, Pu; Wang, Pengfei; Kong, Xi; Shi, Fazhan; Jiang, Liang; Du, Jiangfeng

    2015-08-01

    Counterfactual computation (CFC) exemplifies the fascinating quantum process by which the result of a computation may be learned without actually running the computer. In previous experimental studies, the counterfactual efficiency was limited to below 50%. Here we report an experimental realization of the generalized CFC protocol, in which the counterfactual efficiency can break the 50% limit and even approach unity in principle. The experiment is performed with the spins of a negatively charged nitrogen-vacancy color center in diamond. Taking advantage of the quantum Zeno effect, the computer can remain in the not-running subspace due to frequent projection by the environment, while the computation result can be revealed by final detection. A counterfactual efficiency of up to 85% has been demonstrated in our experiment, which opens the possibility of many exciting applications of CFC, such as high-efficiency quantum integration and imaging.

  4. High-Performance Computing Systems and Operations | Computational Science | NREL

    Science.gov Websites

    NREL operates high-performance computing (HPC) systems dedicated to advancing energy efficiency and renewable energy technologies. NREL's HPC capabilities include high-performance computing systems…

  5. HPC on Competitive Cloud Resources

    NASA Astrophysics Data System (ADS)

    Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff

    Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on demand in real time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in real time (wall time). We also show that the smallest, cheapest (64-bit) instance in the studied environment is the best in terms of price-to-performance ratio. In light of the high contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average and expected performance and execution time, expected cost to completion, and variance measures--traditionally ignored in the high-performance computing context--should now complement or even substitute the standard definitions of efficiency.
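
    The contention-aware metrics the chapter advocates are straightforward to compute from repeated benchmark runs. A minimal sketch in Python; the runtimes and hourly price below are hypothetical placeholders, not measurements from the study.

```python
import numpy as np

def expected_metrics(runtimes_s, price_per_hour):
    """Contention-aware summary statistics: expected execution time,
    its variance, and expected cost to completion for one run."""
    t = np.asarray(runtimes_s, dtype=float)
    mean, var = t.mean(), t.var(ddof=1)
    return {"E[time] (s)": mean,
            "Var[time] (s^2)": var,
            "E[cost] ($)": mean / 3600.0 * price_per_hour}

# Hypothetical Linpack wall times (s) from repeated runs on a shared instance:
print(expected_metrics([410, 395, 630, 402, 880, 415], price_per_hour=0.10))
```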

  6. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    NASA Astrophysics Data System (ADS)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modeling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC cluster is discussed. The parallel MPI version of the Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not a multiple of the total number of cores per cluster node, there are allocation strategies that provide more efficient calculations.

  7. Marc Henry de Frahan | NREL

    Science.gov Websites

    …Computing Project, Marc develops high-fidelity turbulence models to enhance simulation accuracy and efficient numerical algorithms for future high-performance computing hardware architectures. Research interests: high-performance computing; high-order numerical methods for computational fluid dynamics…

  8. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2015-05-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  9. Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi

    DOE PAGES

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...

    2015-05-22

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  10. Computer Facilitated Mathematical Methods in Chemical Engineering--Similarity Solution

    ERIC Educational Resources Information Center

    Subramanian, Venkat R.

    2006-01-01

    High-performance computers coupled with highly efficient numerical schemes and user-friendly software packages have helped instructors to teach numerical solutions and analysis of various nonlinear models more efficiently in the classroom. One of the main objectives of a model is to provide insight about the system of interest. Analytical…

  11. High-Performance Computing for the Electromagnetic Modeling and Simulation of Interconnects

    NASA Technical Reports Server (NTRS)

    Schutt-Aine, Jose E.

    1996-01-01

    The electromagnetic modeling of packages and interconnects plays a very important role in the design of high-speed digital circuits, and is most efficiently performed by using computer-aided design algorithms. In recent years, packaging has become a critical area in the design of high-speed communication systems and fast computers, and the importance of the software support for their development has increased accordingly. Throughout this project, our efforts have focused on the development of modeling and simulation techniques and algorithms that permit the fast computation of the electrical parameters of interconnects and the efficient simulation of their electrical performance.
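
    As an aside on the electrical parameters mentioned above: once per-unit-length RLGC parameters of an interconnect have been extracted, its characteristic impedance and propagation constant follow from the standard transmission-line relations. A minimal sketch; the RLGC values below are illustrative guesses, not results from this project.

```python
import numpy as np

def line_params(L, C, R=0.0, G=0.0, f=1e9):
    """Per-unit-length RLGC values -> characteristic impedance Z0 and
    propagation constant gamma of a transmission line at frequency f:
    Z0 = sqrt((R + jwL)/(G + jwC)), gamma = sqrt((R + jwL)(G + jwC))."""
    w = 2 * np.pi * f
    zs = R + 1j * w * L   # series impedance per unit length
    yp = G + 1j * w * C   # shunt admittance per unit length
    return np.sqrt(zs / yp), np.sqrt(zs * yp)

# Illustrative microstrip-like values: 3.5 nH/cm and 1.4 pF/cm.
z0, gamma = line_params(L=3.5e-7, C=1.4e-10, f=2e9)
print(abs(z0))  # ~50 ohms
```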

  12. High-efficiency AlGaAs-GaAs Cassegrainian concentrator cells

    NASA Technical Reports Server (NTRS)

    Werthen, J. G.; Hamaker, H. C.; Virshup, G. F.; Lewis, C. R.; Ford, C. W.

    1985-01-01

    AlGaAs-GaAs heteroface space concentrator solar cells have been fabricated by metalorganic chemical vapor deposition. AM0 efficiencies as high as 21.1% have been observed both for p-n and n-p structures under concentration (90 to 100X) at 25 C. Both cell structures are characterized by high quantum efficiencies, and their performances are close to those predicted by a realistic computer model. In agreement with the computer model, the n-p cell exhibits a higher short-circuit current density.

  13. High-Performance Computing Data Center Warm-Water Liquid Cooling |

    Science.gov Websites

    NREL's High-Performance Computing Data Center (HPC Data Center) is liquid cooled with warm water. Liquid cooling technologies offer a more energy-efficient solution that also allows for effective…

  14. Evaluation of the discrete vortex wake cross flow model using vector computers. Part 1: Theory and application

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The objective of the current program was to modify a discrete vortex wake method to efficiently compute the aerodynamic forces and moments on high-fineness-ratio bodies (f approximately 10.0). The approach is to increase computational efficiency by structuring the program to take advantage of new computer vector software and by developing new algorithms when vector software cannot be used efficiently. An efficient program was written and substantial savings achieved. Several test cases were run for fineness ratios up to f = 16.0 and angles of attack up to 50 degrees.

  15. High-efficiency photorealistic computer-generated holograms based on the backward ray-tracing technique

    NASA Astrophysics Data System (ADS)

    Wang, Yuan; Chen, Zhidong; Sang, Xinzhu; Li, Hui; Zhao, Linmin

    2018-03-01

    Holographic displays can provide the complete optical wave field of a three-dimensional (3D) scene, including the depth perception. However, it often takes a long computation time to produce traditional computer-generated holograms (CGHs) with complex, photorealistic rendering. The backward ray-tracing technique is able to render photorealistic high-quality images and noticeably reduces the computation time owing to its high degree of parallelism. Here, a high-efficiency photorealistic computer-generated hologram method is presented based on the ray-tracing technique. Rays are launched and traced in parallel under different illuminations and circumstances. Experimental results demonstrate the effectiveness of the proposed method. Compared with the traditional point-cloud CGH, the computation time is decreased to 24 s to reconstruct a 3D object of 100 × 100 rays with continuous depth change.
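
    For reference, the point-cloud CGH baseline mentioned in the comparison superposes a spherical wave from every object point onto the hologram plane. A minimal sketch of that baseline (not of the paper's ray-tracing method); the resolution, pixel pitch, wavelength, and object points are assumed values.

```python
import numpy as np

# Each object point contributes a spherical wave exp(i*k*r)/r to the
# hologram plane; the phase of the summed field gives a phase-only CGH.
wavelength = 532e-9            # assumed laser wavelength (m)
k = 2 * np.pi / wavelength
pitch, n = 8e-6, 256           # assumed pixel pitch (m) and resolution

ys, xs = np.mgrid[:n, :n] * pitch
points = [(8.0e-4, 1.0e-3, 0.10, 1.0),   # (x, y, z, amplitude), assumed
          (1.2e-3, 6.0e-4, 0.12, 0.8)]

field = np.zeros((n, n), dtype=complex)
for px, py, pz, a in points:
    r = np.sqrt((xs - px) ** 2 + (ys - py) ** 2 + pz ** 2)
    field += a * np.exp(1j * k * r) / r

hologram = np.angle(field)     # phase-only hologram
```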

  16. Energy efficiency indicators for high electric-load buildings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aebischer, Bernard; Balmer, Markus A.; Kinney, Satkartar

    2003-06-01

    Energy per unit of floor area is not an adequate indicator of energy efficiency in high electric-load buildings. For two activities, restaurants and computer centres, alternative indicators of energy efficiency are discussed.

  17. Recovery Act - CAREER: Sustainable Silicon -- Energy-Efficient VLSI Interconnect for Extreme-Scale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiang, Patrick

    2014-01-31

    The research goal of this CAREER proposal is to develop energy-efficient VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thousands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on-chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.

  18. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing output has resulted in a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g., files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on DNA and protein large-scale data sets (files of more than 1 GB) showed that HAlign-II could save time and space and outperformed current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. It shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure, and its open-source code and datasets are available at http://lab.malab.cn/soft/halign.

  19. High-Performance Computing Data Center Efficiency Dashboard | Computational Science | NREL

    Science.gov Websites

    The dashboard covers the energy recovery water (ERW) loop, the heat exchanger for energy recovery, the thermosyphon, the heat exchanger between the ERW loop and the cooling tower loop, and the evaporative cooling towers of NREL's energy-efficient facility.

  20. A flexible and accurate digital volume correlation method applicable to high-resolution volumetric images

    NASA Astrophysics Data System (ADS)

    Pan, Bing; Wang, Bo

    2017-10-01

    Digital volume correlation (DVC) is a powerful technique for quantifying interior deformation within solid opaque materials and biological tissues. In the last two decades, great efforts have been made to improve the accuracy and efficiency of the DVC algorithm. However, there is still a lack of a flexible, robust and accurate version that can be efficiently implemented in personal computers with limited RAM. This paper proposes an advanced DVC method that can realize accurate full-field internal deformation measurement applicable to high-resolution volume images with up to billions of voxels. Specifically, a novel layer-wise reliability-guided displacement tracking strategy combined with dynamic data management is presented to guide the DVC computation from slice to slice. The displacements at specified calculation points in each layer are computed using the advanced 3D inverse-compositional Gauss-Newton algorithm with the complete initial guess of the deformation vector accurately predicted from the computed calculation points. Since only limited slices of interest in the reference and deformed volume images rather than the whole volume images are required, the DVC calculation can thus be efficiently implemented on personal computers. The flexibility, accuracy and efficiency of the presented DVC approach are demonstrated by analyzing computer-simulated and experimentally obtained high-resolution volume images.
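
    The "reliability-guided" ordering that steers the layer-wise computation can be expressed compactly with a priority queue: the best-correlated point processed so far seeds its neighbors' initial guesses. A simplified 2D sketch under assumed interfaces; `correlate` is a hypothetical stand-in for the paper's 3D inverse-compositional Gauss-Newton registration.

```python
import heapq
import numpy as np

def reliability_guided(shape, correlate):
    """Process calculation points in order of decreasing correlation
    coefficient, seeding each unprocessed neighbor with the displacement
    of its best-correlated processed neighbor. `correlate(ij, guess)` is
    a hypothetical stand-in for the per-point registration and returns
    (zncc, displacement)."""
    h, w = shape
    seed = (h // 2, w // 2)
    z, d = correlate(seed, np.zeros(2))
    done = {seed: d}
    heap = [(-z, seed)]
    while heap:
        _, (i, j) = heapq.heappop(heap)
        for nb in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= nb[0] < h and 0 <= nb[1] < w and nb not in done:
                zn, dn = correlate(nb, done[(i, j)])  # seeded initial guess
                done[nb] = dn
                heapq.heappush(heap, (-zn, nb))
    return done
```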

  1. Staff | Computational Science | NREL

    Science.gov Websites

    …develops and leads laboratory-wide efforts in high-performance computing and energy-efficient data centers. Staff include: Jim Albin, IT Professional IV, High-Performance Computing, Jim.Albin@nrel.gov, 303-275-4069; Shreyas Ananthan, Senior Scientist, High-Performance Algorithms and Modeling, Shreyas.Ananthan@nrel.gov, 303-275-4807; Kurt Bendl, IT Professional IV…

  2. CFD Analysis and Design Optimization Using Parallel Computers

    NASA Technical Reports Server (NTRS)

    Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James

    1997-01-01

    A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.

  3. Application of the MacCormack scheme to overland flow routing for high-spatial resolution distributed hydrological model

    NASA Astrophysics Data System (ADS)

    Zhang, Ling; Nan, Zhuotong; Liang, Xu; Xu, Yi; Hernández, Felipe; Li, Lianxia

    2018-03-01

    Although process-based distributed hydrological models (PDHMs) have evolved rapidly over the last few decades, their extensive application is still challenged by computational expense. This study attempted, for the first time, to apply the numerically efficient MacCormack algorithm to overland flow routing in a representative high-spatial-resolution PDHM, i.e., the distributed hydrology-soil-vegetation model (DHSVM), in order to improve its computational efficiency. The analytical verification indicates that both the semi and full versions of the MacCormack scheme exhibit robust numerical stability and are more computationally efficient than the conventional explicit linear scheme. The full version outperforms the semi version in terms of simulation accuracy when the same time step is adopted. The semi-MacCormack scheme was implemented into DHSVM (version 3.1.2) to solve the kinematic wave equations for overland flow routing. The performance and practicality of the enhanced DHSVM-MacCormack model were assessed by performing two groups of modeling experiments in the Mercer Creek watershed, a small urban catchment near Bellevue, Washington. The experiments show that DHSVM-MacCormack can considerably improve the computational efficiency without compromising the simulation accuracy of the original DHSVM model. More specifically, with the same computational environment and model settings, the computational time required by DHSVM-MacCormack can be reduced to several dozen minutes for a simulation period of three months (in contrast with one and a half days for the original DHSVM model) without noticeable sacrifice of accuracy. The MacCormack scheme proves to be applicable to overland flow routing in DHSVM, which implies that it can be coupled into other PDHMs for watershed routing to either significantly improve their computational efficiency or make kinematic wave routing for high-resolution modeling computationally feasible.
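
    The scheme itself is compact: a backward-difference predictor followed by a forward-difference corrector applied to the kinematic wave equation. A one-dimensional sketch, assuming a power-law flux q = alpha*h^m and uniform rainfall; the grid sizes and coefficients are illustrative, and this is not the DHSVM implementation.

```python
import numpy as np

def maccormack_kinematic(h, dt, dx, rain, alpha=1.0, m=5.0 / 3.0, nsteps=100):
    """MacCormack predictor-corrector for the 1D kinematic wave equation
    dh/dt + d(alpha * h**m)/dx = rain: backward difference in the
    predictor, forward difference in the corrector."""
    for _ in range(nsteps):
        q = alpha * h ** m
        hp = h.copy()
        hp[1:] = h[1:] - dt / dx * (q[1:] - q[:-1]) + dt * rain      # predictor
        qp = alpha * hp ** m
        hc = h.copy()
        hc[:-1] = 0.5 * (h[:-1] + hp[:-1]
                         - dt / dx * (qp[1:] - qp[:-1]) + dt * rain)  # corrector
        h = np.clip(hc, 0.0, None)
    return h

print(maccormack_kinematic(np.zeros(100), dt=0.1, dx=1.0, rain=1e-5).max())
```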

  4. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency.

    PubMed

    Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A

    2010-01-01

    Especially in the life-science and health-care sectors, huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters with very fast network interconnects within grid infrastructures now allows efficient parallel high-performance grid computing, and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond that, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage, but, more importantly, can also increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for, e.g., the life-science and health-care sectors as well as grid infrastructures, by reaching a higher level of resource efficiency.

  5. Scientific Discovery through Advanced Computing (SciDAC-3) Partnership Project Annual Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoffman, Forest M.; Bochev, Pavel B.; Cameron-Smith, Philip J.

    The Applying Computationally Efficient Schemes for BioGeochemical Cycles (ACES4BGC) Project is advancing the predictive capabilities of Earth System Models (ESMs) by reducing two of the largest sources of uncertainty, aerosols and biospheric feedbacks, with a highly efficient computational approach. In particular, this project is implementing and optimizing new computationally efficient tracer advection algorithms for large numbers of tracer species; adding important biogeochemical interactions between the atmosphere, land, and ocean models; and applying uncertainty quantification (UQ) techniques to constrain process parameters and evaluate uncertainties in feedbacks between biogeochemical cycles and the climate system.

  6. Green's function methods in heavy ion shielding

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Costen, Robert C.; Shinn, Judy L.; Badavi, Francis F.

    1993-01-01

    An analytic solution to the heavy ion transport in terms of Green's function is used to generate a highly efficient computer code for space applications. The efficiency of the computer code is accomplished by a nonperturbative technique extending Green's function over the solution domain. The computer code can also be applied to accelerator boundary conditions to allow code validation in laboratory experiments.

  7. Efficient computation of the Grünwald-Letnikov fractional diffusion derivative using adaptive time step memory

    NASA Astrophysics Data System (ADS)

    MacDonald, Christopher L.; Bhattacharya, Nirupama; Sprouse, Brian P.; Silva, Gabriel A.

    2015-09-01

    Computing numerical solutions to fractional differential equations can be computationally intensive due to the effect of non-local derivatives, in which all previous time points contribute to the current iteration. In general, numerical approaches that depend on truncating part of the system history, while efficient, can suffer from high degrees of error and inaccuracy. Here we present an adaptive time step memory method for smooth functions applied to the Grünwald-Letnikov fractional diffusion derivative. This method is computationally efficient and results in smaller errors during numerical simulations. Sampled points along the system's history at progressively longer intervals are assumed to reflect the values of neighboring time points. By including progressively fewer points backward in time, a temporally 'weighted' history is computed that includes contributions from the entire past of the system, maintaining accuracy but with fewer points actually calculated, greatly improving computational efficiency.
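
    The idea is easy to see against the standard Grünwald-Letnikov sum D^alpha f(t_i) ≈ h^(-alpha) * sum_k w_k f(t_{i-k}), with w_0 = 1 and w_k = w_{k-1}(1 - (alpha+1)/k). A minimal sketch, assuming a uniform step and a smooth series; the `recent` window and doubling block lengths are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def gl_weights(alpha, n):
    # w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k)  (binomial recurrence)
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def gl_adaptive(f, i, h, alpha, w, recent=64):
    """GL derivative at index i: full resolution over the most recent
    `recent` points, then doubling blocks further back, each represented
    by one sampled value weighted by the block's summed coefficients."""
    acc, k, block = 0.0, 0, 1
    while k <= i:
        if k < recent:
            acc += w[k] * f[i - k]
            k += 1
        else:
            b = min(block, i - k + 1)
            acc += w[k:k + b].sum() * f[i - k - b // 2]
            k += b
            block *= 2
    return acc / h ** alpha

t = np.linspace(0.0, 1.0, 1001)
f = t ** 2
w = gl_weights(0.5, len(f))
# Exact half-derivative of t^2 at t=1 is Gamma(3)/Gamma(2.5) ~= 1.505
print(gl_adaptive(f, 1000, t[1] - t[0], 0.5, w))
```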

  8. Efficient computation of photonic crystal waveguide modes with dispersive material.

    PubMed

    Schmidt, Kersten; Kappeler, Roman

    2010-03-29

    The optimization of photonic crystal (PhC) waveguides is a key issue in the successful design of PhC devices. Since this design task is computationally expensive, efficient methods are in demand. The available codes for computing photonic bands are also applied to PhC waveguides; they are reliable but not very efficient, which is even more pronounced for dispersive materials. We present a method based on higher-order finite elements with curved cells, which allows the band structure to be solved while directly taking into account the dispersiveness of the materials. This is accomplished by reformulating the wave equations as a linear eigenproblem in the complex wave vector k. For this method, we demonstrate the high efficiency of the computation of guided PhC waveguide modes by a convergence analysis.

  9. [Economic efficiency of computer monitoring of health].

    PubMed

    Il'icheva, N P; Stazhadze, L L

    2001-01-01

    Presents a method of computer monitoring of health based on the utilization of modern information technologies in public health. The method helps organize the preventive activities of an outpatient clinic at a high level and essentially decreases time and money losses. The efficiency of such preventive measures and the increased number of computer and Internet users suggest that such methods are promising and that further studies in this field are needed.

  10. High-Performance Computing Unlocks Innovation at NREL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Need to fly around a wind farm? Or step inside a molecule? NREL scientists use a super powerful (and highly energy-efficient) computer to visualize and solve big problems in renewable energy research.

  11. High-Level Data-Abstraction System

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1986-01-01

    Communication with data-base processor flexible and efficient. High Level Data Abstraction (HILDA) system is three-layer system supporting data-abstraction features of Intel data-base processor (DBP). Purpose of HILDA is establishment of flexible method of efficiently communicating with DBP. Power of HILDA lies in its extensibility with regard to syntax and semantic changes. HILDA's high-level query language readily modified. Offers powerful potential to computer sites where DBP attached to DEC VAX-series computer. HILDA system written in Pascal and FORTRAN 77 for interactive execution.

  12. Interaction Entropy: A New Paradigm for Highly Efficient and Reliable Computation of Protein-Ligand Binding Free Energy.

    PubMed

    Duan, Lili; Liu, Xiao; Zhang, John Z H

    2016-05-04

    Efficient and reliable calculation of protein-ligand binding free energy is a grand challenge in computational biology and is of critical importance in drug design and many other molecular recognition problems. The main challenge lies in the calculation of entropic contribution to protein-ligand binding or interaction systems. In this report, we present a new interaction entropy method which is theoretically rigorous, computationally efficient, and numerically reliable for calculating entropic contribution to free energy in protein-ligand binding and other interaction processes. Drastically different from the widely employed but extremely expensive normal mode method for calculating entropy change in protein-ligand binding, the new method calculates the entropic component (interaction entropy or -TΔS) of the binding free energy directly from molecular dynamics simulation without any extra computational cost. Extensive study of over a dozen randomly selected protein-ligand binding systems demonstrated that this interaction entropy method is both computationally efficient and numerically reliable and is vastly superior to the standard normal mode approach. This interaction entropy paradigm introduces a novel and intuitive conceptual understanding of the entropic effect in protein-ligand binding and other general interaction systems as well as a practical method for highly efficient calculation of this effect.
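
    The quantity at the heart of the method is computed directly from the interaction-energy time series of a standard MD run: -T*dS = (1/beta) * ln⟨exp(beta * (E_int - ⟨E_int⟩))⟩. A minimal sketch; the synthetic energies in the usage example are placeholders, not simulation data.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant, kcal/(mol*K)

def interaction_entropy(e_int, temperature=300.0):
    """-T*dS (kcal/mol) from gas-phase protein-ligand interaction
    energies sampled along an MD trajectory:
    -T*dS = (1/beta) * ln < exp(beta * (E_int - <E_int>)) >."""
    beta = 1.0 / (KB * temperature)
    fluct = e_int - e_int.mean()
    return np.log(np.mean(np.exp(beta * fluct))) / beta

# Synthetic (placeholder) interaction energies, kcal/mol:
rng = np.random.default_rng(0)
e = -45.0 + 2.0 * rng.standard_normal(10_000)
# For Gaussian fluctuations this tends to beta*sigma^2/2 ~= 3.4 kcal/mol
print(interaction_entropy(e))
```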

  13. Spin-neurons: A possible path to energy-efficient neuromorphic computers

    NASA Astrophysics Data System (ADS)

    Sharad, Mrigank; Fan, Deliang; Roy, Kaushik

    2013-12-01

    Recent years have witnessed growing interest in the field of brain-inspired computing based on neural-network architectures. In order to translate the related algorithmic models into powerful, yet energy-efficient cognitive-computing hardware, computing-devices beyond CMOS may need to be explored. The suitability of such devices to this field of computing would strongly depend upon how closely their physical characteristics match with the essential computing primitives employed in such models. In this work, we discuss the rationale of applying emerging spin-torque devices for bio-inspired computing. Recent spin-torque experiments have shown the path to low-current, low-voltage, and high-speed magnetization switching in nano-scale magnetic devices. Such magneto-metallic, current-mode spin-torque switches can mimic the analog summing and "thresholding" operation of an artificial neuron with high energy-efficiency. Comparison with CMOS-based analog circuit-model of a neuron shows that "spin-neurons" (spin based circuit model of neurons) can achieve more than two orders of magnitude lower energy and beyond three orders of magnitude reduction in energy-delay product. The application of spin-neurons can therefore be an attractive option for neuromorphic computers of future.

  14. Spin-neurons: A possible path to energy-efficient neuromorphic computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharad, Mrigank; Fan, Deliang; Roy, Kaushik

    Recent years have witnessed growing interest in the field of brain-inspired computing based on neural-network architectures. In order to translate the related algorithmic models into powerful, yet energy-efficient cognitive-computing hardware, computing-devices beyond CMOS may need to be explored. The suitability of such devices to this field of computing would strongly depend upon how closely their physical characteristics match with the essential computing primitives employed in such models. In this work, we discuss the rationale of applying emerging spin-torque devices for bio-inspired computing. Recent spin-torque experiments have shown the path to low-current, low-voltage, and high-speed magnetization switching in nano-scale magnetic devices. Such magneto-metallic, current-mode spin-torque switches can mimic the analog summing and "thresholding" operation of an artificial neuron with high energy-efficiency. Comparison with CMOS-based analog circuit-model of a neuron shows that "spin-neurons" (spin based circuit model of neurons) can achieve more than two orders of magnitude lower energy and beyond three orders of magnitude reduction in energy-delay product. The application of spin-neurons can therefore be an attractive option for neuromorphic computers of future.

  15. An Initial Multi-Domain Modeling of an Actively Cooled Structure

    NASA Technical Reports Server (NTRS)

    Steinthorsson, Erlendur

    1997-01-01

    A methodology for the simulation of turbine cooling flows is being developed. The methodology seeks to combine numerical techniques that optimize both accuracy and computational efficiency. Key components of the methodology include the use of multiblock grid systems for modeling complex geometries, and multigrid convergence acceleration for enhancing computational efficiency in highly resolved fluid flow simulations. The use of the methodology has been demonstrated in several turbomachinery flow and heat transfer studies. Ongoing and future work involves implementing additional turbulence models, improving computational efficiency, and adding adaptive mesh refinement (AMR).

  16. Performance of computer-designed small-size multistage depressed collectors for a high-perveance traveling wave tube

    NASA Technical Reports Server (NTRS)

    Ramins, P.

    1984-01-01

    Computer-designed axisymmetric 2.4-cm-diameter three-, four-, and five-stage depressed collectors were evaluated in conjunction with an octave-bandwidth, high-perveance, high-electronic-efficiency, gridded-gun traveling wave tube (TWT). Spent-beam refocusing was used to condition the beam for optimum entry into the depressed collectors. Both the TWT and multistage depressed collector (MDC) efficiencies were measured, as well as the MDC current, dissipated thermal power, and DC input power distributions, for the TWT operating both at saturation over its bandwidth and over its full dynamic range. Relatively high collector efficiencies were obtained, leading to a very substantial improvement in the overall TWT efficiency. In spite of large fixed TWT body losses (due largely to the 6 to 8 percent beam interception), average overall efficiencies of 45 to 47 percent (for three to five collector stages) were obtained at saturation across the 2.5- to 5.5-GHz operating band. For operation below saturation the collector efficiencies improved steadily, leading to reasonable (20 percent) overall efficiencies as far as 6 dB below saturation.

  17. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    NASA Astrophysics Data System (ADS)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

  18. Cognitive Load Theory vs. Constructivist Approaches: Which Best Leads to Efficient, Deep Learning?

    ERIC Educational Resources Information Center

    Vogel-Walcutt, J. J.; Gebrim, J. B.; Bowers, C.; Carper, T. M.; Nicholson, D.

    2011-01-01

    Computer-assisted learning, in the form of simulation-based training, is heavily focused upon by the military. Because computer-based learning offers highly portable, reusable, and cost-efficient training options, the military has dedicated significant resources to the investigation of instructional strategies that improve learning efficiency…

  19. A Simple and Resource-efficient Setup for the Computer-aided Drug Design Laboratory.

    PubMed

    Moretti, Loris; Sartori, Luca

    2016-10-01

    Undertaking modelling investigations for Computer-Aided Drug Design (CADD) requires a proper environment. In principle, this could be done on a single computer, but the reality of a drug discovery program requires robustness and high-throughput computing (HTC) to efficiently support the research. Therefore, a more capable alternative is needed, but its implementation has no widespread solution. Here, the realization of such a computing facility is discussed; from general layout to technical details, all aspects are covered.

  20. Numerical algorithm comparison for the accurate and efficient computation of high-incidence vortical flow

    NASA Technical Reports Server (NTRS)

    Chaderjian, Neal M.

    1991-01-01

    Computations from two Navier-Stokes codes, NSS and F3D, are presented for a tangent-ogive-cylinder body at high angle of attack. Features of this steady flow include a pair of primary vortices on the leeward side of the body as well as secondary vortices. The topological and physical plausibility of this vortical structure is discussed. The accuracy of these codes is assessed by comparison of the numerical solutions with experimental data. The effects of turbulence model, numerical dissipation, and grid refinement are presented. The overall efficiency of these codes is also assessed by examining their convergence rates, computational time per time step, and maximum allowable time step for time-accurate computations. Overall, the numerical results from both codes compared equally well with experimental data; however, the NSS code was found to be significantly more efficient than the F3D code.

  1. A self-synchronized high speed computational ghost imaging system: A leap towards dynamic capturing

    NASA Astrophysics Data System (ADS)

    Suo, Jinli; Bian, Liheng; Xiao, Yudong; Wang, Yongjin; Zhang, Lei; Dai, Qionghai

    2015-11-01

    High quality computational ghost imaging needs to acquire a large number of correlated measurements between the to-be-imaged scene and different reference patterns, so ultra-high-speed data acquisition is of crucial importance in real applications. To raise the acquisition efficiency, this paper reports a high-speed computational ghost imaging system using a 20 kHz spatial light modulator together with a 2 MHz photodiode. Technically, the synchronization between such high-frequency illumination and the bucket detector needs nanosecond trigger precision, so the development of a synchronization module is quite challenging. To handle this problem, we propose a simple and effective computational self-synchronization scheme by building a general mathematical model and introducing a high-precision synchronization technique. The resulting acquisition is around 14 times faster than the state of the art and takes an important step towards ghost imaging of dynamic scenes. Besides, the proposed scheme is a general approach with high flexibility for readily incorporating other illuminators and detectors.
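
    Independently of the synchronization hardware, the reconstruction step in computational ghost imaging is a simple correlation between the bucket-detector values and the known illumination patterns. A minimal simulated sketch; the scene size, pattern count, and toy object are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 4000                               # assumed scene size, pattern count
scene = np.zeros((n, n))
scene[8:24, 12:20] = 1.0                      # toy object

patterns = rng.integers(0, 2, size=(m, n, n)).astype(float)
bucket = np.tensordot(patterns, scene, axes=([1, 2], [0, 1]))  # photodiode values

# Correlation reconstruction: G = <(B - <B>) * P>
recon = np.tensordot(bucket - bucket.mean(), patterns, axes=(0, 0)) / m
```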

  2. Free-electron laser simulations on the MPP

    NASA Technical Reports Server (NTRS)

    Vonlaven, Scott A.; Liebrock, Lorie M.

    1987-01-01

    Free-electron lasers (FELs) are of interest because they provide high power, high efficiency, and broad tunability. FEL simulations can make efficient use of computers of the Massively Parallel Processor (MPP) class because most of the processing consists of applying a simple equation to a set of identical particles. A test version of the KMS Fusion FEL simulation, which resides mainly in the MPP's host computer and only partially in the MPP, has run successfully.

  3. Progress in a novel architecture for high performance processing

    NASA Astrophysics Data System (ADS)

    Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

    2018-04-01

    High performance processing (HPP) is an innovative architecture targeting high-performance computing with excellent power efficiency and computing performance. It is suitable for data-intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully using the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low-power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed using the TSMC 16 nm FFC technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).

  4. Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage

    PubMed Central

    Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

    2018-01-01

    Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost, but it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist this risk. However, deploying a high-complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty of implementing the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload computation-intensive tasks onto the edge server to achieve high efficiency. Besides, to protect data security, it reduces the information acquisition of the untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves energy consumption by 38% to 69% per query. PMID:29652810

  5. Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.

    PubMed

    Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

    2018-04-13

    Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost, but it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist this risk. However, deploying a high-complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty of implementing the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload computation-intensive tasks onto the edge server to achieve high efficiency. Besides, to protect data security, it reduces the information acquisition of the untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves energy consumption by 38% to 69% per query.

  6. FastMag: Fast micromagnetic simulator for complex magnetic structures (invited)

    NASA Astrophysics Data System (ADS)

    Chang, R.; Li, S.; Lubarda, M. V.; Livshitz, B.; Lomakin, V.

    2011-04-01

    A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with a high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and therefore is highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules and analytical integration for overlapping elements in which the integral kernel is singular. Thus, discretized superposition integrals are computed using a nonuniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach allows handling objects of arbitrary shape, allows easily calculating the field outside the magnetized domains, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU-central processing unit (CPU) speed-ups of 2 orders of magnitude. Simulations are shown of a large array of magnetic dots and a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements, on an inexpensive desktop computer.

  7. Vibration extraction based on fast NCC algorithm and high-speed camera.

    PubMed

    Lei, Xiujun; Jin, Yi; Guo, Jie; Zhu, Chang'an

    2015-09-20

    In this study, a high-speed camera system is developed to complete vibration measurements in real time and to avoid the added mass introduced by conventional contact measurements. The proposed system consists of a notebook computer and a high-speed camera that can capture as many as 1000 frames per second. In order to process the captured images in the computer, the normalized cross-correlation (NCC) template tracking algorithm with subpixel accuracy is introduced. Additionally, a modified local search algorithm based on the NCC is proposed to reduce the computation time and to increase efficiency significantly. The modified algorithm can accomplish one displacement extraction 10 times faster than traditional template matching, without installing any target panel on the structures. Two experiments were carried out under laboratory and outdoor conditions to validate the accuracy and efficiency of the system performance in practice. The results demonstrated the high accuracy and efficiency of the camera system in extracting vibration signals.
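
    The core of such tracking is a zero-normalized cross-correlation score evaluated only in a small window around the previous match, which is what makes the local search fast. A minimal integer-pixel sketch (the paper additionally refines to subpixel accuracy); the function and parameter names are illustrative.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-size patches."""
    a = a - a.mean(); b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def track(frame, template, prev_xy, radius=8):
    """Evaluate ZNCC only within +/-radius pixels of the previous match
    instead of over the whole frame (the 'local search')."""
    h, w = template.shape
    x0, y0 = prev_xy
    best, best_xy = -2.0, prev_xy
    for y in range(max(0, y0 - radius), min(frame.shape[0] - h, y0 + radius) + 1):
        for x in range(max(0, x0 - radius), min(frame.shape[1] - w, x0 + radius) + 1):
            s = zncc(frame[y:y + h, x:x + w], template)
            if s > best:
                best, best_xy = s, (x, y)
    return best_xy, best
```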

  8. Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.

    PubMed

    Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi

    2018-01-01

    In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds and GDB-13 with more than 930 million small molecules, poses a notable time-consumption problem. Therefore, we propose a novel heterogeneous high performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to cope with big chemical structure data efficiently. Hadoop-MCC gains high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on GPU, and reducers collect all comparison results produced by the mappers. Due to the high availability of Hadoop, all of the LINGO computational jobs on mappers can be completed even if some of the mappers encounter problems. A LINGO comparison is performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device. Hadoop-MCC achieves the scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPU. It has been shown that a heterogeneous architecture such as Hadoop-MCC can deliver better computational performance than a single GPU device.
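
    The LINGO method the mappers execute reduces each SMILES string to a multiset of fixed-length substrings and scores pairs with an integral Tanimoto-style similarity. A minimal CPU-only sketch of one common LINGO variant (length-4 substrings, ring-closure digits normalized), far removed from the GPU/Hadoop pipeline of the paper.

```python
from collections import Counter

def lingos(smiles, q=4):
    """Multiset of length-q substrings (LINGOs); ring-closure digits are
    commonly normalized to '0' first, as done here."""
    s = ''.join('0' if c.isdigit() else c for c in smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def lingo_similarity(sa, sb, q=4):
    """Integral Tanimoto over all LINGOs: average of
    1 - |Na - Nb| / (Na + Nb) across the union of LINGOs."""
    a, b = lingos(sa, q), lingos(sb, q)
    keys = set(a) | set(b)
    return sum(1 - abs(a[k] - b[k]) / (a[k] + b[k]) for k in keys) / len(keys)

print(lingo_similarity("CCO", "CCN", q=2))  # 1/3 for this toy pair
```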

  9. Computational efficiency for the surface renewal method

    NASA Astrophysics Data System (ADS)

    Kelley, Jason; Higgins, Chad

    2018-04-01

    Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that were tested for sensitivity to the length of the flux-averaging period, the ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. The increased computation speed grants flexibility in implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
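
    The computationally heavy step in classical SR analysis is evaluating high-order structure functions of the scalar series over many lags, after which the ramp amplitude follows from Van Atta's cubic. A minimal vectorized sketch of that step, assuming a uniformly sampled series; the root selection is simplified and would need the usual sign checks in practice.

```python
import numpy as np

def structure_functions(t, lag):
    """2nd-, 3rd- and 5th-order structure functions of a high-frequency
    scalar series (e.g. 10 Hz temperature) at one sample lag."""
    d = t[lag:] - t[:-lag]
    return (d ** 2).mean(), (d ** 3).mean(), (d ** 5).mean()

def ramp_amplitude(s2, s3, s5):
    """Van Atta-style estimate of the ramp amplitude a as a real root of
    a^3 + p*a + q = 0, with p = 10*s2 - s5/s3 and q = 10*s3."""
    roots = np.roots([1.0, 0.0, 10.0 * s2 - s5 / s3, 10.0 * s3])
    real = roots[np.isreal(roots)].real
    return real[np.argmax(np.abs(real))]   # simplified root selection
```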

  10. Energy Efficiency Challenges of 5G Small Cell Networks.

    PubMed

    Ge, Xiaohu; Yang, Jing; Gharavi, Hamid; Sun, Yang

    2017-05-01

    The deployment of a large number of small cells poses new challenges to energy efficiency, which has often been ignored in fifth generation (5G) cellular networks. While massive multiple-input multiple-output (MIMO) will reduce the transmission power at the expense of higher computational cost, the question remains as to whether computation or transmission power is more important in the energy efficiency of 5G small cell networks. Thus, the main objective in this paper is to investigate the computation power based on the Landauer principle. Simulation results reveal that more than 50% of the energy is consumed by the computation power at 5G small cell base stations (BSs). Moreover, the computation power of a 5G small cell BS can approach 800 watts when massive MIMO (e.g., 128 antennas) is deployed to transmit high-volume traffic. This clearly indicates that computation power optimization can play a major role in the energy efficiency of small cell networks.
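
    The Landauer principle used as the baseline here sets a floor of k_B*T*ln(2) joules per irreversible bit operation; practical hardware sits many orders of magnitude above it. A minimal sketch of that arithmetic; the operation rate and hardware-efficiency factor in the example are invented for illustration and are not the paper's model.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def computation_power_watts(bit_ops_per_s, temperature=300.0, efficiency=1e-9):
    """Landauer floor scaled by a hardware-efficiency factor: each
    irreversible bit operation dissipates at least k_B*T*ln(2) joules;
    `efficiency` is an invented ratio between that floor and what real
    hardware achieves."""
    e_bit = K_B * temperature * math.log(2)   # ~2.87e-21 J at 300 K
    return bit_ops_per_s * e_bit / efficiency

# Invented example: a baseband running 1e15 bit-operations per second.
print(computation_power_watts(1e15))
```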

  11. Energy Efficiency Challenges of 5G Small Cell Networks

    PubMed Central

    Ge, Xiaohu; Yang, Jing; Gharavi, Hamid; Sun, Yang

    2017-01-01

    The deployment of a large number of small cells poses new challenges to energy efficiency, which has often been ignored in fifth generation (5G) cellular networks. While massive multiple-input multiple-output (MIMO) will reduce the transmission power at the expense of higher computational cost, the question remains as to whether computation or transmission power is more important in the energy efficiency of 5G small cell networks. Thus, the main objective in this paper is to investigate the computation power based on the Landauer principle. Simulation results reveal that more than 50% of the energy is consumed by the computation power at 5G small cell base stations (BSs). Moreover, the computation power of a 5G small cell BS can approach 800 watts when massive MIMO (e.g., 128 antennas) is deployed to transmit high-volume traffic. This clearly indicates that computation power optimization can play a major role in the energy efficiency of small cell networks. PMID:28757670

  12. Efficient Stochastic Inversion Using Adjoint Models and Kernel-PCA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thimmisetty, Charanraj A.; Zhao, Wenju; Chen, Xiao

    2017-10-18

    Performing stochastic inversion on a computationally expensive forward simulation model with a high-dimensional uncertain parameter space (e.g. a spatial random field) is computationally prohibitive even when gradient information can be computed efficiently. Moreover, the 'nonlinear' mapping from parameters to observables generally gives rise to non-Gaussian posteriors even with Gaussian priors, thus hampering the use of efficient inversion algorithms designed for models with Gaussian assumptions. In this paper, we propose a novel Bayesian stochastic inversion methodology, which is characterized by a tight coupling between the gradient-based Langevin Markov Chain Monte Carlo (LMCMC) method and a kernel principal component analysis (KPCA). This approach addresses the 'curse of dimensionality' via KPCA to identify a low-dimensional feature space within the high-dimensional and nonlinearly correlated parameter space. In addition, non-Gaussian posterior distributions are estimated via an efficient LMCMC method on the projected low-dimensional feature space. We will demonstrate this computational framework by integrating and adapting our recent data-driven statistics-on-manifolds constructions and reduction-through-projection techniques to a linear elasticity model.
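
    The two ingredients couple naturally: KPCA compresses the parameter ensemble into a few features, and the Langevin sampler then explores the posterior in that reduced space. A toy sketch with scikit-learn, assuming a synthetic prior ensemble and a placeholder log-posterior; a real application would evaluate the forward model (and its adjoint gradient) inside `log_post`, and a MALA accept/reject step is omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
prior = rng.standard_normal((500, 64))   # synthetic 64-dim prior ensemble

# KPCA: low-dimensional feature space of the parameter ensemble.
kpca = KernelPCA(n_components=5, kernel="rbf", fit_inverse_transform=True)
z0 = kpca.fit_transform(prior)[0]

def log_post(z):
    # Placeholder log-posterior; a real application evaluates the
    # forward model on kpca.inverse_transform(z) here.
    return -0.5 * float(np.sum(z ** 2))

def grad_log_post(z, eps=1e-5):
    # Finite-difference gradient (an adjoint model in the paper).
    g = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z); dz[i] = eps
        g[i] = (log_post(z + dz) - log_post(z - dz)) / (2 * eps)
    return g

# Unadjusted Langevin iterations in the reduced feature space.
z, step = z0.copy(), 1e-2
for _ in range(1000):
    z += 0.5 * step * grad_log_post(z) + np.sqrt(step) * rng.standard_normal(len(z))

theta = kpca.inverse_transform(z.reshape(1, -1))  # back to parameter space
```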

  13. Multiscale Space-Time Computational Methods for Fluid-Structure Interactions

    DTIC Science & Technology

    2015-09-13

    … prescribed fully or partially, is from an actual locust, extracted from high-speed, multi-camera video recordings of the locust in a wind tunnel. We use … With creative methods for coupling the fluid and structure, we can increase the scope and efficiency of the FSI modeling. Multiscale methods, which now play an important role in computational mathematics, can also increase the accuracy and efficiency of the computer modeling techniques. The main …

  14. Computing Interactions Of Free-Space Radiation With Matter

    NASA Technical Reports Server (NTRS)

    Wilson, J. W.; Cucinotta, F. A.; Shinn, J. L.; Townsend, L. W.; Badavi, F. F.; Tripathi, R. K.; Silberberg, R.; Tsao, C. H.; Badhwar, G. D.

    1995-01-01

    High Charge and Energy Transport (HZETRN) computer program computationally efficient, user-friendly package of software addressing problem of transport of, and shielding against, radiation in free space. Designed as "black box" for design engineers not concerned with physics of underlying atomic and nuclear radiation processes in free-space environment, but rather primarily interested in obtaining fast and accurate dosimetric information for design and construction of modules and devices for use in free space. Computational efficiency achieved by unique algorithm based on deterministic approach to solution of Boltzmann equation rather than computationally intensive statistical Monte Carlo method. Written in FORTRAN.

  15. A novel composite adaptive flap controller design by a high-efficient modified differential evolution identification approach.

    PubMed

    Li, Nailu; Mu, Anle; Yang, Xiyun; Magar, Kaman T; Liu, Chao

    2018-05-01

    Optimal tuning of an adaptive flap controller can improve adaptive flap control performance in uncertain operating environments, but the optimization process is usually time-consuming and it is difficult to design a proper optimal tuning strategy for the flap control system (FCS). To solve this problem, a novel adaptive flap controller is designed based on a highly efficient differential evolution (DE) identification technique and a composite adaptive internal model control (CAIMC) strategy. The optimal tuning can be easily obtained from the DE-identified inverse of the FCS via the CAIMC structure. To achieve fast tuning, a highly efficient modified adaptive DE algorithm is proposed with a new mutant operator and a varying-range adaptive mechanism for the FCS identification. A tradeoff between optimized adaptive flap control and low computation cost is successfully achieved by the proposed controller. Simulation results show the robustness of the proposed method and its superiority to the conventional adaptive IMC (AIMC) flap controller and to CAIMC flap controllers using other DE algorithms under various uncertain operating conditions. The high computational efficiency of the proposed controller is also verified based on the computation time for those operating cases.

  16. Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.

    PubMed

    Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia

    2016-03-08

    A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.

  17. Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy

    NASA Astrophysics Data System (ADS)

    Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli

    2014-03-01

    One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
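
    For reference, the Amdahl's-law model used in that scalability analysis predicts the speedup S on n cores for a code whose parallelizable fraction is p (standard form, not specific to the paper):

    ```latex
    S(n) = \frac{1}{(1 - p) + p/n}
    ```

    Under this model, the reported 12-fold improvement on 12 cores corresponds to a parallel fraction p very close to 1 for the processes studied.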

  18. PRESAGE: Protecting Structured Address Generation against Soft Errors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharma, Vishal C.; Gopalakrishnan, Ganesh; Krishnamoorthy, Sriram

    Modern computer scaling trends in pursuit of larger component counts and power efficiency have, unfortunately, led to less reliable hardware and consequently soft errors escaping into application data ("silent data corruptions"). Techniques to enhance system resilience hinge on the availability of efficient error detectors that have high detection rates, low false positive rates, and low computational overhead. Unfortunately, efficient detectors to detect faults during address generation (to index large arrays) have not been widely researched. We present a novel lightweight compiler-driven technique called PRESAGE for detecting bit-flips affecting structured address computations. A key insight underlying PRESAGE is that any address computation scheme that flows an already incurred error is better than a scheme that corrupts one particular array access but otherwise (falsely) appears to compute perfectly. Enabling the flow of errors allows one to situate detectors at loop exit points, and helps turn silent corruptions into easily detectable error situations. Our experiments using the PolyBench benchmark suite indicate that PRESAGE-based error detectors have a high error-detection rate while incurring low overheads.

  19. PRESAGE: Protecting Structured Address Generation against Soft Errors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharma, Vishal C.; Gopalakrishnan, Ganesh; Krishnamoorthy, Sriram

    Modern computer scaling trends in pursuit of larger component counts and power efficiency have, unfortunately, led to less reliable hardware and consequently soft errors escaping into application data ("silent data corruptions"). Techniques to enhance system resilience hinge on the availability of efficient error detectors that have high detection rates, low false positive rates, and low computational overhead. Unfortunately, efficient detectors to detect faults during address generation have not been widely researched (especially in the context of indexing large arrays). We present a novel lightweight compiler-driven technique called PRESAGE for detecting bit-flips affecting structured address computations. A key insight underlying PRESAGE is that any address computation scheme that propagates an already incurred error is better than a scheme that corrupts one particular array access but otherwise (falsely) appears to compute perfectly. Ensuring the propagation of errors allows one to place detectors at loop exit points and helps turn silent corruptions into easily detectable error situations. Our experiments using the PolyBench benchmark suite indicate that PRESAGE-based error detectors have a high error-detection rate while incurring low overheads.
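
    The key insight of the two records above can be illustrated at the source level: when array indices are computed by a recurrence, a single corrupted step propagates to the final index, so one cheap check at loop exit suffices. The Python sketch below is a conceptual illustration only, not the PRESAGE compiler transformation.

    ```python
    # Conceptual illustration of error propagation in structured address
    # computation with a single detector at the loop exit (not PRESAGE itself).
    def strided_sum(data, base, stride, n):
        total = 0
        idx = base
        for _ in range(n):
            total += data[idx]
            idx += stride        # a bit-flip here propagates to every later idx
        # Detector at loop exit: the final index has a closed form, so any
        # earlier corruption of the recurrence shows up as a mismatch.
        if idx != base + n * stride:
            raise RuntimeError("soft error detected in address computation")
        return total

    print(strided_sum(list(range(100)), base=0, stride=2, n=50))
    ```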

  20. Toward high-efficiency and detailed Monte Carlo simulation study of the granular flow spallation target

    NASA Astrophysics Data System (ADS)

    Cai, Han-Jie; Zhang, Zhi-Lei; Fu, Fen; Li, Jian-Yang; Zhang, Xun-Chao; Zhang, Ya-Ling; Yan, Xue-Song; Lin, Ping; Xv, Jian-Ya; Yang, Lei

    2018-02-01

    The dense granular flow spallation target is a new target concept chosen for the Accelerator-Driven Subcritical (ADS) project in China. For the R&D of this target concept, a dedicated Monte Carlo (MC) program named GMT was developed to perform simulation studies of the beam-target interaction. Owing to the complexity of the target geometry, the MC simulation of particle tracks is computationally very expensive. Thus, improving computational efficiency is essential for detailed MC simulation studies of the dense granular target. Here we present the special design of the GMT program and its high-efficiency performance. In addition, the speedup potential of the GPU-accelerated spallation models is discussed.

  1. Towards a Highly Efficient Meshfree Simulation of Non-Newtonian Free Surface Ice Flow: Application to the Haut Glacier d'Arolla

    NASA Astrophysics Data System (ADS)

    Shcherbakov, V.; Ahlkrona, J.

    2016-12-01

    In this work we develop a highly efficient meshfree approach to ice sheet modeling. Traditionally, mesh-based methods such as finite element methods are employed to simulate glacier and ice sheet dynamics. These methods are mature and well developed. However, despite their numerous advantages, they suffer from drawbacks such as the necessity to remesh the computational domain every time it changes its shape, which significantly complicates the implementation on moving domains, and a costly assembly procedure for nonlinear problems. We introduce a novel meshfree approach that frees us from all these issues. The approach is built upon a radial basis function (RBF) method that, thanks to its meshfree nature, allows for efficient handling of moving margins and the free ice surface. RBF methods are also accurate and easy to implement. Since the formulation is stated in strong form, it allows for a substantial reduction of the computational cost associated with the linear system assembly inside the nonlinear solver. We implement a global RBF method that defines an approximation on the entire computational domain. This method exhibits high accuracy. However, it suffers from the disadvantage that the coefficient matrix is dense, and therefore the computational efficiency decreases. In order to overcome this issue we also implement a localized RBF method that rests upon a partition of unity approach to subdivide the domain into several smaller subdomains. The radial basis function partition of unity method (RBF-PUM) inherits the high approximation characteristics of the global RBF method while resulting in a sparse system of equations, which substantially increases the computational efficiency. To demonstrate the usefulness of the RBF methods we model the velocity field of ice flow in the Haut Glacier d'Arolla. We assume that the flow is governed by the nonlinear Blatter-Pattyn equations. We test the methods for different basal conditions and for a free moving surface. Both RBF methods are compared with a classical finite element method in terms of accuracy and efficiency. We find that the RBF methods are more efficient than the finite element method and well suited for ice dynamics modeling, especially the partition of unity approach.
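
    The global RBF construction described above can be condensed into a few lines; the 1-D toy below (an illustrative sketch with assumed node counts and shape parameter, not the authors' Blatter-Pattyn solver) also makes visible the dense collocation matrix whose cost motivates RBF-PUM.

    ```python
    # Global Gaussian RBF interpolation sketch (illustrative 1-D toy).
    import numpy as np

    def gaussian_rbf(r, eps=5.0):
        return np.exp(-(eps * r) ** 2)

    x = np.linspace(0.0, 1.0, 25)   # nodes
    f = np.sin(2 * np.pi * x)       # samples of a target field

    # Dense collocation matrix A_ij = phi(|x_i - x_j|); this density is the
    # cost that the partition-of-unity variant (RBF-PUM) removes by
    # localizing the basis into subdomains.
    A = gaussian_rbf(np.abs(x[:, None] - x[None, :]))
    A += 1e-10 * np.eye(len(x))     # tiny ridge to tame ill-conditioning
    coeffs = np.linalg.solve(A, f)

    # Evaluate the interpolant on a finer grid and report the max error.
    xe = np.linspace(0.0, 1.0, 101)
    fe = gaussian_rbf(np.abs(xe[:, None] - x[None, :])) @ coeffs
    print(np.max(np.abs(fe - np.sin(2 * np.pi * xe))))
    ```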

  2. Future computing platforms for science in a power constrained era

    DOE PAGES

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...

    2015-12-23

    Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have conducted a wide survey of current and emerging architectures becoming available on the market, including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potential for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).

  3. A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations

    NASA Technical Reports Server (NTRS)

    Dyson, Roger W.; Goodrich, John W.

    2000-01-01

    Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many computational aeroacoustics methods are limited to 4th-order accurate Runge-Kutta methods in time, which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128-bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program, and being more computationally efficient than previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th-order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high-performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. Finally, the sources of round-off error which affect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.

  4. DoD High Performance Computing Modernization Program Users Group Conference (HPCMP UGC 2011) Held in Portland, Oregon on June 20-23, 2011

    DTIC Science & Technology

    2011-06-01

    4. Conclusion The Web-based AGeS system described in this paper is a computationally-efficient and scalable system for high-throughput genome...method for protecting web services involves making them more resilient to attack using autonomic computing techniques. This paper presents our initial...20–23, 2011 2011 DoD High Performance Computing Modernization Program Users Group Conference HPCMP UGC 2011 The papers in this book comprise the

  5. Acceleration of FDTD mode solver by high-performance computing techniques.

    PubMed

    Han, Lin; Xi, Yanping; Huang, Wei-Ping

    2010-06-21

    A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on a wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the benchmark finite-difference (FD) eigenmode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver, with more than a 30-fold improvement in computational efficiency in comparison with the conventional FDTD mode solver running on the CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is of the same order of magnitude as that of the standard finite-difference eigenmode solver, yet it requires much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when conventional eigenvalue mode solvers are no longer applicable due to memory limitations.

  6. Development of a Compact Eleven Feed Cryostat for the Patriot 12-m Antenna System

    NASA Technical Reports Server (NTRS)

    Beaudoin, Christopher; Kildal, Per-Simon; Yang, Jian; Pantaleev, Miroslav

    2010-01-01

    The Eleven antenna has constant beam width, constant phase center location, and low spillover over a decade bandwidth. Therefore, it can feed a reflector for high aperture efficiency (also called feed efficiency). It is equally important that the feed efficiency and its subefficiencies not be degraded significantly by installing the feed in a cryostat. The MIT Haystack Observatory, with guidance from Onsala Space Observatory and Chalmers University, has been working to integrate the Eleven antenna into a compact cryostat suitable for the Patriot 12-m antenna. Since the analysis of the feed efficiencies in this presentation is purely computational, we first demonstrate the validity of the computed results by comparing them to measurements. Subsequently, we analyze the dependence of the cryostat size on the feed efficiencies, and, lastly, the Patriot 12-m subreflector is incorporated into the computational model to assess the overall broadband efficiency of the antenna system.

  7. Determination of efficiency of an aged HPGe detector for gaseous sources by self absorption correction and point source methods

    NASA Astrophysics Data System (ADS)

    Sarangapani, R.; Jose, M. T.; Srinivasan, T. K.; Venkatraman, B.

    2017-07-01

    Methods for the determination of the efficiency of an aged high purity germanium (HPGe) detector for gaseous sources are presented in this paper. X-ray radiography of the detector has been performed to obtain the detector dimensions for computational purposes. The dead layer thickness of the HPGe detector has been ascertained from experiments and Monte Carlo computations. Experimental work with standard point and liquid sources in several cylindrical geometries has been undertaken to obtain the energy-dependent efficiency. Monte Carlo simulations have been performed to compute efficiencies for point, liquid and gaseous sources. Self-absorption correction factors have been obtained using mathematical equations for volume sources and MCNP simulations. Self-absorption correction and point source methods have been used to estimate the efficiency for gaseous sources. The efficiencies determined in the present work have been used to estimate the activity of a cover gas sample from a fast reactor.

  8. Highly Efficient Computation of the Basal kon using Direct Simulation of Protein-Protein Association with Flexible Molecular Models.

    PubMed

    Saglam, Ali S; Chong, Lillian T

    2016-01-14

    An essential baseline for determining the extent to which electrostatic interactions enhance the kinetics of protein-protein association is the "basal" kon, which is the rate constant for association in the absence of electrostatic interactions. However, since such association events are beyond the millisecond time scale, it has not been practical to compute the basal kon by directly simulating the association with flexible models. Here, we computed the basal kon for barnase and barstar, two of the most rapidly associating proteins, using highly efficient, flexible molecular simulations. These simulations involved (a) pseudoatomic protein models that reproduce the molecular shapes, electrostatic, and diffusion properties of all-atom models, and (b) application of the weighted ensemble path sampling strategy, which enhanced the efficiency of generating association events by >130-fold. We also examined the extent to which the computed basal kon is affected by the inclusion of intermolecular hydrodynamic interactions in the simulations.

  9. Efficient scatter model for simulation of ultrasound images from computed tomography data

    NASA Astrophysics Data System (ADS)

    D'Amato, J. P.; Lo Vercio, L.; Rubi, P.; Fernandez Vera, E.; Barbuzza, R.; Del Fresno, M.; Larrabide, I.

    2015-12-01

    Background and motivation: Real-time ultrasound simulation refers to the process of computationally creating fully synthetic ultrasound images instantly. Due to the high value of specialized low-cost training for healthcare professionals, there is a growing interest in the use of this technology and the development of high-fidelity systems that simulate the acquisition of echographic images. The objective is to create an efficient and reproducible simulator that can run either on notebooks or desktops using low-cost devices. Materials and methods: We present an interactive ultrasound simulator based on CT data. This simulator is based on ray-casting and provides real-time interaction capabilities. The simulation of scattering that is coherent with the transducer position in real time is also introduced. Such noise is produced using a simplified model of multiplicative noise and convolution with point spread functions (PSF) tailored for this purpose. Results: The computational efficiency of scattering-map generation was revised, with improved performance. This allowed a more efficient simulation of coherent scattering in the synthetic echographic images while providing highly realistic results. We describe some quality and performance metrics to validate these results, where a performance of up to 55 fps was achieved. Conclusion: The proposed technique for real-time scattering modeling provides realistic yet computationally efficient scatter distributions. The error between the original image and the simulated scattering image was compared for the proposed method and the state of the art, showing negligible differences in its distribution.
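
    A minimal version of such a scatter model, multiplicative noise convolved with a PSF, might look like the sketch below; the array sizes, the Gaussian stand-in for the transducer PSF, and its widths are illustrative assumptions, not the authors' tuned model.

    ```python
    # Speckle sketch: multiplicative scatterer noise convolved with a PSF.
    import numpy as np
    from scipy.signal import fftconvolve

    rng = np.random.default_rng(1)
    echogenicity = np.ones((128, 128))   # stand-in for a CT-derived tissue map

    # Multiplicative noise models sub-resolution scatterers.
    scatter = echogenicity * rng.standard_normal((128, 128))

    # Separable Gaussian stand-in for the transducer PSF (axial x lateral).
    ax = np.arange(-8, 9)
    psf = np.exp(-(ax[:, None] ** 2) / 8.0 - (ax[None, :] ** 2) / 32.0)

    # Envelope of the convolved field gives the characteristic speckle texture.
    speckle = np.abs(fftconvolve(scatter, psf, mode="same"))
    print(speckle.shape, round(float(speckle.mean()), 3))
    ```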

  10. Reconfigurable logic in nanosecond Cu/GeTe/TiN filamentary memristors for energy-efficient in-memory computing.

    PubMed

    Jin, Miaomiao; Cheng, Long; Li, Yi; Hu, Siyu; Lu, Ke; Chen, Jia; Duan, Nian; Wang, Zhuorui; Zhou, Yaxiong; Chang, Ting-Chang; Miao, Xiangshui

    2018-06-27

    Owing to the capability of integrating information storage and computing in the same physical location, in-memory computing with memristors has become a research hotspot as a promising route to non-von Neumann architectures. However, it is still a challenge to develop high performance devices as well as optimized logic methodologies to realize energy-efficient computing. Herein, a filamentary Cu/GeTe/TiN memristor is reported that shows satisfactory properties, with nanosecond switching speed (<60 ns), low voltage operation (<2 V), high endurance (>10^4 cycles) and good retention (>10^4 s at 85 °C). It is revealed that the charge carrier conduction mechanisms in the high resistance and low resistance states are Schottky emission and hopping transport between adjacent Cu clusters, respectively, based on the analysis of current-voltage behaviors and resistance-temperature characteristics. An intuitive picture is given to describe the dynamic processes of resistive switching. Moreover, based on the basic material implication (IMP) logic circuit, we propose a reconfigurable logic method and experimentally implement IMP, NOT, OR, and COPY logic functions. The design of a one-bit full adder with a reduced number of computational sequences, and its validation in simulation, further demonstrate the potential for practical application. The results provide important progress toward understanding the resistive switching mechanism and realizing an energy-efficient in-memory computing architecture.
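
    At the truth-table level, the reconfigurable logic reported above builds everything from material implication; the sketch below shows how NOT and OR reduce to IMP. This is a conceptual illustration only, with the memristive device physics omitted.

    ```python
    # Boolean view of IMP-based logic (device physics omitted).
    def IMP(p: bool, q: bool) -> bool:
        return (not p) or q       # material implication: p -> q

    def NOT(p: bool) -> bool:
        return IMP(p, False)      # p -> 0  ==  NOT p

    def OR(p: bool, q: bool) -> bool:
        return IMP(NOT(p), q)     # (NOT p) -> q  ==  p OR q

    for p in (False, True):
        for q in (False, True):
            print(int(p), int(q), "IMP:", int(IMP(p, q)),
                  "NOT p:", int(NOT(p)), "OR:", int(OR(p, q)))
    ```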

  11. Smart integrated microsystems: the energy efficiency challenge (Conference Presentation) (Plenary Presentation)

    NASA Astrophysics Data System (ADS)

    Benini, Luca

    2017-06-01

    The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.

  12. Low-power, transparent optical network interface for high bandwidth off-chip interconnects.

    PubMed

    Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren

    2009-04-13

    The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.

  13. A Generalized Method for the Comparable and Rigorous Calculation of the Polytropic Efficiencies of Turbocompressors

    NASA Astrophysics Data System (ADS)

    Dimitrakopoulos, Panagiotis

    2018-03-01

    The calculation of polytropic efficiencies is a very important task, especially during the development of new compression units, like compressor impellers, stages and stage groups. Such calculations are also crucial for the determination of the performance of a whole compressor. As processors and computational capacities have improved substantially in recent years, the need has emerged for a new, rigorous, robust, accurate and at the same time standardized method for the computation of polytropic efficiencies, especially one based on the thermodynamics of real gases. The proposed method is based on the rigorous definition of the polytropic efficiency. The input consists of pressure and temperature values at the end points of the compression path (suction and discharge), for a given working fluid. The average relative error for the studied cases was 0.536%. Thus, this high-accuracy method is proposed for efficiency calculations related to turbocompressors and their compression units, especially when they are operating at high power levels, for example in jet engines and high-power plants.
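
    For orientation, in the ideal-gas special case the polytropic efficiency of a compression from suction state (p_s, T_s) to discharge state (p_d, T_d) has a familiar closed form (background only; the paper's contribution is the rigorous real-gas treatment):

    ```latex
    \eta_p = \frac{\kappa - 1}{\kappa} \cdot \frac{\ln(p_d / p_s)}{\ln(T_d / T_s)}
    ```

    where κ is the isentropic exponent. For real gases this simple expression breaks down, which is precisely what motivates the rigorous method proposed above.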

  14. Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

    2005-01-01

    The overall objectives of this research are to formulate and validate efficient parallel algorithms, and to efficiently design and implement computer software, for solving large-scale acoustic problems arising from the unified framework of finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective processor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures is evaluated by solving a series of structural and acoustic (symmetrical and unsymmetrical) problems on different computing platforms. Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.

  15. High Performance, Dependable Multiprocessor

    NASA Technical Reports Server (NTRS)

    Ramos, Jeremy; Samson, John R.; Troxel, Ian; Subramaniyan, Rajagopal; Jacobs, Adam; Greco, James; Cieslewski, Grzegorz; Curreri, John; Fischer, Michael; Grobelny, Eric; hide

    2006-01-01

    With the ever-increasing demand for higher bandwidth and processing capacity in today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing is now a critical need. In response to this need, NASA's New Millennium Program office has commissioned the development of Dependable Multiprocessor (DM) technology for use in payload and robotic missions. The Dependable Multiprocessor technology is a COTS-based, power efficient, high performance, highly dependable, fault tolerant cluster computer. To date, Honeywell has successfully demonstrated a TRL4 prototype of the Dependable Multiprocessor [1], and is now working on the development of a TRL5 prototype. For the present effort Honeywell has teamed up with the University of Florida's High-performance Computing and Simulation (HCS) Lab, and together the team has demonstrated major elements of the Dependable Multiprocessor TRL5 system.

  16. Life-cycle costs of high-performance cells

    NASA Technical Reports Server (NTRS)

    Daniel, R.; Burger, D.; Reiter, L.

    1985-01-01

    A life cycle cost analysis of high efficiency cells was presented. Although high efficiency cells produce more power, they also cost more to make and are more susceptible to array hot-spot heating. Three different computer analysis programs were used: SAMICS (solar array manufacturing industry costing standards), PVARRAY (an array failure mode/degradation simulator), and LCP (lifetime cost and performance). The high efficiency cell modules were found to be more economical in this study, but parallel redundancy is recommended.

  17. Center for Efficient Exascale Discretizations Software Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kolev, Tzanio; Dobrev, Veselin; Tomov, Vladimir

    The CEED software suite is a collection of generally applicable software tools focusing on the following computational motifs: PDE discretizations on unstructured meshes, high-order finite element and spectral element methods, and unstructured adaptive mesh refinement. All of this software is being developed as part of CEED, a co-design Center for Efficient Exascale Discretizations, within DOE's Exascale Computing Project (ECP) program.

  18. CORDIC-based digital signal processing (DSP) element for adaptive signal processing

    NASA Astrophysics Data System (ADS)

    Bolstad, Gregory D.; Neeld, Kenneth B.

    1995-04-01

    The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC-based, application-specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
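
    The CORDIC primitive underlying this processing element uses only shifts, adds, and a small arctangent table; the floating-point sketch below shows one vector rotation (a conceptual illustration, whereas the hardware uses fixed-point shift-and-add datapaths).

    ```python
    # CORDIC rotation sketch: rotate (x, y) by `angle` radians.
    import math

    def cordic_rotate(x, y, angle, iterations=32):
        gain = 1.0
        for i in range(iterations):
            d = 1.0 if angle >= 0.0 else -1.0
            # Shift-and-add micro-rotation by +/- atan(2^-i).
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            angle -= d * math.atan(2.0 ** -i)
            gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
        return x / gain, y / gain   # undo the accumulated CORDIC gain

    print(cordic_rotate(1.0, 0.0, math.pi / 4))   # ~(0.7071, 0.7071)
    ```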

  19. Experimental quantum computing without entanglement.

    PubMed

    Lanyon, B P; Barbieri, M; Almeida, M P; White, A G

    2008-11-14

    Deterministic quantum computation with one pure qubit (DQC1) is an efficient model of computation that uses highly mixed states. Unlike pure-state models, its power is not derived from the generation of a large amount of entanglement. Instead it has been proposed that other nonclassical correlations are responsible for the computational speedup, and that these can be captured by the quantum discord. In this Letter we implement DQC1 in an all-optical architecture, and experimentally observe the generated correlations. We find no entanglement, but large amounts of quantum discord, except in three cases where an efficient classical simulation is always possible. Our results show that even fully separable, highly mixed states can contain intrinsically quantum mechanical correlations and that these could offer a valuable resource for quantum information technologies.

  20. Changing computing paradigms towards power efficiency

    PubMed Central

    Klavík, Pavel; Malossi, A. Cristiano I.; Bekas, Costas; Curioni, Alessandro

    2014-01-01

    Power awareness is fast becoming immensely important in computing, ranging from traditional high-performance computing applications to the new generation of data-centric workloads. In this work, we describe our efforts towards a power-efficient computing paradigm that combines low- and high-precision arithmetic. We showcase our ideas for the widely used kernel of solving systems of linear equations, which finds numerous applications in scientific and engineering disciplines as well as in large-scale data analytics, statistics and machine learning. Towards this goal, we developed tools for the seamless power profiling of applications at a fine-grain level. In addition, we verify here previous work on post-FLOPS/W metrics and show that these can shed much more light on the power/energy profile of important applications. PMID:24842033
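
    The low-/high-precision combination showcased above is, in spirit, iterative refinement: perform the expensive solve cheaply in low precision, then correct it with residuals computed in high precision. A minimal NumPy sketch of that idea (illustrative, not the authors' implementation) follows.

    ```python
    # Mixed-precision iterative refinement sketch for A x = b.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((200, 200)) + 200 * np.eye(200)  # well-conditioned
    b = rng.standard_normal(200)

    # Cheap low-precision solve (this is where the power/energy is saved).
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

    for _ in range(3):
        r = b - A @ x                     # residual in double precision
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx                           # correction recovers full accuracy

    print(np.linalg.norm(b - A @ x))      # near double-precision residual
    ```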

  1. Biocellion: accelerating computer simulation of multicellular biological system models

    PubMed Central

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-01-01

    Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572

  2. High accurate interpolation of NURBS tool path for CNC machine tools

    NASA Astrophysics Data System (ADS)

    Liu, Qiang; Liu, Huan; Yuan, Songmei

    2016-09-01

    Feedrate fluctuation caused by approximation errors of interpolation methods has great effects on machining quality in NURBS interpolation, but at present few methods can efficiently eliminate or reduce it to a satisfactory level without sacrificing computing efficiency. In order to solve this problem, a highly accurate interpolation method for NURBS tool paths is proposed. The proposed method can efficiently reduce the feedrate fluctuation by forming a quartic equation with respect to the curve parameter increment, which can be efficiently solved by analytic methods in real time. Theoretically, the proposed method can totally eliminate the feedrate fluctuation for any 2nd-degree NURBS curve and can interpolate 3rd-degree NURBS curves with minimal feedrate fluctuation. Moreover, a smooth feedrate planning algorithm is also proposed to generate smooth tool motion while considering multiple constraints and scheduling errors via an efficient planning strategy. Experiments are conducted to verify the feasibility and applicability of the proposed method. This research presents a novel NURBS interpolation method with not only high accuracy but also satisfactory computing efficiency.

  3. Fundamental Aeronautics Program: Overview of Project Work in Supersonic Cruise Efficiency

    NASA Technical Reports Server (NTRS)

    Castner, Raymond

    2011-01-01

    The Supersonics Project, part of NASA's Fundamental Aeronautics Program, contains a number of technical challenge areas which include sonic boom community response, airport noise, high altitude emissions, cruise efficiency, light weight durable engines/airframes, and integrated multi-discipline system design. This presentation provides an overview of the current (2011) activities in the supersonic cruise efficiency technical challenge, and is focused specifically on propulsion technologies. The intent is to develop and validate high-performance supersonic inlet and nozzle technologies. Additional work is planned for design and analysis tools for highly-integrated low-noise, low-boom applications. If successful, the payoffs include improved technologies and tools for optimized propulsion systems, propulsion technologies for a minimized sonic boom signature, and a balanced approach to meeting efficiency and community noise goals. In this propulsion area, the work is divided into advanced supersonic inlet concepts, advanced supersonic nozzle concepts, low fidelity computational tool development, high fidelity computational tools, and improved sensors and measurement capability. The current work in each area is summarized.

  4. Fundamental Aeronautics Program: Overview of Propulsion Work in the Supersonic Cruise Efficiency Technical Challenge

    NASA Technical Reports Server (NTRS)

    Castner, Ray

    2012-01-01

    The Supersonics Project, part of NASA's Fundamental Aeronautics Program, contains a number of technical challenge areas which include sonic boom community response, airport noise, high altitude emissions, cruise efficiency, light weight durable engines/airframes, and integrated multi-discipline system design. This presentation provides an overview of the current (2012) activities in the supersonic cruise efficiency technical challenge, and is focused specifically on propulsion technologies. The intent is to develop and validate high-performance supersonic inlet and nozzle technologies. Additional work is planned for design and analysis tools for highly-integrated low-noise, low-boom applications. If successful, the payoffs include improved technologies and tools for optimized propulsion systems, propulsion technologies for a minimized sonic boom signature, and a balanced approach to meeting efficiency and community noise goals. In this propulsion area, the work is divided into advanced supersonic inlet concepts, advanced supersonic nozzle concepts, low fidelity computational tool development, high fidelity computational tools, and improved sensors and measurement capability. The current work in each area is summarized.

  5. Speeding Up Ecological and Evolutionary Computations in R; Essentials of High Performance Computing for Biologists

    PubMed Central

    Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke

    2015-01-01

    Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842

  6. An efficient hybrid technique in RCS predictions of complex targets at high frequencies

    NASA Astrophysics Data System (ADS)

    Algar, María-Jesús; Lozano, Lorena; Moreno, Javier; González, Iván; Cátedra, Felipe

    2017-09-01

    Most computer codes for Radar Cross Section (RCS) prediction use Physical Optics (PO) and the Physical Theory of Diffraction (PTD) combined with Geometrical Optics (GO) and the Geometrical Theory of Diffraction (GTD). The latter approaches are computationally cheaper and much more accurate for curved surfaces, but not applicable to the computation of the RCS of all surfaces of a complex object due to the presence of caustic problems in the analysis of concave surfaces or flat surfaces in the far field. The main contribution of this paper is the development of a hybrid method based on a new combination of two asymptotic techniques, GTD and PO, that retains the advantages and avoids the disadvantages of each of them. The new combination yields a very efficient and accurate method to analyze the RCS of complex structures at high frequencies. The proposed method has been validated by comparing RCS results obtained with the proposed approach against the rigorous Method of Moments (MoM) for some simple cases. Some complex cases have been examined at high frequencies, contrasting the results with PO. This study shows the accuracy and efficiency of the hybrid method and its suitability for the computation of the RCS of very large and complex targets at high frequencies.

  7. An efficient implementation of a high-order filter for a cubed-sphere spectral element model

    NASA Astrophysics Data System (ADS)

    Kang, Hyun-Gyu; Cheong, Hyeong-Bin

    2017-03-01

    A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yields a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of the GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and then only the central cell is used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filters was evaluated via their scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
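
    A 1-D periodic analogue of such a high-order filter is easy to write down; the sketch below (illustrative assumptions throughout, not the cubed-sphere implementation) builds a sparse fourth-order operator of the form I + aL² and shows its scale selectivity.

    ```python
    # 1-D periodic analogue of a high-order (Helmholtz-type) filter.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 256
    dx = 1.0 / n
    x = np.arange(n) * dx
    u = np.sin(2 * np.pi * x) + 0.2 * np.sin(60 * np.pi * x)  # smooth + noise

    # Periodic second-difference operator (highly sparse, as in the paper).
    lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="lil")
    lap[0, n - 1] = 1.0
    lap[n - 1, 0] = 1.0
    lap = lap.tocsc() / dx**2

    a = 1e-7                                             # sets the cutoff scale
    op = sp.identity(n, format="csc") + a * (lap @ lap)  # 4th-order filter
    u_f = spla.spsolve(op, u)

    spec, spec_f = np.abs(np.fft.rfft(u)), np.abs(np.fft.rfft(u_f))
    print(spec_f[1] / spec[1], spec_f[30] / spec[30])    # mode 1 kept, 30 damped
    ```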

  8. Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

    PubMed

    Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo

    2017-01-01

    The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the amount of biological data is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in the life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are urgently needed, as are efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study comparing the efficiency of different computing platforms for handling the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.

  9. Analytical prediction with multidimensional computer programs and experimental verification of the performance, at a variety of operating conditions, of two traveling wave tubes with depressed collectors

    NASA Technical Reports Server (NTRS)

    Dayton, J. A., Jr.; Kosmahl, H. G.; Ramins, P.; Stankiewicz, N.

    1979-01-01

    Experimental and analytical results are compared for two high performance, octave bandwidth TWT's that use depressed collectors (MDC's) to improve the efficiency. The computations were carried out with advanced, multidimensional computer programs that are described here in detail. These programs model the electron beam as a series of either disks or rings of charge and follow their multidimensional trajectories from the RF input of the ideal TWT, through the slow wave structure, through the magnetic refocusing system, to their points of impact in the depressed collector. Traveling wave tube performance, collector efficiency, and collector current distribution were computed and the results compared with measurements for a number of TWT-MDC systems. Power conservation and correct accounting of TWT and collector losses were observed. For the TWT's operating at saturation, very good agreement was obtained between the computed and measured collector efficiencies. For a TWT operating 3 and 6 dB below saturation, excellent agreement between computed and measured collector efficiencies was obtained in some cases but only fair agreement in others. However, deviations can largely be explained by small differences in the computed and actual spent beam energy distributions. The analytical tools used here appear to be sufficiently refined to design efficient collectors for this class of TWT. However, for maximum efficiency, some experimental optimization (e.g., collector voltages and aperture sizes) will most likely be required.

  10. Energy 101: Energy Efficient Data Centers

    ScienceCinema

    None

    2018-04-16

    Data centers provide mission-critical computing functions vital to the daily operation of top U.S. economic, scientific, and technological organizations. These data centers consume large amounts of energy to run and maintain their computer systems, servers, and associated high-performance components—up to 3% of all U.S. electricity powers data centers. And as more information comes online, data centers will consume even more energy. Data centers can become more energy efficient by incorporating features like power-saving "stand-by" modes, energy monitoring software, and efficient cooling systems instead of energy-intensive air conditioners. These and other efficiency improvements to data centers can produce significant energy savings, reduce the load on the electric grid, and help protect the nation by increasing the reliability of critical computer operations.

  11. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  12. Accurate Monotonicity-Preserving Schemes With Runge-Kutta Time Stepping

    NASA Technical Reports Server (NTRS)

    Suresh, A.; Huynh, H. T.

    1997-01-01

    A new class of high-order monotonicity-preserving schemes for the numerical solution of conservation laws is presented. The interface value in these schemes is obtained by limiting a higher-order polynomial reconstruction. The limiting is designed to preserve accuracy near extrema and to work well with Runge-Kutta time stepping. Computational efficiency is enhanced by a simple test that determines whether the limiting procedure is needed. For linear advection in one dimension, these schemes are shown to be monotonicity preserving; numerical experiments with the Euler equations also confirm their high accuracy, good shock resolution, and computational efficiency.

  13. Multi-GPU hybrid programming accelerated three-dimensional phase-field model in binary alloy

    NASA Astrophysics Data System (ADS)

    Zhu, Changsheng; Liu, Jieqiong; Zhu, Mingfang; Feng, Li

    2018-03-01

    In dendritic growth simulation, computational efficiency and problem scale have an extremely important influence on the simulation efficiency of a three-dimensional phase-field model. Thus, seeking a high performance calculation method to improve computational efficiency and expand problem scales is of great significance for research on the microstructure of materials. A high performance calculation method based on an MPI+CUDA hybrid programming model is introduced. Multiple GPUs are used to implement quantitative numerical simulations of a three-dimensional phase-field model in a binary alloy under the condition of multi-physics coupling. The acceleration effect of different GPU node counts on different calculation scales is explored. On the foundation of the multi-GPU calculation model that has been introduced, two optimization schemes are proposed: non-blocking communication optimization and overlap of MPI and GPU computing. The results of the two optimization schemes and the basic multi-GPU model are compared. The calculation results show that the multi-GPU calculation model can obviously improve the computational efficiency of the three-dimensional phase-field model, with a 13-fold speedup over a single GPU, and the problem scale has been expanded to 8193. The feasibility of both optimization schemes is shown, and the overlap of MPI and GPU computing optimization performs better, yielding 1.7 times the performance of the basic multi-GPU model when 21 GPUs are used.

  14. A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

    NASA Astrophysics Data System (ADS)

    Lin, Y.; O'Malley, D.; Vesselinov, V. V.

    2015-12-01

    Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems, because the observed data sets are often large and the model parameters are often numerous, conventional methods for solving inverse problems can be computationally expensive. We have developed a new, computationally efficient Levenberg-Marquardt method for large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving for the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by these computational techniques. We apply this new inverse modeling method to invert for a random transmissivity field. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) at each computational node in the model domain. The inversion is also aided by the use of regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Compared with a Levenberg-Marquardt method using standard linear inversion techniques, our method yields a speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
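
    For context, the dense system solved at each Levenberg-Marquardt step has the standard form (J the Jacobian, r the residual vector, λ the damping parameter):

    ```latex
    (J^\top J + \lambda I)\, \delta = -J^\top r
    ```

    Every trial value of λ changes this system, which is what makes projecting it once onto a Krylov subspace, and then recycling that subspace across all subsequent damping parameters, pay off.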

  15. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    NASA Astrophysics Data System (ADS)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, yet big climate data also pose a grand challenge to climatologists: managing and analyzing such data efficiently. The complexity of climate data content and analytical algorithms further increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. A chunked data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to provide a web portal that facilitates interaction among climatologists, climate data, analytic operations and computing resources (e.g., via SQL queries and Scala/Python notebooks). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multi-dimensional, array-based datasets in various geoscience domains.
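
    The index-guided pruning step can be illustrated with a small PySpark sketch; the chunk-index schema and paths below are hypothetical stand-ins, not the ClimateSpark API.

    ```python
    # Hypothetical sketch of spatiotemporal chunk pruning in Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("chunk-pruning").getOrCreate()

    # One row per stored chunk, with its spatial and temporal bounds.
    chunks = spark.read.parquet("hdfs:///climate/chunk_index")

    # Discard chunks that cannot intersect the query box *before* any
    # array data is read or preprocessed.
    hits = chunks.where(
        (F.col("lat_min") <= 50.0) & (F.col("lat_max") >= 30.0) &
        (F.col("time_min") <= "2000-12-31") & (F.col("time_max") >= "2000-01-01"))
    paths = [row.path for row in hits.select("path").collect()]
    # Only the surviving chunk files are then loaded for analysis.
    ```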

  16. Efficient computation of the joint sample frequency spectra for multiple populations.

    PubMed

    Kamm, John A; Terhorst, Jonathan; Song, Yun S

    2017-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
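
    As a baseline for what such methods must compute, the single-population, constant-size case has the classical closed form E[ξ_i] = θ/i; the short sketch below evaluates it (momi's contribution is generalizing this expectation to many populations and complex demographies).

    ```python
    import numpy as np

    def expected_sfs_constant(n, theta):
        """Expected SFS for a constant-size neutral population:
        E[xi_i] = theta / i for i = 1..n-1 (classical baseline)."""
        return theta / np.arange(1, n)

    print(expected_sfs_constant(n=10, theta=2.0))  # [2.0, 1.0, 0.667, ...]
    ```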

  17. Efficient computation of the joint sample frequency spectra for multiple populations

    PubMed Central

    Kamm, John A.; Terhorst, Jonathan; Song, Yun S.

    2016-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity. PMID:28239248

  18. Efficient Unsteady Flow Visualization with High-Order Access Dependencies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jiang; Guo, Hanqi; Yuan, Xiaoru

    We present a novel model based on high-order access dependencies for efficient pathline computation in unsteady flow visualization. By taking longer access sequences into account to model more sophisticated data access patterns in particle tracing, our method greatly improves the accuracy and reliability of data access prediction. In our work, high-order access dependencies are calculated by tracing uniformly seeded pathlines in both forward and backward directions in a preprocessing stage. The effectiveness of our proposed approach is demonstrated through a parallel particle tracing framework with high-order data prefetching. Results show that our method achieves higher data locality and hence improves the efficiency of pathline computation.
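
    The core of the model can be sketched with a small order-k table (names are illustrative): preprocessing counts which data block follows each length-k access sequence along the seeded pathlines, and the tracer prefetches the most frequent successor at run time.

    ```python
    from collections import Counter, defaultdict

    class AccessPredictor:
        """Illustrative order-k access-dependency model."""
        def __init__(self, k=3):
            self.k = k
            self.table = defaultdict(Counter)

        def train(self, trace):            # trace: block ids from a pathline
            for i in range(len(trace) - self.k):
                key = tuple(trace[i:i + self.k])
                self.table[key][trace[i + self.k]] += 1

        def predict(self, recent):         # last k blocks touched
            counts = self.table.get(tuple(recent[-self.k:]))
            return counts.most_common(1)[0][0] if counts else None
    ```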

  19. Changing computing paradigms towards power efficiency.

    PubMed

    Klavík, Pavel; Malossi, A Cristiano I; Bekas, Costas; Curioni, Alessandro

    2014-06-28

    Power awareness is fast becoming immensely important in computing, ranging from traditional high-performance computing applications to the new generation of data-centric workloads. In this work, we describe our efforts towards a power-efficient computing paradigm that combines low- and high-precision arithmetic. We showcase our ideas for the widely used kernel of solving systems of linear equations, which finds numerous applications in scientific and engineering disciplines as well as in large-scale data analytics, statistics and machine learning. Towards this goal, we developed tools for the seamless power profiling of applications at a fine-grain level. In addition, we verify previous work on post-FLOPS/W metrics and show that these can shed much more light on the power/energy profile of important applications. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
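
    One standard way to combine precisions for linear solves is iterative refinement; the numpy sketch below illustrates the general idea rather than the authors' exact kernel. The solves run in float32 (cheaper and less power-hungry), while residuals are accumulated in float64; production code would also factorize the matrix once and reuse the factors.

    ```python
    import numpy as np

    def mixed_precision_solve(A, b, iters=5):
        """Iterative refinement: low-precision solves, high-precision residuals."""
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                                    # float64 residual
            d = np.linalg.solve(A32, r.astype(np.float32))   # float32 correction
            x += d.astype(np.float64)
        return x
    ```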

  20. Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data

    PubMed Central

    Serang, Oliver; MacCoss, Michael J.; Noble, William Stafford

    2010-01-01

    The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or by estimating the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors. PMID:20712337
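
    One structural observation behind such efficiency gains can be sketched simply (illustrative, not the authors' exact graph transformations): the posterior factorizes over connected components of the bipartite peptide-protein graph, so exact marginalization only ever enumerates protein states within one component at a time.

    ```python
    import networkx as nx

    # Degenerate peptides couple proteins, but components are independent.
    G = nx.Graph()
    G.add_edges_from([("prot1", "pepA"), ("prot2", "pepA"),   # shared peptide
                      ("prot2", "pepB"), ("prot3", "pepC")])  # toy data

    for comp in nx.connected_components(G):
        proteins = sorted(n for n in comp if n.startswith("prot"))
        # Enumerate presence/absence states inside this component only:
        # 2**len(proteins) states here, not 2**(total protein count).
        print(proteins)
    ```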

  1. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper describes the rules used to construct the algorithms needed for parallel computing and discusses the origins of the idea of using graphics processors for large-scale processing of laser scanning data. The next part presents the results of an efficiency assessment performed for an array of processing options, all of which were substantially accelerated with parallel computing. The options covered the generation of orthophotos from point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for quality control. Most algorithms had to be formulated from scratch to meet the requirements of parallel computing; a few were based on existing technology developed by the Dephos Software Company and adapted to parallel computing in the course of this research. Processing times measured for typical data quantities confirm the high efficiency of the proposed solutions and the applicability of parallel computing to laser scanning data, and this efficiency opens new opportunities in the creation and organization of processing methods for such data.

  2. The DoD's High Performance Computing Modernization Program - Ensuring the National Earth System Prediction Capability Becomes Operational

    NASA Astrophysics Data System (ADS)

    Burnett, W.

    2016-12-01

    The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and from passing data between coupled ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the scalability of the Navy's Hybrid Coordinate Ocean Model (HYCOM) - by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology Transfer and Training (PETTT), the HPCMP Applications Software Initiative (HASI), and Frontier Projects. PETTT supports code conversion by providing assistance, expertise and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams with access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6-petabyte system, of which Naval Oceanography receives 15% of operational computational system use, or approximately 1 petabyte of the processing capability. The DSRC will provide the DoD with future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how the DoD's HPCMP will ensure that the N-ESPC becomes operational efficiently and effectively, using next-generation high performance computing.

  3. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and provide the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would significantly benefit from it. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use, giving significant advantages over less powerful non-parallel entries in the market.

  4. DNS of Flow in a Low-Pressure Turbine Cascade Using a Discontinuous-Galerkin Spectral-Element Method

    NASA Technical Reports Server (NTRS)

    Garai, Anirban; Diosady, Laslo Tibor; Murman, Scott; Madavan, Nateri

    2015-01-01

    A new computational capability under development for accurate and efficient high-fidelity direct numerical simulation (DNS) and large eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy and is implemented in a computationally efficient manner on a modern high performance computer architecture. A validation study using this method to perform DNS of flow in a low-pressure turbine airfoil cascade is presented. Preliminary results indicate that the method captures the main features of the flow. Discrepancies between the predicted results and the experiments are likely due to the effects of freestream turbulence not being included in the simulation and will be addressed in the final paper.

  5. Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean.

    PubMed

    Yiu, Sean; Tom, Brian Dm

    2017-01-01

    Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice, the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicate model fitting. Thus, only non-standard, computationally intensive procedures based on simulating the marginal likelihood have been proposed so far. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence, maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and for when it is of interest to directly model the overall marginal mean. The methodology is applied to a psoriatic arthritis data set concerning functional disability.
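
    The key computational step can be sketched with SciPy on toy numbers: once the marginal likelihood is rewritten so that the stochastic-process integrals sit inside a multivariate normal CDF, each likelihood contribution reduces to one CDF evaluation.

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal

    mean = np.zeros(4)                        # illustrative dimension
    cov = 0.5 * np.eye(4) + 0.5               # exchangeable correlation
    upper = np.array([0.3, -0.1, 0.8, 0.2])   # limits induced by the zero part
    log_lik_term = np.log(multivariate_normal(mean, cov).cdf(upper))
    ```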

  6. Biocellion: accelerating computer simulation of multicellular biological system models.

    PubMed

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-11-01

    Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing

    ERIC Educational Resources Information Center

    Loh, Matthew

    2013-01-01

    The scalability of CMOS technology has driven computation into a diverse range of applications across the power consumption, performance and size spectra. Communication is a necessary adjunct to computation, and whether this is to push data from node-to-node in a high-performance computing cluster or from the receiver of wireless link to a neural…

  8. Linear static structural and vibration analysis on high-performance computers

    NASA Technical Reports Server (NTRS)

    Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

    1993-01-01

    Parallel computers offer the opportunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms, developed for and implemented on massively parallel computers (hereafter referred to as Scalable High-Performance Computers, or SHPC), for the most computationally intensive tasks involved in structural analysis, namely generation and assembly of system matrices, solution of systems of equations, and calculation of eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e., models for the High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique that extends structural analysis to SHPC and makes large-scale structural analyses tractable.

  9. Parallelism in Manipulator Dynamics. Revision.

    DTIC Science & Technology

    1983-12-01

    computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the... computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose... DYNAMICS by Richard Harold Lathrop. ABSTRACT: This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair

  10. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
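
    The decomposition layer itself can be sketched with mpi4py (illustrative, not the BOPfox implementation): ranks are arranged in a periodic 3-D process grid, each owns one box of atoms, and because the potentials are short-ranged only neighbour exchanges are required.

    ```python
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    dims = MPI.Compute_dims(comm.Get_size(), 3)    # e.g. 64 ranks -> [4, 4, 4]
    cart = comm.Create_cart(dims, periods=[True] * 3)

    x, y, z = cart.Get_coords(cart.Get_rank())     # this rank's box in the grid
    src, dst = cart.Shift(direction=0, disp=1)     # face neighbours along x
    # Atoms near the box faces are exchanged with such neighbours each step.
    ```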

  11. Desktop supercomputer: what can it do?

    NASA Astrophysics Data System (ADS)

    Bogdanov, A.; Degtyarev, A.; Korkhov, V.

    2017-12-01

    The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.

  12. Convolutional networks for fast, energy-efficient neuromorphic computing

    PubMed Central

    Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.

    2016-01-01

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489

  13. Convolutional networks for fast, energy-efficient neuromorphic computing.

    PubMed

    Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S

    2016-10-11

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

  14. Towards future high performance computing: What will change? How can we be efficient?

    NASA Astrophysics Data System (ADS)

    Düben, Peter

    2017-04-01

    How can we make the most of the "exascale" supercomputers that will soon be available and able to perform an astonishing 1,000,000,000,000,000,000 (10^18) floating-point operations per second? How do we need to design applications to use these machines efficiently? What are the limits? We discuss the opportunities and limits of future high performance computers from the perspective of Earth System Modelling. We provide an overview of future challenges and outline how numerical applications will need to change to run efficiently on future supercomputers. We also discuss how different disciplines can support each other, and address data handling and the numerical precision of data.

  15. High-order computational fluid dynamics tools for aircraft design

    PubMed Central

    Wang, Z. J.

    2014-01-01

    Most forecasts predict an annual airline traffic growth rate between 4.5 and 5% in the foreseeable future. To sustain that growth, the environmental impact of aircraft cannot be ignored. Future aircraft must have much better fuel economy, dramatically less greenhouse gas emissions and noise, in addition to better performance. Many technical breakthroughs must take place to achieve the aggressive environmental goals set up by governments in North America and Europe. One of these breakthroughs will be physics-based, highly accurate and efficient computational fluid dynamics and aeroacoustics tools capable of predicting complex flows over the entire flight envelope and through an aircraft engine, and computing aircraft noise. Some of these flows are dominated by unsteady vortices of disparate scales, often highly turbulent, and they call for higher-order methods. As these tools will be integral components of a multi-disciplinary optimization environment, they must be efficient to impact design. Ultimately, the accuracy, efficiency, robustness, scalability and geometric flexibility will determine which methods will be adopted in the design process. This article explores these aspects and identifies pacing items. PMID:25024419

  16. Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

    USDA-ARS?s Scientific Manuscript database

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  17. Quantum Computers

    DTIC Science & Technology

    2010-03-04

    and their sensitivity to charge and flux fluctuations. The first type of superconducting qubit, the charge qubit, omits the inductance. There is no... nanostructured NbN superconducting nanowire detectors have achieved high efficiency and photon-number resolution [16,17]. One approach to a high-efficiency single... resemble classical high-speed integrated circuits and can be readily fabricated using existing technologies. The basic physics behind superconducting qubits

  18. Turbocharged molecular discovery of OLED emitters: from high-throughput quantum simulation to highly efficient TADF devices

    NASA Astrophysics Data System (ADS)

    Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Ha, Dong-Gwang; Einzinger, Markus; Wu, Tony; Baldo, Marc A.; Aspuru-Guzik, Alán.

    2016-09-01

    Discovering new OLED emitters requires many experiments to synthesize candidates and test performance in devices. Large scale computer simulation can greatly speed this search process but the problem remains challenging enough that brute force application of massive computing power is not enough to successfully identify novel structures. We report a successful High Throughput Virtual Screening study that leveraged a range of methods to optimize the search process. The generation of candidate structures was constrained to contain combinatorial explosion. Simulations were tuned to the specific problem and calibrated with experimental results. Experimentalists and theorists actively collaborated such that experimental feedback was regularly utilized to update and shape the computational search. Supervised machine learning methods prioritized candidate structures prior to quantum chemistry simulation to prevent wasting compute on likely poor performers. With this combination of techniques, each multiplying the strength of the search, this effort managed to navigate an area of molecular space and identify hundreds of promising OLED candidate structures. An experimentally validated selection of this set shows emitters with external quantum efficiencies as high as 22%.
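
    The machine-learning triage step can be sketched generically; the descriptors and the random-forest surrogate below are hypothetical stand-ins for the study's actual models. Candidates are ranked by a cheap learned score, and only the top slice enters the quantum-chemistry queue.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X_done, y_done = rng.random((500, 32)), rng.random(500)  # computed candidates
    X_new = rng.random((10000, 32))                          # untested candidates

    surrogate = RandomForestRegressor(n_estimators=200).fit(X_done, y_done)
    priority = np.argsort(surrogate.predict(X_new))[::-1]    # best first
    to_simulate = priority[:200]   # only ~2% reach the expensive simulations
    ```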

  19. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite this initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project was to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan was to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate the newly developed algorithms into actual simulation packages. This plan was fully achieved. Two highly accurate, efficient Poisson solvers were developed and tested based on two different approaches: (1) adopting a mathematical geometry better suited to describing the fluid, and (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm were carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues was parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.

  20. Efficiently modeling neural networks on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Farber, Robert M.

    1993-01-01

    Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks on a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead, with the exception of the communication required for a global summation across the processors (which has sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks with many neurons and interconnects, and our mapping extends to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors using fast adjacent-processor communications. This paper considers the simulation of only feed-forward neural networks, although the method is extendable to recurrent networks.

  1. A Computationally Efficient Method for Polyphonic Pitch Estimation

    NASA Astrophysics Data System (ADS)

    Zhou, Ruohua; Reiss, Joshua D.; Mattavelli, Marco; Zoia, Giorgio

    2009-12-01

    This paper presents a computationally efficient method for polyphonic pitch estimation. The method employs the Fast Resonator Time-Frequency Image (RTFI) as the basic time-frequency analysis tool. The approach is composed of two main stages. First, a preliminary pitch estimate is obtained by means of a simple peak-picking procedure in the pitch energy spectrum, which is calculated from the original RTFI energy spectrum according to harmonic grouping principles. Then incorrect estimates are removed according to spectral irregularity and knowledge of the harmonic structures of notes played on commonly used musical instruments. The new approach is compared with a variety of other frame-based polyphonic pitch estimation methods, and the results demonstrate its high performance and computational efficiency.

  2. Information processing using a single dynamical node as complex system

    PubMed Central

    Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.

    2011-01-01

    Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
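
    The architecture reduces to a short recurrence. The sketch below is a simplified discrete-time rendering with illustrative parameters: N virtual nodes are time-multiplexed points along one delay loop, a fixed random mask injects the input, and the read-out is a linear fit on the collected states.

    ```python
    import numpy as np

    N, alpha, beta = 50, 0.8, 0.5
    mask = np.random.uniform(-1, 1, N)      # fixed random input mask

    def reservoir_states(u):                # u: one-dimensional input sequence
        x = np.zeros(N)                     # virtual-node states on the loop
        states = []
        for sample in u:
            for i in range(N):              # one trip around the delay loop
                x[i] = np.tanh(alpha * x[i] + beta * mask[i] * sample)
            states.append(x.copy())
        return np.asarray(states)           # train a linear read-out on these

    X = reservoir_states(np.sin(np.linspace(0, 20, 200)))
    ```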

  3. Discontinuous Galerkin Methods and High-Speed Turbulent Flows

    NASA Astrophysics Data System (ADS)

    Atak, Muhammed; Larsson, Johan; Munz, Claus-Dieter

    2014-11-01

    Discontinuous Galerkin methods gain increasing importance within the CFD community as they combine arbitrary high order of accuracy in complex geometries with parallel efficiency. Particularly the discontinuous Galerkin spectral element method (DGSEM) is a promising candidate for both the direct numerical simulation (DNS) and large eddy simulation (LES) of turbulent flows due to its excellent scaling attributes. In this talk, we present a DNS of a compressible turbulent boundary layer along a flat plate at a free-stream Mach number of M = 2.67 and assess the computational efficiency of the DGSEM at performing high-fidelity simulations of both transitional and turbulent boundary layers. We compare the accuracy of the results as well as the computational performance to results using a high order finite difference method.

  4. Transformational electronics: a powerful way to revolutionize our information world

    NASA Astrophysics Data System (ADS)

    Rojas, Jhonathan P.; Torres Sevilla, Galo A.; Ghoneim, Mohamed T.; Hussain, Aftab M.; Ahmed, Sally M.; Nassar, Joanna M.; Bahabry, Rabab R.; Nour, Maha; Kutbee, Arwa T.; Byas, Ernesto; Al-Saif, Bidoor; Alamri, Amal M.; Hussain, Muhammad M.

    2014-06-01

    With the emergence of cloud computation, we face rising waves of big data, and it is time to leverage this opportunity by increasing data usage by both man and machine. We need ultra-mobile computation with high data processing speed, ultra-large memory, energy efficiency and multi-functionality. Additionally, we have to deploy energy-efficient, multi-functional 3D ICs to establish robust cyber-physical systems. To achieve such lofty goals we have to mimic the human brain, which is inarguably the world's most powerful and energy-efficient computer. The brain's cortex has a folded architecture that increases surface area in an ultra-compact space to accommodate its neurons and synapses. It is therefore imperative to overcome two integration challenges: (i) devising a low-cost 3D IC fabrication process and (ii) creating foldable substrates with ultra-large-scale integration of high-performance, energy-efficient electronics. Hence, we show a low-cost generic batch process based on trench-protect-peel-recycle to fabricate rigid and flexible 3D ICs as well as high performance flexible electronics. As of today we have made every single component needed for a fully flexible computer, including non-planar state-of-the-art FinFETs. Additionally we have demonstrated various solid-state memories, movable MEMS devices, and energy harvesting and storage components. To show the versatility of our process, we have extended it to other inorganic semiconductor substrates such as silicon germanium and III-V materials. Finally, we report the first fully flexible programmable silicon-based microprocessor, a step towards foldable brain-like computation, and a wirelessly programmable stretchable and flexible thermal patch for pain management in smart bionics.

  5. Multi-petascale highly efficient parallel supercomputer

    DOEpatents

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model in which many processors are integrated into a single Application Specific Integrated Circuit (ASIC). Each computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis (and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application); if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.

  6. Coherent Optical Memory with High Storage Efficiency and Large Fractional Delay

    NASA Astrophysics Data System (ADS)

    Chen, Yi-Hsin; Lee, Meng-Jung; Wang, I.-Chung; Du, Shengwang; Chen, Yong-Fan; Chen, Ying-Cheng; Yu, Ite A.

    2013-02-01

    A high-storage-efficiency and long-lived quantum memory for photons is an essential component in long-distance quantum communication and optical quantum computation. Here, we report a 78% storage efficiency of light pulses in a cold atomic medium based on the effect of electromagnetically induced transparency. At 50% storage efficiency, we obtain a fractional delay of 74, which is the best record to date. The classical fidelity of the recalled pulse is better than 90% and nearly independent of the storage time, as confirmed by direct measurement of the phase evolution of the output light pulse with a beat-note interferometer. Such excellent phase coherence between the stored and recalled light pulses suggests that the current result may be readily applied to single-photon wave packets. Our work significantly advances the technology of electromagnetically induced transparency-based optical memory and may find practical applications in long-distance quantum communication and optical quantum computation.

  7. Coherent optical memory with high storage efficiency and large fractional delay.

    PubMed

    Chen, Yi-Hsin; Lee, Meng-Jung; Wang, I-Chung; Du, Shengwang; Chen, Yong-Fan; Chen, Ying-Cheng; Yu, Ite A

    2013-02-22

    A high-storage-efficiency and long-lived quantum memory for photons is an essential component in long-distance quantum communication and optical quantum computation. Here, we report a 78% storage efficiency of light pulses in a cold atomic medium based on the effect of electromagnetically induced transparency. At 50% storage efficiency, we obtain a fractional delay of 74, which is the best record to date. The classical fidelity of the recalled pulse is better than 90% and nearly independent of the storage time, as confirmed by direct measurement of the phase evolution of the output light pulse with a beat-note interferometer. Such excellent phase coherence between the stored and recalled light pulses suggests that the current result may be readily applied to single-photon wave packets. Our work significantly advances the technology of electromagnetically induced transparency-based optical memory and may find practical applications in long-distance quantum communication and optical quantum computation.

  8. Increasing the sampling efficiency of protein conformational transition using velocity-scaling optimized hybrid explicit/implicit solvent REMD simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Yuqi; Wang, Jinan; Shao, Qiang, E-mail: qshao@mail.shcnc.ac.cn, E-mail: Jiye.Shi@ucb.com, E-mail: wlzhu@mail.shcnc.ac.cn

    2015-03-28

    The application of temperature replica exchange molecular dynamics (REMD) simulation to protein motion is limited by its huge requirement of computational resources, particularly when an explicit solvent model is employed. In a previous study, we developed a velocity-scaling optimized hybrid explicit/implicit solvent REMD method with the hope of reducing the number of temperatures (replicas) while maintaining high sampling efficiency. In this study, we utilized this method to characterize and energetically identify the conformational transition pathway of a protein model, the N-terminal domain of calmodulin. In comparison to standard explicit solvent REMD simulation, the hybrid REMD is much less computationally expensive while giving accurate evaluations of the structural and thermodynamic properties of the conformational transition that are in good agreement with the standard REMD simulation. The hybrid REMD therefore greatly increases computational efficiency and expands the applicability of REMD simulation to larger protein systems.
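
    Both the standard and the hybrid-solvent variants rest on the same exchange test between neighbouring temperature replicas; a minimal sketch (with the Boltzmann constant in kcal/(mol K)) is:

    ```python
    import numpy as np

    kB = 0.0019872041                       # Boltzmann constant, kcal/(mol*K)

    def exchange_accepted(E_i, E_j, T_i, T_j, rng=np.random.default_rng()):
        """Metropolis test for swapping configurations of two replicas."""
        delta = (1.0 / (kB * T_i) - 1.0 / (kB * T_j)) * (E_i - E_j)
        return rng.random() < min(1.0, np.exp(delta))

    print(exchange_accepted(E_i=-120.0, E_j=-118.5, T_i=300.0, T_j=310.0))
    ```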

  9. Analog synthetic biology.

    PubMed

    Sarpeshkar, R

    2014-03-28

    We analyse the pros and cons of analog versus digital computation in living cells. Our analysis is based on fundamental laws of noise in gene and protein expression, which set limits on the energy, time, space, molecular count and part-count resources needed to compute at a given level of precision. We conclude that analog computation is significantly more efficient in its use of resources than deterministic digital computation even at relatively high levels of precision in the cell. Based on this analysis, we conclude that synthetic biology must use analog, collective analog, probabilistic and hybrid analog-digital computational approaches; otherwise, even relatively simple synthetic computations in cells such as addition will exceed energy and molecular-count budgets. We present schematics for efficiently representing analog DNA-protein computation in cells. Analog electronic flow in subthreshold transistors and analog molecular flux in chemical reactions obey Boltzmann exponential laws of thermodynamics and are described by astoundingly similar logarithmic electrochemical potentials. Therefore, cytomorphic circuits can help to map circuit designs between electronic and biochemical domains. We review recent work that uses positive-feedback linearization circuits to architect wide-dynamic-range logarithmic analog computation in Escherichia coli using three transcription factors, nearly two orders of magnitude more efficient in parts than prior digital implementations.

  10. Analog synthetic biology

    PubMed Central

    Sarpeshkar, R.

    2014-01-01

    We analyse the pros and cons of analog versus digital computation in living cells. Our analysis is based on fundamental laws of noise in gene and protein expression, which set limits on the energy, time, space, molecular count and part-count resources needed to compute at a given level of precision. We conclude that analog computation is significantly more efficient in its use of resources than deterministic digital computation even at relatively high levels of precision in the cell. Based on this analysis, we conclude that synthetic biology must use analog, collective analog, probabilistic and hybrid analog–digital computational approaches; otherwise, even relatively simple synthetic computations in cells such as addition will exceed energy and molecular-count budgets. We present schematics for efficiently representing analog DNA–protein computation in cells. Analog electronic flow in subthreshold transistors and analog molecular flux in chemical reactions obey Boltzmann exponential laws of thermodynamics and are described by astoundingly similar logarithmic electrochemical potentials. Therefore, cytomorphic circuits can help to map circuit designs between electronic and biochemical domains. We review recent work that uses positive-feedback linearization circuits to architect wide-dynamic-range logarithmic analog computation in Escherichia coli using three transcription factors, nearly two orders of magnitude more efficient in parts than prior digital implementations. PMID:24567476

  11. Facilitating higher-fidelity simulations of axial compressor instability and other turbomachinery flow conditions

    NASA Astrophysics Data System (ADS)

    Herrick, Gregory Paul

    The quest to accurately capture flow phenomena with length-scales both short and long and to accurately represent complex flow phenomena within disparately sized geometry inspires a need for an efficient, high-fidelity, multi-block structured computational fluid dynamics (CFD) parallel computational scheme. This research presents and demonstrates a more efficient computational method by which to perform multi-block structured CFD parallel computational simulations, thus facilitating higher-fidelity solutions of complicated geometries (due to the inclusion of grids for "small" flow areas which are often merely modeled) and their associated flows. This computational framework offers greater flexibility and user-control in allocating the resource balance between process count and wall-clock computation time. The principal modifications implemented in this revision consist of a "multiple grid blocks per processing core" software infrastructure and an analytic computation of viscous flux Jacobians. The development of this scheme is largely motivated by the desire to simulate axial compressor stall inception with more complete gridding of the flow passages (including rotor tip clearance regions) than has been previously done while maintaining high computational efficiency (i.e., minimal consumption of computational resources), and thus this paradigm shall be demonstrated with an examination of instability in a transonic axial compressor. However, the paradigm presented herein facilitates CFD simulation of myriad previously impractical geometries and flows and is not limited to detailed analyses of axial compressor flows. While the simulations presented herein were technically possible under the previous structure of the subject software, they were much less computationally efficient and thus not pragmatically feasible; the previous research using this software to perform three-dimensional, full-annulus, time-accurate, unsteady, full-stage (with sliding-interface) simulations of rotating stall inception in axial compressors utilized tip clearance periodic models, while the scheme here is demonstrated by a simulation of axial compressor stall inception utilizing gridded rotor tip clearance regions. As will be discussed, much previous research (experimental, theoretical, and computational) has suggested that understanding clearance flow behavior is critical to understanding stall inception, and previous computational research efforts which have used tip clearance models have begged the question, "What about the clearance flows?". This research begins to address that question.

  12. High-efficiency multiphoton boson sampling

    NASA Astrophysics Data System (ADS)

    Wang, Hui; He, Yu; Li, Yu-Huai; Su, Zu-En; Li, Bo; Huang, He-Liang; Ding, Xing; Chen, Ming-Cheng; Liu, Chang; Qin, Jian; Li, Jin-Peng; He, Yu-Ming; Schneider, Christian; Kamp, Martin; Peng, Cheng-Zhi; Höfling, Sven; Lu, Chao-Yang; Pan, Jian-Wei

    2017-06-01

    Boson sampling is considered as a strong candidate to demonstrate 'quantum computational supremacy' over classical computers. However, previous proof-of-principle experiments suffered from small photon number and low sampling rates owing to the inefficiencies of the single-photon sources and multiport optical interferometers. Here, we develop two central components for high-performance boson sampling: robust multiphoton interferometers with 99% transmission rate and actively demultiplexed single-photon sources based on a quantum dot-micropillar with simultaneously high efficiency, purity and indistinguishability. We implement and validate three-, four- and five-photon boson sampling, and achieve sampling rates of 4.96 kHz, 151 Hz and 4 Hz, respectively, which are over 24,000 times faster than previous experiments. Our architecture can be scaled up for a larger number of photons and with higher sampling rates to compete with classical computers, and might provide experimental evidence against the extended Church-Turing thesis.

  13. Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiu, Dongbin

    2017-03-03

    The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.

  14. Decomposed multidimensional control grid interpolation for common consumer electronic image processing applications

    NASA Astrophysics Data System (ADS)

    Zwart, Christine M.; Venkatesan, Ragav; Frakes, David H.

    2012-10-01

    Interpolation is an essential and broadly employed function of signal processing. Accordingly, considerable development has focused on advancing interpolation algorithms toward optimal accuracy. Such development has motivated a clear shift in the state-of-the art from classical interpolation to more intelligent and resourceful approaches, registration-based interpolation for example. As a natural result, many of the most accurate current algorithms are highly complex, specific, and computationally demanding. However, the diverse hardware destinations for interpolation algorithms present unique constraints that often preclude use of the most accurate available options. For example, while computationally demanding interpolators may be suitable for highly equipped image processing platforms (e.g., computer workstations and clusters), only more efficient interpolators may be practical for less well equipped platforms (e.g., smartphones and tablet computers). The latter examples of consumer electronics present a design tradeoff in this regard: high accuracy interpolation benefits the consumer experience but computing capabilities are limited. It follows that interpolators with favorable combinations of accuracy and efficiency are of great practical value to the consumer electronics industry. We address multidimensional interpolation-based image processing problems that are common to consumer electronic devices through a decomposition approach. The multidimensional problems are first broken down into multiple, independent, one-dimensional (1-D) interpolation steps that are then executed with a newly modified registration-based one-dimensional control grid interpolator. The proposed approach, decomposed multidimensional control grid interpolation (DMCGI), combines the accuracy of registration-based interpolation with the simplicity, flexibility, and computational efficiency of a 1-D interpolation framework. Results demonstrate that DMCGI provides improved interpolation accuracy (and other benefits) in image resizing, color sample demosaicing, and video deinterlacing applications, at a computational cost that is manageable or reduced in comparison to popular alternatives.
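
    The decomposition itself is easy to sketch: the snippet below resizes a 2-D image as two independent 1-D passes, with classical SciPy 1-D interpolation standing in for the registration-based control grid step described above.

    ```python
    import numpy as np
    from scipy.interpolate import interp1d

    def resize_separable(img, new_h, new_w):
        """2-D resize as two 1-D interpolation passes (rows, then columns)."""
        h, w = img.shape
        xs = np.linspace(0, w - 1, new_w)
        rows = interp1d(np.arange(w), img, axis=1)(xs)    # pass 1: rows
        ys = np.linspace(0, h - 1, new_h)
        return interp1d(np.arange(h), rows, axis=0)(ys)   # pass 2: columns
    ```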

  15. Fluid mechanics based classification of the respiratory efficiency of several nasal cavities.

    PubMed

    Lintermann, Andreas; Meinke, Matthias; Schröder, Wolfgang

    2013-11-01

    The flow in the human nasal cavity is of great importance to understand rhinologic pathologies like impaired respiration or heating capabilities, a diminished sense of taste and smell, and the presence of dry mucous membranes. To numerically analyze this flow problem a highly efficient and scalable Thermal Lattice-BGK (TLBGK) solver is used, which is very well suited for flows in intricate geometries. The generation of the computational mesh is completely automatic and highly parallelized such that it can be executed efficiently on High Performance Computers (HPCs). An evaluation of the functionality of nasal cavities is based on an analysis of pressure drop, secondary flow structures, wall-shear stress distributions, and temperature variations from the nostrils to the pharynx. The results of the flow fields of three completely different nasal cavities allow their classification into ability groups and support the a priori decision process on surgical interventions.
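
    For readers unfamiliar with the lattice-BGK approach, here is a minimal isothermal D2Q9 collide-and-stream step in Python/NumPy on a periodic grid. It is a toy stand-in, not the thermal, massively parallel solver used in the paper; the relaxation time TAU is an arbitrary illustrative value.

    ```python
    import numpy as np

    # D2Q9 lattice: discrete velocities and weights
    E = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
    W = np.array([4/9] + [1/9]*4 + [1/36]*4)
    TAU = 0.6  # relaxation time (illustrative)

    def equilibrium(rho, ux, uy):
        """Second-order equilibrium: w*rho*(1 + 3(e.u) + 4.5(e.u)^2 - 1.5 u^2)."""
        cu = 3.0 * (E[:, 0, None, None] * ux + E[:, 1, None, None] * uy)
        usq = 1.5 * (ux**2 + uy**2)
        return W[:, None, None] * rho * (1 + cu + 0.5 * cu**2 - usq)

    def lbgk_step(f):
        """One BGK collision + streaming step on a periodic grid."""
        rho = f.sum(axis=0)
        ux = (f * E[:, 0, None, None]).sum(axis=0) / rho
        uy = (f * E[:, 1, None, None]).sum(axis=0) / rho
        f += -(f - equilibrium(rho, ux, uy)) / TAU            # collide
        for i, (ex, ey) in enumerate(E):                      # stream
            f[i] = np.roll(np.roll(f[i], ex, axis=1), ey, axis=0)
        return f

    f = equilibrium(np.ones((64, 64)), np.zeros((64, 64)), np.zeros((64, 64)))
    for _ in range(100):
        f = lbgk_step(f)
    print(f.sum())  # total mass is conserved by both collision and streaming
    ```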

  16. Image detection and compression for memory efficient system analysis

    NASA Astrophysics Data System (ADS)

    Bayraktar, Mustafa

    2015-02-01

    The advances in digital signal processing have been progressing towards efficient use of memory and processing. Both of these factors can be exploited through feasible image storage techniques that compute the minimum information of an image, which speeds up later processing. The Scale Invariant Feature Transform (SIFT) can be utilized to estimate and retrieve an image. In computer vision, SIFT can be implemented to recognize an image by comparing its key features against saved SIFT keypoint descriptors. The main advantage of SIFT is that it not only removes redundant information from an image but also reduces the key points by matching their orientation and adding them together in different windows of the image [1]. Another key property of this approach is that it works more efficiently on highly contrasted images, because its design is based on collecting key points from the contrast shades of the image.
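
    A minimal example of SIFT keypoint extraction and matching, using the OpenCV implementation (opencv-python 4.4 or later, where SIFT is exposed as cv2.SIFT_create); the image file names are placeholders.

    ```python
    import cv2

    # Detect SIFT keypoints/descriptors in two images and match them.
    img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test: keep matches clearly better than the second-best candidate.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]
    print(f"{len(kp1)} and {len(kp2)} keypoints, {len(good)} good matches")
    ```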

  17. High-Performance Computing Data Center | Computational Science | NREL

    Science.gov Websites

    NREL's High-Performance Computing Data Center uses warm-water liquid cooling to achieve its very low PUE, then captures and reuses waste heat from computing components as the primary source of building heat; a dry cooler that uses refrigerant in a passive cycle to dissipate heat is reducing onsite water use. Topics: measuring efficiency through PUE; warm-water liquid cooling; re-using waste heat from computing components.

  18. Beyond the Computer: Reading as a Process of Intellectual Development.

    ERIC Educational Resources Information Center

    Thompson, Mark E.

    With more than 100,000 computers in public schools across the United States, the impact of computer assisted instruction (CAI) on students' reading behavior needs to be evaluated. In reading laboratories, CAI has been found to provide an efficient and highly motivating means of teaching specific educational objectives. Yet, while computer…

  19. High-Speed Photonic Reservoir Computing Using a Time-Delay-Based Architecture: Million Words per Second Classification

    NASA Astrophysics Data System (ADS)

    Larger, Laurent; Baylón-Fuentes, Antonio; Martinenghi, Romain; Udaltsov, Vladimir S.; Chembo, Yanne K.; Jacquot, Maxime

    2017-01-01

    Reservoir computing, originally referred to as an echo state network or a liquid state machine, is a brain-inspired paradigm for processing temporal information. It involves learning a "read-out" interpretation for nonlinear transients developed by high-dimensional dynamics when the latter is excited by the information signal to be processed. This novel computational paradigm is derived from recurrent neural network and machine learning techniques. It has recently been implemented in photonic hardware for a dynamical system, which opens the path to ultrafast brain-inspired computing. We report on a novel implementation involving an electro-optic phase-delay dynamics designed with off-the-shelf optoelectronic telecom devices, thus providing the targeted wide bandwidth. Computational efficiency is demonstrated experimentally with speech-recognition tasks. State-of-the-art speed performances reach one million words per second, with a very low word error rate. In addition to record-speed processing, our investigations have revealed computing-efficiency improvements through yet-unexplored temporal-information-processing techniques, such as simultaneous multisample injection and pitched sampling at the read-out compared to the information "write-in".
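
    The read-out training described above can be illustrated with a minimal software echo state network: a fixed random reservoir is driven by the input, and only a ridge-regression read-out is trained. This Python/NumPy sketch is a generic reservoir computer, not the electro-optic hardware of the paper; the sizes and the toy task (one-step-ahead prediction of a sine) are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, steps = 200, 1000                        # reservoir size, training samples

    # Fixed random reservoir: only the linear read-out is trained.
    W_in = rng.uniform(-0.5, 0.5, (N, 1))
    W = rng.normal(0, 1, (N, N))
    W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius below 1

    u = np.sin(np.linspace(0, 20 * np.pi, steps + 1))   # toy input signal
    X = np.zeros((steps, N))
    x = np.zeros(N)
    for t in range(steps):
        x = np.tanh(W @ x + W_in[:, 0] * u[t])  # nonlinear transient dynamics
        X[t] = x

    y_target = u[1:]                            # task: one-step-ahead prediction
    # Ridge-regression read-out: W_out = (X^T X + beta*I)^(-1) X^T y
    beta = 1e-6
    W_out = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ y_target)
    print("train MSE:", np.mean((X @ W_out - y_target) ** 2))
    ```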

  20. An efficient method to compute spurious end point contributions in PO solutions. [Physical Optics

    NASA Technical Reports Server (NTRS)

    Gupta, Inder J.; Burnside, Walter D.; Pistorius, Carl W. I.

    1987-01-01

    A method is given to compute the spurious endpoint contributions in the physical optics solution for electromagnetic scattering from conducting bodies. The method is applicable to general three-dimensional structures. The only information required to use the method is the radius of curvature of the body at the shadow boundary. Thus, the method is very efficient for numerical computations. As an illustration, the method is applied to several bodies of revolution to compute the endpoint contributions for backscattering in the case of axial incidence. It is shown that in high-frequency situations, the endpoint contributions obtained using the method are equal to the true endpoint contributions.

  1. Parallel computation using boundary elements in solid mechanics

    NASA Technical Reports Server (NTRS)

    Chien, L. S.; Sun, C. T.

    1990-01-01

    The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming a linear variation of displacements and tractions within a line element. Moreover, the MACSYMA symbolic program is employed to obtain analytical results for the influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, a parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. Linear speedups and high efficiencies are demonstrated for a sample problem on the Sequent Symmetry S81 parallel computing system.

  2. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    NASA Astrophysics Data System (ADS)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
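
    Device-level load balancing of the kind mentioned can be as simple as splitting the photon budget in proportion to each device's benchmarked throughput. The sketch below shows only that idea; the device names and speeds are made up, and the authors' actual strategies may be more elaborate.

    ```python
    def balance_photons(total_photons, device_speeds):
        """Split a photon budget across devices in proportion to benchmarked
        throughput (photons/s), assigning the remainder to the fastest device."""
        total_speed = sum(device_speeds.values())
        shares = {d: int(total_photons * s / total_speed)
                  for d, s in device_speeds.items()}
        fastest = max(device_speeds, key=device_speeds.get)
        shares[fastest] += total_photons - sum(shares.values())
        return shares

    # e.g. one CPU and two GPUs, with throughputs taken from a short warm-up run
    print(balance_photons(10_000_000,
                          {"cpu0": 1.2e5, "gpu0": 2.3e6, "gpu1": 1.9e6}))
    ```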

  3. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is applied on fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, results are presented from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain-partitioning approach. The effective use of the caches and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, are discussed. The parallel code of DEM, created by using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are presented briefly.

  4. How to Compute a Slot Marker - Calculation of Controller Managed Spacing Tools for Efficient Descents with Precision Scheduling

    NASA Technical Reports Server (NTRS)

    Prevot, Thomas

    2012-01-01

    This paper describes the underlying principles and algorithms for computing the primary controller managed spacing (CMS) tools developed at NASA for precisely spacing aircraft along efficient descent paths. The trajectory-based CMS tools include slot markers, delay indications and speed advisories. These tools are one of three core NASA technologies integrated in NASA's ATM technology demonstration-1 (ATD-1), which will operationally demonstrate the feasibility of fuel-efficient, high-throughput arrival operations using Automatic Dependent Surveillance Broadcast (ADS-B) and ground-based and airborne NASA technologies for precision scheduling and spacing.

  5. An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

    NASA Astrophysics Data System (ADS)

    Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng

    2018-04-01

    In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture achieves better speed and device-consumption efficiency than the previous kernel storage architecture. We further improve the architecture for high-accuracy and low-power computing by utilizing binary storage and a series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performance, including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) a speed boost of more than 67 times; and 3) 71.4% energy savings.

  6. SAGE: The Self-Adaptive Grid Code. 3

    NASA Technical Reports Server (NTRS)

    Davies, Carol B.; Venkatapathy, Ethiraj

    1999-01-01

    The multi-dimensional self-adaptive grid code, SAGE, is an important tool in the field of computational fluid dynamics (CFD). It provides an efficient method to improve the accuracy of flow solutions while simultaneously reducing computer processing time. Briefly, SAGE enhances an initial computational grid by redistributing the mesh points into more appropriate locations. The movement of these points is driven by an equal-error-distribution algorithm that utilizes the relationship between high flow gradients and excessive solution errors. The method also provides a balance between clustering points in the high-gradient regions and maintaining the smoothness and continuity of the adapted grid. The latest version, Version 3, includes the ability to change the boundaries of a given grid to more efficiently enclose flow structures and provides alternative redistribution algorithms.
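
    The equal-error-distribution idea can be illustrated in one dimension: redistribute mesh points so that every cell carries the same share of a gradient-based weight. The sketch below is a generic equidistribution step in Python/NumPy, not SAGE's actual algorithm; the weight w = 1 + alpha*|du/dx| and the value of alpha are illustrative choices.

    ```python
    import numpy as np

    def adapt_grid_1d(x, u, alpha=20.0):
        """Redistribute mesh points so the weight w = 1 + alpha*|du/dx| is
        equidistributed: each cell carries the same share of the total weight."""
        w = 1.0 + alpha * np.abs(np.gradient(u, x))
        # cumulative weight via the trapezoid rule, then invert it
        cum = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))
        targets = np.linspace(0.0, cum[-1], len(x))    # equal-weight levels
        return np.interp(targets, cum, x)

    x = np.linspace(0.0, 1.0, 41)
    u = np.tanh(50.0 * (x - 0.5))                      # steep gradient at x = 0.5
    x_new = adapt_grid_1d(x, u)                        # points cluster near x = 0.5
    print(np.diff(x_new).min(), np.diff(x_new).max())
    ```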

  7. Effects of Computer-Assisted Instruction on Performance of Senior High School Biology Students in Ghana

    ERIC Educational Resources Information Center

    Owusu, K. A.; Monney, K. A.; Appiah, J. Y.; Wilmot, E. M.

    2010-01-01

    This study investigated the comparative efficiency of computer-assisted instruction (CAI) and the conventional teaching method in biology for senior high school students. A science class was selected in each of two randomly selected schools. A pretest-posttest non-equivalent quasi-experimental design was used. The students in the experimental group…

  8. NREL's Building-Integrated Supercomputer Provides Heating and Efficient Computing (Fact Sheet)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    2014-09-01

    NREL's Energy Systems Integration Facility (ESIF) is meant to investigate new ways to integrate energy sources so they work together efficiently, and one of the key tools in that investigation, a new supercomputer, is itself a prime example of energy systems integration. NREL teamed with Hewlett-Packard (HP) and Intel to develop the innovative warm-water, liquid-cooled Peregrine supercomputer, which not only operates efficiently but also serves as the primary source of building heat for ESIF offices and laboratories. This innovative high-performance computer (HPC) can perform more than a quadrillion calculations per second as part of the world's most energy-efficient HPC data center.

  9. A Survey of Methods for Analyzing and Improving GPU Energy Efficiency

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S

    2014-01-01

    Recent years have witnessed phenomenal growth in the computational capabilities and applications of GPUs. However, this trend has also led to a dramatic increase in their power consumption. This paper surveys research works on analyzing and improving the energy efficiency of GPUs. It also provides a classification of these techniques on the basis of their main research idea. Further, it attempts to synthesize research works that compare the energy efficiency of GPUs with other computing systems, e.g., FPGAs and CPUs. The aim of this survey is to provide researchers with knowledge of the state of the art in GPU power management and to motivate them to architect highly energy-efficient GPUs of tomorrow.

  10. Pattern-based integer sample motion search strategies in the context of HEVC

    NASA Astrophysics Data System (ADS)

    Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas

    2015-09-01

    The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which however comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a large share of computational resources while significantly increasing the coding efficiency. Although both the H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information on a fractional sample level, motion search algorithms on the integer sample level remain an integral part of ME. In this paper, a flexible integer sample ME framework is proposed, allowing a significant reduction in ME computation time to be traded off against a coding-efficiency penalty in terms of bit-rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early-termination techniques. The proposed ME framework is implemented on the basis of the HEVC Test Model (HM) reference software and compared to the state-of-the-art fast search algorithm that is a native part of HM. It is observed that for high-resolution sequences, the integer sample ME process can be sped up by factors varying from 3.2 to 7.6, resulting in bit-rate overheads of 1.5% and 0.6% for the Random Access (RA) and Low Delay P (LDP) configurations, respectively. A similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content, while trading off a bit-rate overhead of up to 5.2%.
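
    As a flavor of pattern-based integer-sample search with early termination (not the specific algorithm derived in the paper), the sketch below implements a small greedy diamond search over SAD costs; like all greedy pattern searches it can land in a local minimum, and all parameters here are illustrative.

    ```python
    import numpy as np

    def sad(block, frame, y, x):
        """Sum of absolute differences between a block and a candidate window."""
        h, w = block.shape
        return np.abs(frame[y:y + h, x:x + w].astype(int) - block.astype(int)).sum()

    def diamond_search(block, ref, y0, x0, max_iters=32, early_stop=64):
        """Probe a small diamond around the current best candidate, recentre on
        improvement, stop when the centre wins or the cost falls below an
        early-termination threshold."""
        best_y, best_x, best_cost = y0, x0, sad(block, ref, y0, x0)
        h, w = block.shape
        for _ in range(max_iters):
            if best_cost <= early_stop:
                break
            moved = False
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                y, x = best_y + dy, best_x + dx
                if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                    cost = sad(block, ref, y, x)
                    if cost < best_cost:
                        best_cost, best_y, best_x, moved = cost, y, x, True
            if not moved:
                break
        return (best_y - y0, best_x - x0), best_cost

    yy, xx = np.mgrid[0:64, 0:64]
    ref = (255 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 200.0)).astype(np.uint8)
    block = ref[20:36, 24:40]                  # true position: (20, 24)
    print(diamond_search(block, ref, 18, 22))  # -> ((2, 2), 0)
    ```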

  11. Efficient high-order structure-preserving methods for the generalized Rosenau-type equation with power law nonlinearity

    NASA Astrophysics Data System (ADS)

    Cai, Jiaxiang; Liang, Hua; Zhang, Chun

    2018-06-01

    Based on the multi-symplectic Hamiltonian formulation of the generalized Rosenau-type equation, a multi-symplectic scheme and an energy-preserving scheme are proposed. To improve the accuracy of the solution, we apply the composition technique to the obtained schemes to develop high-order schemes which are also multi-symplectic and energy-preserving, respectively. The discrete fast Fourier transform brings a significant improvement to the computational efficiency of the schemes. Numerical results verify that all the proposed schemes perform satisfactorily in providing accurate solutions and preserving the discrete mass and energy invariants. Numerical results also show that although each basic time step is divided into several composition steps, the computational efficiency of the composition schemes is much higher than that of the non-composite schemes.
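
    The composition technique mentioned above can be illustrated with the classical Yoshida triple jump: composing a second-order, structure-preserving base step with substep sizes g1*h, g2*h, g1*h cancels the leading error term and yields a fourth-order method. The sketch below uses Stormer-Verlet on a harmonic oscillator as the base scheme, an illustrative stand-in for the paper's multi-symplectic schemes.

    ```python
    # One Stormer-Verlet step for the harmonic oscillator H = (p^2 + q^2)/2:
    # second-order and symplectic, serving as the base method for composition.
    def verlet(q, p, h):
        p -= 0.5 * h * q
        q += h * p
        p -= 0.5 * h * q
        return q, p

    # Yoshida triple-jump coefficients (g1 + g2 + g1 = 1)
    g1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
    g2 = -2.0 ** (1.0 / 3.0) / (2.0 - 2.0 ** (1.0 / 3.0))

    def composed_step(q, p, h):
        """Fourth-order step obtained by composing the base scheme."""
        for g in (g1, g2, g1):
            q, p = verlet(q, p, g * h)
        return q, p

    q, p, h = 1.0, 0.0, 0.1
    e0 = 0.5 * (p * p + q * q)
    for _ in range(1000):
        q, p = composed_step(q, p, h)
    print("energy drift:", abs(0.5 * (p * p + q * q) - e0))  # stays small
    ```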

  12. HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

    NASA Technical Reports Server (NTRS)

    Sterling, Thomas; Bergman, Larry

    2000-01-01

    Computational Aero Sciences and other numerically intensive computation disciplines demand computing throughputs substantially greater than the Teraflops-scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that, in combination with sufficient resolution and advanced adaptive techniques, may force performance requirements towards Petaflops. This will be especially true for compute-intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithm techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops-scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) at one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per-fiber bandwidth of 1 Tbps, and the new Data Vortex network topology employing this technology can connect tens of thousands of ports, providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip, exposing the internal bandwidth of the memory row buffers at low latency. Holographic photorefractive storage technologies provide high-density memory with access times a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops-scale systems. To achieve high sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.

  13. A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing.

    PubMed

    Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang

    2017-07-24

    With the rapid development of big data and the Internet of Things (IoT), the number of networked devices and the data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network, can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices, and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method. Finally, analysis and simulation results show that our scheme is both secure and highly efficient.

  14. DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.

    PubMed

    Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei

    2017-07-18

    Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm can save up to 99.9% of memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from the traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation into close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency compared with state-of-the-art approaches.

  15. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

    NASA Astrophysics Data System (ADS)

    Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

    2016-09-01

    Inverse modeling seeks model parameters given a set of observations. However, for practical problems, conventional methods for inverse modeling can be computationally expensive because the number of measurements is often large and the model parameters are numerous. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Compared with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10¹ to ~10² in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
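
    The subspace-recycling idea can be sketched as follows: the Krylov subspace of J^T J generated from J^T r does not depend on the damping parameter, so one Lanczos pass can serve every damping value. This Python/NumPy sketch (plain Lanczos without reorthogonalization or breakdown handling, a dense Jacobian, illustrative sizes) is a simplified reading of the approach, not the MADS implementation.

    ```python
    import numpy as np

    def lanczos(A_mv, b, k):
        """k-step Lanczos on a symmetric operator: returns an orthonormal basis Q
        (n x k) and the tridiagonal projection T = Q^T A Q (k x k)."""
        n = b.size
        Q, T = np.zeros((n, k)), np.zeros((k, k))
        q, q_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
        for j in range(k):
            Q[:, j] = q
            v = A_mv(q) - beta * q_prev
            alpha = q @ v
            v -= alpha * q
            T[j, j] = alpha
            if j + 1 < k:
                beta = np.linalg.norm(v)
                T[j, j + 1] = T[j + 1, j] = beta
                q_prev, q = q, v / beta
        return Q, T

    def lm_steps(J, r, dampings, k=20):
        """Approximate LM steps (J^T J + lam*I) d = J^T r for several damping
        values, reusing one Krylov subspace (which does not depend on lam)."""
        g = J.T @ r
        Q, T = lanczos(lambda v: J.T @ (J @ v), g, k)   # built once
        rhs = Q.T @ g
        return {lam: Q @ np.linalg.solve(T + lam * np.eye(k), rhs)
                for lam in dampings}                    # recycled for each lam

    J = np.random.default_rng(1).normal(size=(500, 200))
    r = np.random.default_rng(2).normal(size=500)
    steps = lm_steps(J, r, dampings=[1e-2, 1e-1, 1.0])
    print({lam: round(float(np.linalg.norm(d)), 3) for lam, d in steps.items()})
    ```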

  16. Beam efflux measurements

    NASA Technical Reports Server (NTRS)

    Komatsu, G. K.; Stellen, J. M., Jr.

    1976-01-01

    Measurements have been made of the high energy thrust ions, (Group I), high angle/high energy ions (Group II), and high angle/low energy ions (Group IV) of a mercury electron bombardment thruster in the angular divergence range from 0 deg to greater than 90 deg. The measurements have been made as a function of thrust ion current, propellant utilization efficiency, bombardment discharge voltage, screen and accelerator grid potential (accel-decel ratio) and neutralizer keeper potential. The shape of the Group IV (charge exchange) ion plume has remained essentially fixed within the range of variation of the engine operation parameters. The magnitude of the charge exchange ion flux scales with thrust ion current, for good propellant utilization conditions. For fixed thrust ion current, charge exchange ion flux increases for diminishing propellant utilization efficiency. Facility effects influence experimental accuracies within the range of propellant utilization efficiency used in the experiments. The flux of high angle/high energy Group II ions is significantly diminished by the use of minimum decel voltages on the accelerator grid. A computer model of charge exchange ion production and motion has been developed. The program allows computation of charge exchange ion volume production rate, total production rate, and charge exchange ion trajectories for "genuine" and "facilities effects" particles. In the computed flux deposition patterns, the Group I and Group IV ion plumes exhibit a counter motion.

  17. High Power Klystrons for Efficient Reliable High Power Amplifiers.

    DTIC Science & Technology

    1980-11-01

    techniques to obtain high overall efficiency. One is second harmonic space charge bunching. This is a process whereby the fundamental and second harmonic...components of the space charge waves in the electron beam of a microwave tube are combined to produce more highly concentrated electron bunches raising the...the drift lengths to enhance the 2nd harmonic component in the space charge waves. The latter method was utilized in the VKC-7790. Computer

  18. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    DOE PAGES

    McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai; ...

    2017-11-07

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.

  19. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.

  20. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo.

    PubMed

    McDaniel, T; D'Azevedo, E F; Li, Y W; Wong, K; Kent, P R C

    2017-11-07

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.

  1. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    NASA Astrophysics Data System (ADS)

    McDaniel, T.; D'Azevedo, E. F.; Li, Y. W.; Wong, K.; Kent, P. R. C.

    2017-11-01

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
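
    A minimal dense-algebra sketch of the delayed update described in the preceding records: K accepted row replacements are applied to the matrix and its maintained inverse en bloc through the Woodbury identity, turning K rank-1 (matrix-vector) updates into matrix-matrix work. The sketch shows only the en-bloc application and the determinant ratio, not the surrounding Monte Carlo sampling; row indices are assumed distinct.

    ```python
    import numpy as np

    def delayed_update(A, A_inv, rows, new_rows):
        """Apply K accepted row replacements en bloc. With E = [e_{r1} ... e_{rK}]
        and dU holding (u_i - a_{r_i}) as columns, A_new = A + E dU^T; Woodbury:
          A_new^{-1} = A^{-1} - A^{-1} E C^{-1} dU^T A^{-1},  C = I + dU^T A^{-1} E,
        and det(A_new)/det(A) = det(C)."""
        n, K = A.shape[0], len(rows)
        E = np.zeros((n, K))
        E[rows, np.arange(K)] = 1.0
        dU = (new_rows - A[rows]).T                       # n x K
        C = np.eye(K) + dU.T @ A_inv @ E                  # small K x K system
        A_inv_new = A_inv - (A_inv @ E) @ np.linalg.solve(C, dU.T @ A_inv)
        A_new = A.copy()
        A_new[rows] = new_rows
        return A_new, A_inv_new, np.linalg.det(C)         # det ratio

    rng = np.random.default_rng(3)
    A = rng.normal(size=(50, 50))
    A_new, A_inv_new, ratio = delayed_update(A, np.linalg.inv(A), [4, 17, 31],
                                             rng.normal(size=(3, 50)))
    print(np.allclose(A_inv_new, np.linalg.inv(A_new)))   # True
    ```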

  2. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads.

    PubMed

    Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus

    2016-05-01

    Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.

  3. A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems

    PubMed Central

    Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda

    2018-01-01

    In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems. PMID:29439442

  4. A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems.

    PubMed

    Ma, Xingpo; Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda

    2018-02-10

    In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems.

  5. A highly efficient multi-core algorithm for clustering extremely large datasets

    PubMed Central

    2010-01-01

    Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
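
    The multi-core idea can be sketched with process-level parallelism in Python: each worker runs the k-means assignment step on its chunk and returns partial sums and counts, which the main process reduces into new centres. This is a generic sketch (the paper's algorithms use shared-memory threads with transactional-memory design principles, and also cover k-modes); ProcessPoolExecutor, the sizes and the worker count are illustrative.

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def assign_chunk(args):
        """Assignment step on one chunk: nearest centre, plus the partial sums
        and counts needed to recompute centres without a second pass."""
        chunk, centres = args
        d = ((chunk[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        k = centres.shape[0]
        sums = np.array([chunk[labels == j].sum(axis=0) for j in range(k)])
        counts = np.bincount(labels, minlength=k)
        return sums, counts

    def parallel_kmeans(X, k, n_iter=20, workers=4):
        centres = X[np.random.default_rng(0).choice(len(X), k, replace=False)]
        chunks = np.array_split(X, workers)
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for _ in range(n_iter):
                parts = list(pool.map(assign_chunk, [(c, centres) for c in chunks]))
                sums = sum(p[0] for p in parts)         # reduce partial results
                counts = sum(p[1] for p in parts)
                centres = sums / np.maximum(counts, 1)[:, None]
        return centres

    if __name__ == "__main__":
        X = np.random.default_rng(1).normal(size=(100_000, 8))
        print(parallel_kmeans(X, k=5).shape)  # (5, 8)
    ```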

  6. Online System for Faster Multipoint Linkage Analysis via Parallel Execution on Thousands of Personal Computers

    PubMed Central

    Silberstein, M.; Tzemach, A.; Dovgolevsky, N.; Fishelson, M.; Schuster, A.; Geiger, D.

    2006-01-01

    Computation of LOD scores is a valuable tool for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, computation of exact multipoint likelihoods of large inbred pedigrees with extensive missing data is often beyond the capabilities of a single computer. We present a distributed system called “SUPERLINK-ONLINE,” for the computation of multipoint LOD scores of large inbred pedigrees. It achieves high performance via the efficient parallelization of the algorithms in SUPERLINK, a state-of-the-art serial program for these tasks, and through the use of the idle cycles of thousands of personal computers. The main algorithmic challenge has been to efficiently split a large task for distributed execution in a highly dynamic, nondedicated running environment. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either the installation of software or the maintenance of a complicated distributed environment. As the system was being developed, it was extensively tested by collaborating medical centers worldwide on a variety of real data sets, some of which are presented in this article. PMID:16685644

  7. High-Performance Computing Data Center Water Usage Efficiency |

    Science.gov Websites

    A thermosyphon cooler-an advanced dry cooler that uses refrigerant in a passive cycle to dissipate heat-was installed to improve water usage efficiency, using wet cooling when it's hot and dry cooling when it's not.

  8. Semi-autonomous parking for enhanced safety and efficiency.

    DOT National Transportation Integrated Search

    2017-06-01

    This project focuses on the use of tools from a combination of computer vision and localization based navigation schemes to aid the process of efficient and safe parking of vehicles in high density parking spaces. The principles of collision avoidanc...

  9. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    NASA Astrophysics Data System (ADS)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    2016-09-01

    The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of the multi-body potential. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems, and compared to a highly optimized CPU implementation (both single-core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.

  10. Parallel computing method for simulating hydrological processesof large rivers under climate change

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.

    2016-12-01

    Climate change is one of the best-known global environmental problems. It has altered watershed hydrological processes in their temporal and spatial distribution, especially in the world's large rivers. Watershed hydrological process simulation based on physically based distributed hydrological models can produce better results than lumped models. However, such simulation involves a large amount of calculation, especially in large rivers, and thus needs huge computing resources that may not be steadily available to researchers or may come at high expense; this has seriously restricted research and application. To solve this problem, current parallel methods mostly parallelize the computation in the space and time dimensions: they calculate the natural features of a distributed hydrological model in order, grid by grid (unit by unit, basin by basin), from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speed-up ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of distributed hydrological models with distributed data storage, in-memory databases, distributed computing, and parallel computing based on computing power units. The method has strong adaptability and extensibility: it makes full use of computing and storage resources under the condition of limited computing resources, and its computing efficiency improves linearly with the increase of computing resources. The method can satisfy the parallel computing requirements of hydrological process simulation in small, medium and large rivers.
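
    One way to realize the upstream-to-downstream parallelism described above is to layer the river network topologically: every sub-basin whose upstream contributors have finished can run concurrently. The sketch below (plain Python, a toy five-basin network, a placeholder simulate function) illustrates only that scheduling idea, not the authors' full method.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def topological_levels(downstream):
        """Layer the river network (edges upstream -> downstream) so that every
        sub-basin appears after all of its upstream contributors; sub-basins in
        the same level are independent and can be simulated in parallel."""
        nodes = set(downstream) | {d for d in downstream.values() if d is not None}
        indeg = {n: 0 for n in nodes}
        for u, d in downstream.items():
            if d is not None:
                indeg[d] += 1
        levels, ready = [], [n for n in nodes if indeg[n] == 0]
        while ready:
            levels.append(ready)
            nxt = []
            for u in ready:
                d = downstream.get(u)
                if d is not None:
                    indeg[d] -= 1
                    if indeg[d] == 0:
                        nxt.append(d)
            ready = nxt
        return levels

    def simulate(basin):          # placeholder for the real runoff computation
        return f"{basin} done"

    # Toy network: a and b drain into c; c and d drain into the outlet e.
    net = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
    with ThreadPoolExecutor() as pool:
        for level in topological_levels(net):
            list(pool.map(simulate, level))   # one wave of independent sub-basins
            print("finished level:", sorted(level))
    ```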

  11. Computer aided drug discovery of highly ligand efficient, low molecular weight imidazopyridine analogs as FLT3 inhibitors

    PubMed Central

    Frett, Brendan; McConnell, Nick; Smith, Catherine C.; Wang, Yuanxiang; Shah, Neil P.; Li, Hong-yu

    2015-01-01

    The FLT3 kinase represents an attractive target to effectively treat AML. Unfortunately, no FLT3-targeted therapeutic is currently approved. In line with our continued interest in treating kinase-related disease, we utilized pioneering synthetic methodology in combination with computer-aided drug discovery to identify low-molecular-weight, highly ligand-efficient FLT3 kinase inhibitors with anti-FLT3-mutant activity. Compounds were analyzed for biochemical inhibition, their ability to selectively inhibit cell proliferation, FLT3-mutant activity, and preliminary aqueous solubility. Validated hits were discovered that can serve as starting platforms for lead candidates. PMID:25765758

  12. Computationally efficient method for Fourier transform of highly chirped pulses for laser and parametric amplifier modeling.

    PubMed

    Andrianov, Alexey; Szabo, Aron; Sergeev, Alexander; Kim, Arkady; Chvykov, Vladimir; Kalashnikov, Mikhail

    2016-11-14

    We developed an improved approach to calculate the Fourier transform of signals with arbitrary large quadratic phase which can be efficiently implemented in numerical simulations utilizing Fast Fourier transform. The proposed algorithm significantly reduces the computational cost of Fourier transform of a highly chirped and stretched pulse by splitting it into two separate transforms of almost transform limited pulses, thereby reducing the required grid size roughly by a factor of the pulse stretching. The application of our improved Fourier transform algorithm in the split-step method for numerical modeling of CPA and OPCPA shows excellent agreement with standard algorithms.

  13. Efficient architecture for spike sorting in reconfigurable hardware.

    PubMed

    Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying

    2013-11-01

    This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and the fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near-optimal clustering for spike sorting, and its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computations of different weight vectors share the same circuit to lower the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented on a field-programmable gate array (FPGA) and embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design attaining a high classification correct rate and high-speed computation.
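
    The GHA feature-extraction step can be written down compactly: with outputs y = Wx, Sanger's rule updates W by eta*(y x^T - LT[y y^T] W), where LT[.] is the lower-triangular part, driving the rows of W toward the leading principal components. A minimal software sketch in Python/NumPy on toy data follows (the learning rate, epoch count and toy covariance are illustrative; the paper implements this in hardware).

    ```python
    import numpy as np

    def gha_train(X, n_components, eta=1e-3, epochs=10, seed=0):
        """Generalized Hebbian algorithm (Sanger's rule): the rows of W converge
        approximately to the leading principal components of zero-mean input X."""
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
        for _ in range(epochs):
            for x in X:
                y = W @ x
                # Delta W = eta * (y x^T - LT[y y^T] W)
                W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
        return W

    # Toy check on correlated 2-D data
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
    X -= X.mean(axis=0)
    W = gha_train(X, n_components=2)
    print(W / np.linalg.norm(W, axis=1, keepdims=True))  # ~ principal directions
    ```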

  14. High-efficiency wavefunction updates for large scale Quantum Monte Carlo

    NASA Astrophysics Data System (ADS)

    Kent, Paul; McDaniel, Tyler; Li, Ying Wai; D'Azevedo, Ed

    Within ab initio Quantum Monte Carlo (QMC) simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunctions. The evaluation of each Monte Carlo move requires finding the determinant of a dense matrix, which is traditionally iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. For calculations with thousands of electrons, this operation dominates the execution profile. We propose a novel rank-k delayed update scheme. This strategy enables probability evaluation for multiple successive Monte Carlo moves, with application of accepted moves to the matrices delayed until after a predetermined number of moves, k. Accepted events grouped in this manner are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency. This procedure does not change the underlying Monte Carlo sampling or the sampling efficiency. For large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude speedups can be obtained on both multi-core CPUs and GPUs, making this algorithm highly advantageous for current petascale and future exascale computations.

  15. Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmalz, Mark S

    2011-07-24

    Statement of Problem - The Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G' for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G → G', which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in the solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - The Department of Energy has many simulation codes that must compute faster to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion for high-performance computing systems.

  16. Towards energy-efficient photonic interconnects

    NASA Astrophysics Data System (ADS)

    Demir, Yigit; Hardavellas, Nikos

    2015-03-01

    Silicon photonics have emerged as a promising solution to meet the growing demand for high-bandwidth, low-latency, and energy-efficient on-chip and off-chip communication in many-core processors. However, current silicon-photonic interconnect designs for many-core processors waste a significant amount of power because (a) lasers are always on, even during periods of interconnect inactivity, and (b) microring resonators employ heaters which consume a significant amount of power just to overcome thermal variations and maintain communication on the photonic links, especially in a 3D-stacked design. The problem of high laser power consumption is particularly important as lasers typically have very low energy efficiency, and photonic interconnects often remain underutilized both in scientific computing (compute-intensive execution phases underutilize the interconnect), and in server computing (servers in Google-scale datacenters have a typical utilization of less than 30%). We address the high laser power consumption by proposing EcoLaser+, a laser control scheme that saves energy by predicting the interconnect activity and opportunistically turning the on-chip laser off when possible, and also by scaling the width of the communication link based on a runtime prediction of the expected message length. Our laser control scheme can save 62-92% of the laser energy and improve the energy efficiency of a many-core processor with negligible performance penalty. We address the high trimming (heating) power consumption of the microrings by proposing insulation methods that reduce the impact of localized heating induced by highly-active components on the 3D-stacked logic die.

  17. An integrated system for land resources supervision based on the IoT and cloud computing

    NASA Astrophysics Data System (ADS)

    Fang, Shifeng; Zhu, Yunqiang; Xu, Lida; Zhang, Jinqu; Zhou, Peiji; Luo, Kan; Yang, Jie

    2017-01-01

    Integrated information systems are important safeguards for the utilisation and development of land resources. Information technologies, including the Internet of Things (IoT) and cloud computing, are inevitable requirements for the quality and efficiency of land resources supervision tasks. In this study, an economical and highly efficient supervision system for land resources has been established based on IoT and cloud computing technologies; a novel online and offline integrated system with synchronised internal and field data that includes the entire process of 'discovering breaches, analysing problems, verifying fieldwork and investigating cases' was constructed. The system integrates key technologies, such as the automatic extraction of high-precision information based on remote sensing, semantic ontology-based technology to excavate and discriminate public sentiment on the Internet that is related to illegal incidents, high-performance parallel computing based on MapReduce, uniform storing and compressing (bitwise) technology, global positioning system data communication and data synchronisation mode, intelligent recognition and four-level ('device, transfer, system and data') safety control technology. The integrated system based on a 'One Map' platform has been officially implemented by the Department of Land and Resources of Guizhou Province, China, and was found to significantly increase the efficiency and level of land resources supervision. The system promoted the overall development of informatisation in fields related to land resource management.

  18. Join the Center for Applied Scientific Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamblin, Todd; Bremer, Timo; Van Essen, Brian

    The Center for Applied Scientific Computing serves as Livermore Lab’s window to the broader computer science, computational physics, applied mathematics, and data science research communities. In collaboration with academic, industrial, and other government laboratory partners, we conduct world-class scientific research and development on problems critical to national security. CASC applies the power of high-performance computing and the efficiency of modern computational methods to the realms of stockpile stewardship, cyber and energy security, and knowledge discovery for intelligence applications.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to the highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.

  20. Computational fluid dynamics research

    NASA Technical Reports Server (NTRS)

    Chandra, Suresh; Jones, Kenneth; Hassan, Hassan; Mcrae, David Scott

    1992-01-01

    The focus of research in the computational fluid dynamics (CFD) area is twofold: (1) to develop new approaches for turbulence modeling so that high speed compressible flows can be studied for applications to entry and re-entry flows; and (2) to perform research to improve CFD algorithm accuracy and efficiency for high speed flows. Research activities, faculty and student participation, publications, and financial information are outlined.

  1. Message Passing vs. Shared Address Space on a Cluster of SMPs

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

    2000-01-01

    The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has made them an attractive platform for high-end scientific computing. Currently, message passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared-memory programming, due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems, SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.

  2. Multi-Objective Aerodynamic Optimization of the Streamlined Shape of High-Speed Trains Based on the Kriging Model.

    PubMed

    Xu, Gang; Liang, Xifeng; Yao, Shuanbao; Chen, Dawei; Li, Zhiwei

    2017-01-01

    Minimizing the aerodynamic drag and the lift of the train coach remains a key issue for high-speed trains. With the development of computing technology and computational fluid dynamics (CFD) in the engineering field, CFD has been successfully applied to the design process of high-speed trains. However, developing a new streamlined shape for high-speed trains with excellent aerodynamic performance requires huge computational costs. Furthermore, relationships between multiple design variables and the aerodynamic loads are seldom obtained. In the present study, the Kriging surrogate model is used to perform a multi-objective optimization of the streamlined shape of high-speed trains, where the drag and the lift of the train coach are the optimization objectives. To improve the prediction accuracy of the Kriging model, the cross-validation method is used to construct the optimal Kriging model. The optimization results show that the two objectives are efficiently optimized, indicating that the optimization strategy used in the present study can greatly improve the optimization efficiency and meet the engineering requirements.
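
    As a rough illustration of the surrogate workflow, the sketch below fits a Gaussian-process (Kriging) model to a handful of samples of a synthetic one-dimensional "drag" function, scores it by cross-validation, and then queries the cheap surrogate instead of an expensive CFD solver. All functions and parameter values are stand-ins, not the paper's setup.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.uniform(0.0, 1.0, size=(20, 1))                   # design samples
        y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(20)   # stand-in for drag

        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
        # Cross-validation guards against a poorly tuned surrogate before use.
        print("CV R^2:", cross_val_score(gp, X, y, cv=5).mean())
        gp.fit(X, y)

        # Query the surrogate densely at negligible cost.
        xq = np.linspace(0, 1, 200).reshape(-1, 1)
        mean, std = gp.predict(xq, return_std=True)
        print("surrogate minimum at x =", xq[mean.argmin(), 0])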

  3. Utilizing fast multipole expansions for efficient and accurate quantum-classical molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Schwörer, Magnus; Lorenzen, Konstantin; Mathias, Gerald; Tavan, Paul

    2015-03-01

    Recently, a novel approach to hybrid quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations has been suggested [Schwörer et al., J. Chem. Phys. 138, 244103 (2013)]. Here, the forces acting on the atoms are calculated by grid-based density functional theory (DFT) for a solute molecule and by a polarizable molecular mechanics (PMM) force field for a large solvent environment composed of several 10^3 to 10^5 molecules as negative gradients of a DFT/PMM hybrid Hamiltonian. The electrostatic interactions are efficiently described by a hierarchical fast multipole method (FMM). Adopting recent progress of this FMM technique [Lorenzen et al., J. Chem. Theory Comput. 10, 3244 (2014)], which particularly entails a strictly linear scaling of the computational effort with the system size, and adapting this revised FMM approach to the computation of the interactions between the DFT and PMM fragments of a simulation system, here, we show how one can further enhance the efficiency and accuracy of such DFT/PMM-MD simulations. The resulting gain of total performance, as measured for alanine dipeptide (DFT) embedded in water (PMM) by the product of the gains in efficiency and accuracy, amounts to about one order of magnitude. We also demonstrate that the jointly parallelized implementation of the DFT and PMM-MD parts of the computation enables the efficient use of high-performance computing systems. The associated software is available online.

  4. Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

    PubMed

    Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

    2014-09-01

    Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scan, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation, including internal matrix manipulations, propagator setups, and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations.

  5. On modelling three-dimensional piezoelectric smart structures with boundary spectral element method

    NASA Astrophysics Data System (ADS)

    Zou, Fangxin; Aliabadi, M. H.

    2017-05-01

    The computational efficiency of the boundary element method in elastodynamic analysis can be significantly improved by employing high-order spectral elements for boundary discretisation. In this work, for the first time, the so-called boundary spectral element method is utilised to formulate the piezoelectric smart structures that are widely used in structural health monitoring (SHM) applications. The resultant boundary spectral element formulation has been validated by the finite element method (FEM) and physical experiments. The new formulation has demonstrated a lower demand on computational resources and a higher numerical stability than commercial FEM packages. Compared to the conventional boundary element formulation, a significant reduction in computational expenses has been achieved. In summary, the boundary spectral element formulation presented in this paper provides a highly efficient and stable mathematical tool for the development of SHM applications.

  6. On finite element implementation and computational techniques for constitutive modeling of high temperature composites

    NASA Technical Reports Server (NTRS)

    Saleeb, A. F.; Chang, T. Y. P.; Wilt, T.; Iskovitz, I.

    1989-01-01

    The research work performed during the past year on finite element implementation and computational techniques pertaining to high temperature composites is outlined. In the present research, two main issues are addressed: efficient geometric modeling of composite structures and expedient numerical integration techniques dealing with constitutive rate equations. In the first issue, mixed finite elements for modeling laminated plates and shells were examined in terms of numerical accuracy, locking property and computational efficiency. Element applications include (currently available) linearly elastic analysis and future extension to material nonlinearity for damage predictions and large deformations. On the material level, various integration methods to integrate nonlinear constitutive rate equations for finite element implementation were studied. These include explicit, implicit and automatic subincrementing schemes. In all cases, examples are included to illustrate the numerical characteristics of various methods that were considered.

  7. Re-Form: FPGA-Powered True Codesign Flow for High-Performance Computing In The Post-Moore Era

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cappello, Franck; Yoshii, Kazutomo; Finkel, Hal

    Multicore scaling will end soon because of practical power limits. Dark silicon is becoming an even bigger issue than the end of Moore's law. In the post-Moore era, the energy efficiency of computing will be a major concern. FPGAs could be a key to maximizing the energy efficiency. In this paper we address severe challenges in the adoption of FPGAs in HPC and describe “Re-form,” an FPGA-powered codesign flow.

  8. Computational efficient segmentation of cell nuclei in 2D and 3D fluorescent micrographs

    NASA Astrophysics Data System (ADS)

    De Vylder, Jonas; Philips, Wilfried

    2011-02-01

    This paper proposes a new segmentation technique developed for the segmentation of cell nuclei in both 2D and 3D fluorescent micrographs. The proposed method can deal with both blurred edges and touching nuclei. Using a dual scan line algorithm, it is both memory and computationally efficient, making it interesting for the analysis of images coming from high-throughput systems or the analysis of 3D microscopic images. Experiments show good results, i.e., a recall of over 0.98.

  9. Efficient Ab initio Modeling of Random Multicomponent Alloys

    DOE PAGES

    Jiang, Chao; Uberuaga, Blas P.

    2016-03-08

    In this Letter, we present a novel small set of ordered structures (SSOS) method that allows extremely efficient ab initio modeling of random multi-component alloys. Using inverse II-III spinel oxides and equiatomic quinary bcc (so-called high entropy) alloys as examples, we also demonstrate that an SSOS can achieve the same accuracy as a large supercell or a well-converged cluster expansion, but with significantly reduced computational cost. In particular, because of this efficiency, a large number of quinary alloy compositions can be quickly screened, leading to the identification of several new possible high entropy alloy chemistries. Furthermore, the SSOS method developed here can be broadly useful for the rapid computational design of multi-component materials, especially those with a large number of alloying elements, a challenging problem for other approaches.

  10. Multi-fidelity stochastic collocation method for computation of statistical moments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu

    We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems in the presence of models with different fidelities. The method extends a previously developed multi-fidelity approximation method. By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound of the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.
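
    The gist of combining fidelities can be sketched with a simple control-variate-style estimator: a large low-fidelity ensemble supplies a cheap mean, and a few high-fidelity runs supply a correction. This generic stand-in is not the paper's stochastic collocation algorithm, and both models below are synthetic.

        import numpy as np

        rng = np.random.default_rng(1)

        def high_fidelity(z):   # expensive model (synthetic stand-in)
            return np.exp(0.3 * z) + 0.05 * z**2

        def low_fidelity(z):    # cheap, correlated approximation
            return 1.0 + 0.3 * z + 0.05 * z**2

        z_many = rng.standard_normal(100_000)   # cheap ensemble
        z_few = rng.standard_normal(50)         # affordable HF budget

        mu_lo = low_fidelity(z_many).mean()                          # cheap estimate
        corr = (high_fidelity(z_few) - low_fidelity(z_few)).mean()   # HF correction
        print("multi-fidelity mean estimate:", mu_lo + corr)
        print("HF-only estimate (50 runs):  ", high_fidelity(z_few).mean())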

  11. On the impact of approximate computation in an analog DeSTIN architecture.

    PubMed

    Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar

    2014-05-01

    Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN, a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.

  12. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo³ Framework.

    PubMed

    Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo

    2018-06-08

    Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

  13. GaAs shallow-homojunction solar cells

    NASA Technical Reports Server (NTRS)

    Fan, J. C. C.

    1981-01-01

    The feasibility of fabricating space-resistant, high-efficiency, lightweight, low-cost GaAs shallow-homojunction solar cells for space applications is investigated. The material preparation of ultrathin GaAs single crystal layers, and the fabrication of efficient GaAs solar cells on bulk GaAs substrates are discussed. Considerable progress was made in both areas, and a conversion efficiency of about 16% at AM0 was obtained using anodic oxide as a single-layer antireflection coating. A computer design shows that even better cells can be obtained with a double-layer antireflection coating. Ultrathin, high-efficiency solar cells were obtained from GaAs films prepared by the CLEFT process, with a conversion efficiency as high as 17% at AM1 from a 10-micrometer-thick GaAs film. An organometallic CVD system was designed and constructed.

  14. REVEAL: An Extensible Reduced Order Model Builder for Simulation and Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agarwal, Khushbu; Sharma, Poorva; Ma, Jinliang

    2013-04-30

    Many science domains need to build computationally efficient and accurate representations of high fidelity, computationally expensive simulations. These computationally efficient versions are known as reduced-order models. This paper presents the design and implementation of a novel reduced-order model (ROM) builder, the REVEAL toolset. This toolset generates ROMs based on science- and engineering-domain specific simulations executed on high performance computing (HPC) platforms. The toolset encompasses a range of sampling and regression methods that can be used to generate a ROM, automatically quantifies the ROM accuracy, and provides support for an iterative approach to improve ROM accuracy. REVEAL is designed to be extensible in order to utilize the core functionality with any simulator that has published input and output formats. It also defines programmatic interfaces to include new sampling and regression techniques so that users can ‘mix and match’ mathematical techniques to best suit the characteristics of their model. In this paper, we describe the architecture of REVEAL and demonstrate its usage with a computational fluid dynamics model used in carbon capture.
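
    A toy version of the sample-simulate-regress loop that such a ROM builder automates might look as follows. The simulator, sampling plan, and quadratic regression are assumptions chosen for illustration and do not reflect REVEAL's actual interfaces.

        import numpy as np

        def simulator(x):  # stand-in for an expensive HPC simulation
            return np.sin(3 * x[..., 0]) * np.exp(-x[..., 1])

        rng = np.random.default_rng(2)
        X = rng.uniform(0, 1, size=(40, 2))        # sampling step
        y = simulator(X)                           # simulation runs

        # Regression step: quadratic polynomial surrogate via least squares.
        def features(X):
            x1, x2 = X[:, 0], X[:, 1]
            return np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])

        coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)

        # Accuracy quantification on held-out samples; a builder could iterate
        # with more samples until this error falls below a target.
        Xt = rng.uniform(0, 1, size=(200, 2))
        err = np.abs(features(Xt) @ coef - simulator(Xt))
        print("max ROM error on test set:", err.max())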

  15. An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton

    2013-10-01

    Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with the numerous model executions required to explore the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for the PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating a high-dimensional posterior distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.
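
    The payoff of a surrogate for the PPDF is easy to see in miniature: once a cheap approximation of the log-posterior exists, MCMC can run against it without further model executions. The sketch below uses a one-dimensional Gaussian target and a polynomial surrogate purely for illustration; the paper's surrogate is an adaptive sparse-grid collocation approximation, not a polynomial fit.

        import numpy as np

        rng = np.random.default_rng(3)

        def log_post(theta):        # "expensive" model (stand-in)
            return -0.5 * (theta - 1.5)**2 / 0.25

        # "Offline" surrogate construction from a handful of model runs.
        nodes = np.linspace(-1, 4, 15)
        coeffs = np.polyfit(nodes, log_post(nodes), deg=6)
        surrogate = lambda t: np.polyval(coeffs, t)

        # Metropolis sampling against the surrogate only: no model calls here.
        theta, chain = 0.0, []
        for _ in range(20_000):
            prop = theta + 0.5 * rng.standard_normal()
            if np.log(rng.uniform()) < surrogate(prop) - surrogate(theta):
                theta = prop
            chain.append(theta)
        print("posterior mean ~", np.mean(chain[2000:]))  # close to 1.5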

  16. Scalable, High-performance 3D Imaging Software Platform: System Architecture and Application to Virtual Colonoscopy

    PubMed Central

    Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin

    2013-01-01

    One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803

  17. PP-SWAT: A Python-based computing software for efficient multiobjective calibration of SWAT

    USDA-ARS?s Scientific Manuscript database

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  18. Algorithm-Based Fault Tolerance Integrated with Replication

    NASA Technical Reports Server (NTRS)

    Some, Raphael; Rennels, David

    2008-01-01

    In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but it does not protect against errors in the logical operations surrounding algorithms.
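
    Classic ABFT for matrix multiplication illustrates the algorithmic side of the idea: checksum rows and columns are carried through the computation, and a post-hoc consistency check detects (and can localize) faults. The sketch below is a generic textbook instance, not the proposed system's code.

        import numpy as np

        rng = np.random.default_rng(4)
        A = rng.standard_normal((4, 4))
        B = rng.standard_normal((4, 4))

        Ac = np.vstack([A, A.sum(axis=0)])                  # column-checksum A
        Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row-checksum B
        C = Ac @ Br                                         # full checksum product

        # C[-1, :-1] should equal the column sums of the true product A @ B,
        # and C[:-1, -1] its row sums; a mismatch localizes a fault.
        ok_cols = np.allclose(C[-1, :-1], C[:-1, :-1].sum(axis=0))
        ok_rows = np.allclose(C[:-1, -1], C[:-1, :-1].sum(axis=1))
        print("fault-free:", ok_cols and ok_rows)

        C[1, 2] += 1e-3                                     # inject a fault
        print("fault detected:",
              not np.allclose(C[-1, :-1], C[:-1, :-1].sum(axis=0)))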

  19. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    PubMed Central

    Sharma, Anuj; Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  20. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    PubMed

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  1. Efficient biprediction decision scheme for fast high efficiency video coding encoding

    NASA Astrophysics Data System (ADS)

    Park, Sang-hyo; Lee, Seung-ho; Jang, Euee S.; Jun, Dongsan; Kang, Jung-Won

    2016-11-01

    An efficient biprediction decision scheme of high efficiency video coding (HEVC) is proposed for fast-encoding applications. For low-delay video applications, bidirectional prediction can be used to increase compression performance efficiently with previous reference frames. However, at the same time, the computational complexity of the HEVC encoder is significantly increased due to the additional biprediction search. Although some research has attempted to reduce this complexity, whether the prediction is strongly related to both motion complexity and prediction modes in a coding unit has not yet been investigated. A method that avoids most compression-inefficient search points is proposed so that the computational complexity of the motion estimation process can be dramatically decreased. To determine if biprediction is critical, the proposed method exploits the stochastic correlation of the context of prediction units (PUs): the direction of a PU and the accuracy of a motion vector. Through experimental results, the proposed method showed that the time complexity of biprediction can be reduced to 30% on average, outperforming existing methods in view of encoding time, number of function calls, and memory access.

  2. Low-Energy Truly Random Number Generation with Superparamagnetic Tunnel Junctions for Unconventional Computing

    NASA Astrophysics Data System (ADS)

    Vodenicarevic, D.; Locatelli, N.; Mizrahi, A.; Friedman, J. S.; Vincent, A. F.; Romera, M.; Fukushima, A.; Yakushiji, K.; Kubota, H.; Yuasa, S.; Tiwari, S.; Grollier, J.; Querlioz, D.

    2017-11-01

    Low-energy random number generation is critical for many emerging computing schemes proposed to complement or replace von Neumann architectures. However, current random number generators are always associated with an energy cost that is prohibitive for these computing schemes. We introduce random number bit generation based on specific nanodevices: superparamagnetic tunnel junctions. We experimentally demonstrate high-quality random bit generation that represents an orders-of-magnitude improvement in energy efficiency over current solutions. We show that the random generation speed improves with nanodevice scaling, and we investigate the impact of temperature, magnetic field, and cross talk. Finally, we show how alternative computing schemes can be implemented using superparamagnetic tunnel junctions as random number generators. These results open the way for fabricating efficient hardware computing devices leveraging stochasticity, and they highlight an alternative use for emerging nanodevices.

  3. Computationally Efficient Power Allocation Algorithm in Multicarrier-Based Cognitive Radio Networks: OFDM and FBMC Systems

    NASA Astrophysics Data System (ADS)

    Shaat, Musbah; Bader, Faouzi

    2010-12-01

    Cognitive Radio (CR) systems have been proposed to increase spectrum utilization by opportunistically accessing the unused spectrum. Multicarrier communication systems are promising candidates for CR systems. Due to its high spectral efficiency, filter bank multicarrier (FBMC) can be considered as an alternative to conventional orthogonal frequency division multiplexing (OFDM) for transmission over CR networks. This paper addresses the problem of resource allocation in multicarrier-based CR networks. The objective is to maximize the downlink capacity of the network under constraints on both the total power and the interference introduced to the primary users (PUs). The optimal solution has high computational complexity, which makes it unsuitable for practical applications, and hence a low-complexity suboptimal solution is proposed. The proposed algorithm utilizes the spectrum holes in PU bands as well as active PU bands. The performance of the proposed algorithm is investigated for OFDM- and FBMC-based CR systems. Simulation results illustrate that the proposed resource allocation algorithm with low computational complexity achieves near-optimal performance and proves the efficiency of using FBMC in the CR context.
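
    A common low-complexity building block for this kind of problem is water-filling with per-subcarrier power caps standing in for the interference constraints toward primary users. The sketch below is a generic illustration with invented gains, caps, and budget, not the paper's algorithm.

        import numpy as np

        def capped_waterfilling(gains, caps, p_total, iters=50):
            """Bisect the water level; clip each subcarrier at its cap."""
            lo, hi = 0.0, p_total + max(caps) + 1.0 / gains.min()
            for _ in range(iters):
                level = 0.5 * (lo + hi)
                p = np.clip(level - 1.0 / gains, 0.0, caps)
                if p.sum() > p_total:
                    hi = level
                else:
                    lo = level
            return p

        gains = np.array([2.0, 1.0, 0.5, 3.0])   # channel gains per subcarrier
        caps = np.array([0.4, 1.0, 1.0, 0.2])    # interference-limited caps
        p = capped_waterfilling(gains, caps, p_total=1.5)
        print(p, p.sum())   # respects both the caps and the total budget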

  4. The future challenge for aeropropulsion

    NASA Technical Reports Server (NTRS)

    Rosen, Robert; Bowditch, David N.

    1992-01-01

    NASA's research in aeropropulsion is focused on improving the efficiency, capability, and environmental compatibility for all classes of future aircraft. The development of innovative concepts, and theoretical, experimental, and computational tools provide the knowledge base for continued propulsion system advances. Key enabling technologies include advances in internal fluid mechanics, structures, light-weight high-strength composite materials, and advanced sensors and controls. Recent emphasis has been on the development of advanced computational tools in internal fluid mechanics, structural mechanics, reacting flows, and computational chemistry. For subsonic transport applications, very high bypass ratio turbofans with increased engine pressure ratio are being investigated to increase fuel efficiency and reduce airport noise levels. In a joint supersonic cruise propulsion program with industry, the critical environmental concerns of emissions and community noise are being addressed. NASA is also providing key technologies for the National Aerospaceplane, and is studying propulsion systems that provide the capability for aircraft to accelerate to and cruise in the Mach 4-6 speed range. The combination of fundamental, component, and focused technology development underway at NASA will make possible dramatic advances in aeropropulsion efficiency and environmental compatibility for future aeronautical vehicles.

  5. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

    Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10^1 to ~10^2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.

  6. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

    DOE PAGES

    Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

    2016-09-01

    Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10^1 to ~10^2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.
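
    The recycling trick can be sketched compactly: build a Krylov basis for J^T J once, then solve the projected, damped system for each Levenberg-Marquardt damping parameter at negligible extra cost. The sketch below uses a random dense Jacobian and a simplified Arnoldi process; the details of the authors' Julia implementation in MADS will differ.

        import numpy as np

        def krylov_basis(A, b, k):
            """Arnoldi: orthonormal basis V of K_k(A, b) and H ~ V^T A V."""
            n = len(b)
            V = np.zeros((n, k)); H = np.zeros((k, k))
            V[:, 0] = b / np.linalg.norm(b)
            for j in range(k - 1):
                w = A @ V[:, j]
                for i in range(j + 1):
                    H[i, j] = V[:, i] @ w
                    w -= H[i, j] * V[:, i]
                H[j + 1, j] = np.linalg.norm(w)
                V[:, j + 1] = w / H[j + 1, j]
            H[:, -1] = V.T @ (A @ V[:, -1])   # complete the last column
            return V, H

        rng = np.random.default_rng(5)
        J = rng.standard_normal((200, 50))    # Jacobian of the misfit
        r = rng.standard_normal(200)          # residual vector
        A, g = J.T @ J, J.T @ r

        V, H = krylov_basis(A, g, k=20)       # built once ...
        beta = np.linalg.norm(g)
        for lam in [1e-2, 1e-1, 1.0, 10.0]:   # ... recycled for each damping
            y = np.linalg.solve(H + lam * np.eye(20), beta * np.eye(20)[:, 0])
            dx = -V @ y                       # approximate LM step
            rel = np.linalg.norm(A @ dx + lam * dx + g) / np.linalg.norm(g)
            print(f"lambda={lam}: relative residual {rel:.2e}")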

  7. Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids

    NASA Astrophysics Data System (ADS)

    Ma, Xinrong; Duan, Zhijian

    2018-04-01

    High-order discontinuous Galerkin finite element methods (DGFEM) are known to be good methods for solving the Euler and Navier-Stokes equations on unstructured grids, but they require substantial computational resources. An efficient parallel algorithm is presented for solving the compressible Euler equations. Moreover, a multigrid strategy based on a three-stage, third-order TVD Runge-Kutta scheme is used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of the unsteady compressible Euler equations. In order to keep each processor load-balanced, a domain decomposition method is employed. Numerical experiments were performed for inviscid transonic flow problems around the NACA0012 airfoil and the M6 wing. The results indicate that our parallel algorithm improves acceleration and efficiency significantly, making it suitable for computing complex flows.
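
    For reference, the three-stage, third-order TVD Runge-Kutta scheme mentioned above has the well-known Shu-Osher form sketched below; the periodic upwind advection operator is only an illustrative stand-in for the DG spatial discretization.

        import numpy as np

        def tvd_rk3_step(u, L, dt):
            """One step of the three-stage, third-order TVD Runge-Kutta scheme."""
            u1 = u + dt * L(u)
            u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1))
            return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2))

        # Periodic linear advection with a first-order upwind operator.
        n, c = 200, 1.0
        dx = 1.0 / n
        x = np.arange(n) * dx
        u = np.exp(-200 * (x - 0.3) ** 2)
        L = lambda u: -c * (u - np.roll(u, 1)) / dx

        dt = 0.4 * dx / c                      # CFL-limited step
        for _ in range(100):
            u = tvd_rk3_step(u, L, dt)
        # The TVD property keeps the profile bounded: no new extrema appear.
        print("solution bounds after 100 steps:", u.min(), u.max())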

  8. Towards a Versatile Tele-Education Platform for Computer Science Educators Based on the Greek School Network

    ERIC Educational Resources Information Center

    Paraskevas, Michael; Zarouchas, Thomas; Angelopoulos, Panagiotis; Perikos, Isidoros

    2013-01-01

    Nowadays, the growing need for highly qualified computer science educators in modern educational environments is commonplace. This study examines the potential use of the Greek School Network (GSN) to provide a robust and comprehensive e-training course for computer science educators in order to efficiently exploit advanced IT services and establish a…

  9. Micromagnetics on high-performance workstation and mobile computational platforms

    NASA Astrophysics Data System (ADS)

    Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

    2015-05-01

    The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing units, Nvidia desktop graphics processing units, and the Nvidia Jetson TK1 platform. The FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.

  10. Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.

    PubMed

    Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai

    2017-11-01

    For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.

  11. High-Productivity Computing in Computational Physics Education

    NASA Astrophysics Data System (ADS)

    Tel-Zur, Guy

    2011-03-01

    We describe the development of a new course in Computational Physics at the Ben-Gurion University. This elective course for 3rd year undergraduates and MSc. students is being taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach of teaching Computational Physics emphasizes ``Correctness'' and then ``Accuracy,'' and we add ``Performance.'' Along with topics in Mathematical Methods and case studies in Physics, the course devotes a significant amount of time to ``Mini-Courses'' on topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization, and Grid and Cloud Computing. The course intends to teach neither new physics nor new mathematics; rather, it is focused on an integrated approach for solving problems, starting from the physics problem, the corresponding mathematical solution, and the numerical scheme, to writing an efficient computer code and finally analysis and visualization.

  12. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads

    PubMed Central

    Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus

    2016-01-01

    Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922

  13. Computational Properties of the Hippocampus Increase the Efficiency of Goal-Directed Foraging through Hierarchical Reinforcement Learning

    PubMed Central

    Chalmers, Eric; Luczak, Artur; Gruber, Aaron J.

    2016-01-01

    The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL) to guide “goal-directed” behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals' ability to rapidly adapt to environmental changes, or learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, “forward sweeps” through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort) required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows learning and memory of multiple tasks. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes model-based RL to efficiently solve navigation and other difficult tasks. PMID:28018203

  14. LOCAL ORTHOGONAL CUTTING METHOD FOR COMPUTING MEDIAL CURVES AND ITS BIOMEDICAL APPLICATIONS

    PubMed Central

    Einstein, Daniel R.; Dyedov, Vladimir

    2010-01-01

    Medial curves have a wide range of applications in geometric modeling and analysis (such as shape matching) and biomedical engineering (such as morphometry and computer assisted surgery). The computation of medial curves poses significant challenges, both in terms of theoretical analysis and practical efficiency and reliability. In this paper, we propose a definition and analysis of medial curves and also describe an efficient and robust method called local orthogonal cutting (LOC) for computing medial curves. Our approach is based on three key concepts: a local orthogonal decomposition of objects into substructures, a differential geometry concept called the interior center of curvature (ICC), and integrated stability and consistency tests. These concepts lend themselves to robust numerical techniques and result in an algorithm that is efficient and noise resistant. We illustrate the effectiveness and robustness of our approach with some highly complex, large-scale, noisy biomedical geometries derived from medical images, including lung airways and blood vessels. We also present comparisons of our method with some existing methods. PMID:20628546

  15. Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

    NASA Astrophysics Data System (ADS)

    Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

    2011-03-01

    A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian-based domain decomposition method leads to poor parallel efficiency because of an increase in the number of power iterations [1]. In order to obtain high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
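
    The inverse power iteration at the heart of a reactivity computation can be sketched in a few lines: the dominant eigenvalue k of A·phi = (1/k)·F·phi is found by power iteration on A^{-1}F. The operators below are random stand-ins for the discretized transport problem, not a pin-by-pin model.

        import numpy as np

        rng = np.random.default_rng(6)
        n = 50
        A = np.eye(n) * 4.0 + 0.1 * rng.random((n, n))   # "loss" operator
        F = rng.random((n, n))                           # "fission" operator

        phi = np.ones(n)
        k = 1.0
        for _ in range(200):
            psi = np.linalg.solve(A, F @ phi) / k        # one inverse power step
            k = k * np.linalg.norm(psi) / np.linalg.norm(phi)
            phi = psi / np.linalg.norm(psi)
        print("dominant eigenvalue k =", k)

        # Check against a direct solve of the generalized problem.
        print("reference:", max(abs(np.linalg.eigvals(np.linalg.solve(A, F)))))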

  16. Inverse targeting: An effective immunization strategy

    NASA Astrophysics Data System (ADS)

    Schneider, C. M.; Mihaljev, T.; Herrmann, H. J.

    2012-05-01

    We propose a new method to immunize populations or computer networks against epidemics that is more efficient than any continuous immunization method considered before. The novelty of our method resides in the way of determining the immunization targets. First, we identify those individuals or computers that contribute the least to the disease spreading, measured through their contribution to the size of the largest connected cluster in the social or computer network. The immunization process follows the list of identified individuals or computers in inverse order, immunizing first those which are most relevant for the epidemic spreading. We have applied our immunization strategy to several model networks and two real networks, the Internet and the collaboration network of high-energy physicists. We find that our new immunization strategy is up to 14% more efficient for model networks and up to 33% more efficient for real networks than dynamically immunizing the most connected nodes in a network. Our strategy is also numerically efficient and can therefore be applied to large systems.
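
    A naive version of the target-selection rule is easy to sketch: score each node by how much its removal shrinks the largest connected cluster, then immunize in inverse (descending) order of contribution. The O(N^2) rescoring below is an illustrative simplification of the paper's procedure, using an arbitrary synthetic network.

        import networkx as nx

        def largest_cluster(g):
            return max((len(c) for c in nx.connected_components(g)), default=0)

        def inverse_targeting_order(g):
            base = largest_cluster(g)
            scores = {}
            for node in g.nodes():
                h = g.copy()
                h.remove_node(node)
                scores[node] = base - largest_cluster(h)  # node's contribution
            # Ascending contribution gives the paper's list; immunization
            # then proceeds through that list in inverse order.
            ranked = sorted(scores, key=scores.get)
            return list(reversed(ranked))

        g = nx.barabasi_albert_graph(200, 2, seed=7)
        order = inverse_targeting_order(g)
        print("first nodes to immunize:", order[:5])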

  17. Spatial adaption procedures on unstructured meshes for accurate unsteady aerodynamic flow computation

    NASA Technical Reports Server (NTRS)

    Rausch, Russ D.; Batina, John T.; Yang, Henry T. Y.

    1991-01-01

    Spatial adaption procedures for the accurate and efficient solution of steady and unsteady inviscid flow problems are described. The adaption procedures were developed and implemented within a two-dimensional unstructured-grid upwind-type Euler code. These procedures involve mesh enrichment and mesh coarsening to either add points in high gradient regions of the flow or remove points where they are not needed, respectively, to produce solutions of high spatial accuracy at minimal computational costs. A detailed description is given of the enrichment and coarsening procedures and comparisons with alternative results and experimental data are presented to provide an assessment of the accuracy and efficiency of the capability. Steady and unsteady transonic results, obtained using spatial adaption for the NACA 0012 airfoil, are shown to be of high spatial accuracy, primarily in that the shock waves are very sharply captured. The results were obtained with a computational savings of a factor of approximately fifty-three for a steady case and as much as twenty-five for the unsteady cases.

  18. Median Robust Extended Local Binary Pattern for Texture Classification.

    PubMed

    Liu, Li; Lao, Songyang; Fieguth, Paul W; Guo, Yulan; Wang, Xiaogang; Pietikäinen, Matti

    2016-03-01

    Local binary patterns (LBP) are considered among the most computationally efficient high-performance texture features. However, the LBP method is very sensitive to image noise and is unable to capture macrostructure information. To best address these disadvantages, in this paper, we introduce a novel descriptor for texture classification, the median robust extended LBP (MRELBP). Different from the traditional LBP and many LBP variants, MRELBP compares regional image medians rather than raw image intensities. A multiscale LBP type descriptor is computed by efficiently comparing image medians over a novel sampling scheme, which can capture both microstructure and macrostructure texture information. A comprehensive evaluation on benchmark data sets reveals MRELBP's high performance (robust to gray-scale variations, rotation changes, and noise) at a low computational cost. MRELBP produces the best classification scores of 99.82%, 99.38%, and 99.77% on three popular Outex test suites. More importantly, MRELBP is shown to be highly robust to image noise, including Gaussian noise, Gaussian blur, salt-and-pepper noise, and random pixel corruption.
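
    The median-comparison idea can be sketched per pixel: the LBP code compares medians of small patches around circularly sampled neighbors with the median of a central patch, instead of raw intensities. The radius, patch size, and sampling below are illustrative choices, not MRELBP's exact multiscale scheme.

        import numpy as np

        def median_lbp_code(img, r, c, radius=2, patch=3):
            """Toy median-comparison LBP code for one pixel."""
            half = patch // 2
            center = np.median(img[r-half:r+half+1, c-half:c+half+1])
            code = 0
            for k in range(8):                        # 8 sampling directions
                ang = 2 * np.pi * k / 8
                rr = int(round(r + radius * np.sin(ang)))
                cc = int(round(c + radius * np.cos(ang)))
                neigh = np.median(img[rr-half:rr+half+1, cc-half:cc+half+1])
                code |= int(neigh >= center) << k     # sign of median difference
            return code

        img = np.random.default_rng(8).integers(0, 256, size=(32, 32)).astype(float)
        print(median_lbp_code(img, 16, 16))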

  19. Distributed collaborative probabilistic design for turbine blade-tip radial running clearance using support vector machine of regression

    NASA Astrophysics Data System (ADS)

    Fei, Cheng-Wei; Bai, Guang-Chen

    2014-12-01

    To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assemblies such as the blade-tip radial running clearance (BTRRC) of a gas turbine, a distributed collaborative probabilistic design method based on support vector machine regression (SR), called DCSRM, is proposed by integrating the distributed collaborative response surface method with a support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of an aeroengine high-pressure turbine (HPT) BTRRC is carried out to verify the proposed DCSRM. The analysis yields the optimal static blade-tip clearance of the HPT for BTRRC design, improving the performance and reliability of the aeroengine. The comparison of methods shows that the DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assemblies and enriches mechanical reliability theory and methods.

  20. Spatial adaption procedures on unstructured meshes for accurate unsteady aerodynamic flow computation

    NASA Technical Reports Server (NTRS)

    Rausch, Russ D.; Yang, Henry T. Y.; Batina, John T.

    1991-01-01

    Spatial adaption procedures for the accurate and efficient solution of steady and unsteady inviscid flow problems are described. The adaption procedures were developed and implemented within a two-dimensional unstructured-grid upwind-type Euler code. These procedures involve mesh enrichment and mesh coarsening to either add points in high gradient regions of the flow or remove points where they are not needed, respectively, to produce solutions of high spatial accuracy at minimal computational cost. The paper gives a detailed description of the enrichment and coarsening procedures and presents comparisons with alternative results and experimental data to provide an assessment of the accuracy and efficiency of the capability. Steady and unsteady transonic results, obtained using spatial adaption for the NACA 0012 airfoil, are shown to be of high spatial accuracy, primarily in that the shock waves are very sharply captured. The results were obtained with a computational savings of a factor of approximately fifty-three for a steady case and as much as twenty-five for the unsteady cases.

  1. hp-Adaptive time integration based on the BDF for viscous flows

    NASA Astrophysics Data System (ADS)

    Hay, A.; Etienne, S.; Pelletier, D.; Garon, A.

    2015-06-01

    This paper presents a procedure based on the Backward Differentiation Formulas of order 1 to 5 to obtain efficient time integration of the incompressible Navier-Stokes equations. The adaptive algorithm performs both stepsize and order selection to control, respectively, the solution accuracy and the computational efficiency of the time integration process. The stepsize selection (h-adaptivity) is based on a local error estimate and an error controller to guarantee that the numerical solution accuracy is within a user-prescribed tolerance. The order selection (p-adaptivity) relies on the idea that low-accuracy solutions can be computed efficiently by low order time integrators, while accurate solutions require high order time integrators to keep computational time low. The selection is based on a stability test that detects growing numerical noise and deems a method of order p stable if there is no method of lower order that delivers the same solution accuracy for a larger stepsize. Hence, it guarantees both that (1) the method of integration operates inside its stability region and (2) the time integration procedure is computationally efficient. The proposed time integration procedure also features time-step rejection and quarantine mechanisms, a modified Newton method with a predictor, and dense output techniques to compute solutions at off-step points.
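
    The h-adaptive half of such a scheme reduces to an elementary error controller. The Python sketch below shows one standard accept/reject rule for a method of order p; the specific safety factors, and the BDF-supplied local error estimate, are assumptions of this sketch.

      def propose_stepsize(h, err, tol, p, safety=0.9, fac_min=0.2, fac_max=5.0):
          """Return (accepted, new_h) for a method of order p."""
          if err == 0.0:
              return True, h * fac_max
          # classical controller: rescale by (tol/err)^(1/(p+1)), with limiters
          factor = safety * (tol / err) ** (1.0 / (p + 1))
          factor = min(fac_max, max(fac_min, factor))
          return err <= tol, h * factor

      accepted, h_next = propose_stepsize(h=1e-2, err=3e-6, tol=1e-6, p=2)
      print(accepted, h_next)  # step rejected, retry with a smaller stepsize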

  2. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.

  3. Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop

    NASA Astrophysics Data System (ADS)

    Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.

    2018-04-01

    The data center is a new concept of data processing and application proposed in recent years. It is a new processing method based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster computing nodes and improves the efficiency of parallel data applications. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it invokes many computing nodes to process image storage blocks and pyramids in the background, improving the efficiency of image reading and application and meeting the need for concurrent, multi-user, high-speed access to remotely sensed data. The rationality, reliability, and superiority of the system design were verified by testing the storage efficiency for different image data and numbers of users, and by analyzing how the distributed storage architecture improves the application efficiency of remote sensing images in an operational Hadoop service system.

  4. Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.

    With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similarity score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.
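
    A longest-common-subsequence similarity between a trajectory segment and a candidate path can be sketched as follows, with both inputs given as sequences of road-link IDs. The normalization is an illustrative guess; the paper's exact scoring system may differ.

      def lcs_length(a, b):
          # standard dynamic-programming longest common subsequence
          dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          for i, x in enumerate(a, 1):
              for j, y in enumerate(b, 1):
                  dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
          return dp[-1][-1]

      def similarity(segment_links, path_links):
          # fraction of the segment's links recovered, in order, on the path
          return lcs_length(segment_links, path_links) / max(len(segment_links), 1)

      print(similarity(["L1", "L2", "L3", "L5"], ["L1", "L2", "L4", "L5"]))  # 0.75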

  5. Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data

    DOE PAGES

    Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.

    2017-01-01

    With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similarity score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.

  6. Assessment of computational issues associated with analysis of high-lift systems

    NASA Technical Reports Server (NTRS)

    Balasubramanian, R.; Jones, Kenneth M.; Waggoner, Edgar G.

    1992-01-01

    Thin-layer Navier-Stokes calculations for wing-fuselage configurations from subsonic to hypersonic flow regimes are now possible. However, efficient, accurate solutions using these codes for two- and three-dimensional high-lift systems have yet to be realized. A brief overview of salient experimental and computational research is presented. An assessment of the state of the art in high-lift system analysis is provided, and issues related to grid generation and flow physics that are crucial for computational success in this area are identified. Research in support of the high-lift elements of NASA's High Speed Research and Advanced Subsonic Transport Programs which addresses some of the computational issues is presented. Finally, fruitful areas of concentrated research are identified to accelerate overall progress in high-lift system analysis and design.

  7. Mathematical modelling of Bit-Level Architecture using Reciprocal Quantum Logic

    NASA Astrophysics Data System (ADS)

    Narendran, S.; Selvakumar, J.

    2018-04-01

    High-performance computing demands both speed and energy efficiency. Reciprocal Quantum Logic (RQL) is one technology that promises high speed and zero static power dissipation. RQL uses an AC power supply as input rather than a DC input, and it provides three sets of basic gates. Series of reciprocal transmission lines are placed between gates to avoid power loss and to achieve high speed. An analytical model of a bit-level architecture is developed using RQL. The major drawback of Reciprocal Quantum Logic is area: achieving a proper power supply requires splitters, which occupy a large area. Distributed arithmetic performs vector-vector multiplication in which one vector is constant and the other is a signed variable; each word acts as a binary number, and the partial products are rearranged and combined to form the distributed system. Distributed arithmetic is widely used in convolution and in high-performance computational devices.

  8. Scientific Visualization in High Speed Network Environments

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kutler, Paul (Technical Monitor)

    1997-01-01

    In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment has become more critical and continues to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks in providing an important link to support efficient supercomputing is identified. The two technologies are driven by the increasing complexity and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a computational fluid dynamics simulation/visualization environment is described. Current capabilities supported by high-speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation (NAS) Facility at NASA Ames Research Center are described. Applied research in providing a supercomputer visualization environment to support future computational requirements is summarized.

  9. A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Hodgson, M.; Li, W.

    2016-12-01

    Light detection and ranging (LiDAR) technologies have proven efficient for quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for scientific discovery in the Earth and ecological sciences and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to both data intensity and computational intensity. Previous studies achieved notable success in parallel processing of LiDAR data to address these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools, and 2) provides almost linear scalability with either increased workload (data volume) or increased computing nodes under both spatial decomposition strategies. We believe that the proposed framework provides a valuable reference for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
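
    The tile-based indexing idea in point 1) can be sketched simply: each point is keyed by the tile that contains it, so tiles become independent storage and processing units. The key scheme and tile size below are illustrative assumptions, not the framework's actual implementation.

      from collections import defaultdict

      TILE_SIZE = 500.0   # tile edge length in the data's coordinate units

      def tile_key(x, y):
          # integer tile coordinates; one key per independently stored tile
          return (int(x // TILE_SIZE), int(y // TILE_SIZE))

      def build_tile_index(points):
          index = defaultdict(list)
          for x, y, z in points:
              index[tile_key(x, y)].append((x, y, z))
          return index

      points = [(120.5, 88.2, 14.1), (980.0, 44.7, 9.8), (130.1, 90.9, 15.3)]
      index = build_tile_index(points)
      print({k: len(v) for k, v in index.items()})  # {(0, 0): 2, (1, 0): 1}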

  10. U.S. EPA computational toxicology programs: Central role of chemical-annotation efforts and molecular databases

    EPA Science Inventory

    EPA’s National Center for Computational Toxicology is engaged in high-profile research efforts to improve the ability to more efficiently and effectively prioritize and screen thousands of environmental chemicals for potential toxicity. A central component of these efforts invol...

  11. Multigrid treatment of implicit continuum diffusion

    NASA Astrophysics Data System (ADS)

    Francisquez, Manaure; Zhu, Ben; Rogers, Barrett

    2017-10-01

    Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
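
    To make the multigrid idea concrete, below is a toy two-grid V-cycle for the 1D Poisson problem in Python. It is only a schematic stand-in for the solver described above, which targets higher-order diffusion operators on 2D decomposed domains.

      import numpy as np

      def residual(u, f, h):
          r = np.zeros_like(u)
          r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / h**2
          return r

      def jacobi(u, f, h, sweeps=3, omega=2.0/3.0):
          for _ in range(sweeps):  # weighted-Jacobi smoothing
              u[1:-1] = (1 - omega)*u[1:-1] + omega*0.5*(u[:-2] + u[2:] + h*h*f[1:-1])
          return u

      def coarse_solve(rc, hc):
          n = len(rc) - 2  # solve the coarse tridiagonal system exactly
          a = (2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / hc**2
          ec = np.zeros_like(rc)
          ec[1:-1] = np.linalg.solve(a, rc[1:-1])
          return ec

      def two_grid(u, f, h):
          u = jacobi(u, f, h)                     # pre-smooth
          r = residual(u, f, h)
          rc = r[::2].copy()                      # restrict by full weighting
          rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
          ec = coarse_solve(rc, 2*h)
          e = np.zeros_like(u)
          e[::2] = ec                             # prolong: inject, then interpolate
          e[1:-1:2] = 0.5*(e[:-2:2] + e[2::2])
          return jacobi(u + e, f, h)              # post-smooth

      n = 65
      h = 1.0 / (n - 1)
      x = np.linspace(0.0, 1.0, n)
      f = np.pi**2 * np.sin(np.pi * x)            # -u'' = f with solution sin(pi x)
      u = np.zeros(n)
      for _ in range(10):
          u = two_grid(u, f, h)
      print(np.max(np.abs(u - np.sin(np.pi * x))))  # near the O(h^2) limit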

  12. An Isopycnal Box Model with predictive deep-ocean structure for biogeochemical cycling applications

    NASA Astrophysics Data System (ADS)

    Goodwin, Philip

    2012-07-01

    To simulate global ocean biogeochemical tracer budgets a model must accurately determine both the volume and surface origins of each water-mass. Water-mass volumes are dynamically linked to the ocean circulation in General Circulation Models, but at the cost of high computational load. In computationally efficient Box Models the water-mass volumes are simply prescribed and do not vary when the circulation transport rates or water mass densities are perturbed. A new computationally efficient Isopycnal Box Model is presented in which the sub-surface box volumes are internally calculated from the prescribed circulation using a diffusive conceptual model of the thermocline, in which upwelling of cold dense water is balanced by a downward diffusion of heat. The volumes of the sub-surface boxes are set so that the density stratification satisfies an assumed link between diapycnal diffusivity, κd, and buoyancy frequency, N: κd = c/N^α, where c and α are user-prescribed parameters. In contrast to conventional Box Models, the volumes of the sub-surface ocean boxes in the Isopycnal Box Model are dynamically linked to circulation, and automatically respond to circulation perturbations. This dynamical link allows an important facet of ocean biogeochemical cycling to be simulated in a highly computationally efficient model framework.
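
    A hypothetical numerical sketch of this closure follows: a sub-surface box of thickness h between layers of fixed density contrast must satisfy κd = c/N^α, and closing the system with the classical advective-diffusive thermocline scaling κd ≈ w·h (an assumption of this sketch, with w the prescribed upwelling) determines h. All parameter values are illustrative, not taken from the paper.

      from scipy.optimize import brentq

      g, rho0 = 9.81, 1025.0     # gravity (m/s^2), reference density (kg/m^3)
      drho = 0.5                 # density jump across the box (kg/m^3)
      w = 1.0e-7                 # prescribed upwelling velocity (m/s)
      c, alpha = 1.5e-7, 1.0     # closure parameters in kappa_d = c / N**alpha

      def imbalance(h):
          n_freq = (g * drho / (rho0 * h)) ** 0.5   # buoyancy frequency N
          return w * h - c / n_freq ** alpha        # advective kappa minus closure kappa

      h = brentq(imbalance, 10.0, 1.0e5)
      print(f"box thickness consistent with the closure: {h:.0f} m")  # ~470 m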

  13. Progress on a Taylor weak statement finite element algorithm for high-speed aerodynamic flows

    NASA Technical Reports Server (NTRS)

    Baker, A. J.; Freels, J. D.

    1989-01-01

    A new finite element numerical Computational Fluid Dynamics (CFD) algorithm has matured to the point of efficiently solving two-dimensional high speed real-gas compressible flow problems in generalized coordinates on modern vector computer systems. The algorithm employs a Taylor Weak Statement classical Galerkin formulation, a variably implicit Newton iteration, and a tensor matrix product factorization of the linear algebra Jacobian under a generalized coordinate transformation. Allowing for a general two-dimensional conservation law system, the algorithm has been exercised on the Euler and laminar forms of the Navier-Stokes equations. Real-gas fluid properties are admitted, and numerical results verify solution accuracy, efficiency, and stability over a range of test problem parameters.

  14. Computer aided drug discovery of highly ligand efficient, low molecular weight imidazopyridine analogs as FLT3 inhibitors.

    PubMed

    Frett, Brendan; McConnell, Nick; Smith, Catherine C; Wang, Yuanxiang; Shah, Neil P; Li, Hong-yu

    2015-04-13

    The FLT3 kinase represents an attractive target to effectively treat AML. Unfortunately, no FLT3-targeted therapeutic is currently approved. In line with our continued interest in treating kinase-related disease, and in pursuit of anti-FLT3 mutant activity, we utilized pioneering synthetic methodology in combination with computer-aided drug discovery and identified low molecular weight, highly ligand efficient FLT3 kinase inhibitors. Compounds were analyzed for biochemical inhibition, for their ability to selectively inhibit cell proliferation, for FLT3 mutant activity, and for preliminary aqueous solubility. Validated hits were discovered that can serve as starting platforms for lead candidates. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  15. A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture

    NASA Astrophysics Data System (ADS)

    Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.

    2017-12-01

    A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.

  16. Complexity control algorithm based on adaptive mode selection for interframe coding in high efficiency video coding

    NASA Astrophysics Data System (ADS)

    Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong

    2017-07-01

    The latest high efficiency video coding (HEVC) standard significantly increases encoding complexity in exchange for improved coding efficiency. Due to the limited computational capability of handheld devices, complexity-constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Given the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, the complexity target is mapped to a set of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, an optimal mode combination scheme, chosen through offline statistics, is applied at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the low-delay P condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BD-PSNR is observed for 18 sequences when the target complexity is around 40%.

  17. A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing

    PubMed Central

    Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang

    2017-01-01

    With the rapid development of big data and the Internet of Things (IoT), the number of networked devices and the data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network, can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices, and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method. Finally, analysis and simulation results show that our scheme is both secure and highly efficient. PMID:28737733

  18. Dynamic displacement measurement of large-scale structures based on the Lucas-Kanade template tracking algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Jie; Zhu, Chang'an

    2016-01-01

    The development of optics and computer technologies enables the application of vision-based techniques, using digital cameras, to the displacement measurement of large-scale structures. Compared with traditional contact measurements, the vision-based technique allows for remote measurement, is non-intrusive, and does not introduce additional mass onto the structure. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images on the computer, the Lucas-Kanade template tracking algorithm from the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and further improve its efficiency. The modified algorithm can accomplish one displacement extraction within 1 ms without requiring any pre-designed target panel to be installed on the structure in advance. The accuracy and efficiency of the system in the remote measurement of dynamic displacement are demonstrated in experiments on a motion platform and on a sound barrier on a suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signals and accomplish vibration measurement of large-scale structures.
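
    The efficiency gain of the inverse compositional formulation is that the template gradients and the Hessian are precomputed once, rather than recomputed every iteration. Below is a simplified, translation-only Python sketch; the paper's modified algorithm and its warp model are more elaborate.

      import numpy as np
      from scipy.ndimage import gaussian_filter, map_coordinates

      def track(template, image, p, iters=30):
          gy, gx = np.gradient(template)               # template gradients, once
          sd = np.stack([gx.ravel(), gy.ravel()], 1)   # steepest-descent images
          h_inv = np.linalg.inv(sd.T @ sd)             # inverse Hessian, once
          ys, xs = np.mgrid[:template.shape[0], :template.shape[1]]
          for _ in range(iters):
              warped = map_coordinates(image, [ys + p[1], xs + p[0]], order=1)
              dp = h_inv @ (sd.T @ (warped - template).ravel())
              p = p - dp                               # inverse compositional update
              if np.linalg.norm(dp) < 1e-4:
                  break
          return p

      rng = np.random.default_rng(0)
      image = gaussian_filter(rng.random((120, 120)), 2.0)  # smooth synthetic scene
      template = image[40:70, 50:80].copy()                 # true offset (x=50, y=40)
      print(track(template, image, p=np.array([49.3, 40.6])))  # converges near (50, 40)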

  19. Optimal design study of high efficiency indium phosphide space solar cells

    NASA Technical Reports Server (NTRS)

    Jain, Raj K.; Flood, Dennis J.

    1990-01-01

    Recently, indium phosphide solar cells have achieved beginning-of-life AM0 efficiencies in excess of 19 pct. at 25 C. The high efficiency prospects, along with superb radiation tolerance, make indium phosphide a leading material for space power requirements. To achieve cost effectiveness, practical cell efficiencies have to be raised to near theoretical limits and thin film indium phosphide cells need to be developed. The optimal design study of high-efficiency indium phosphide solar cells for space power applications, using the PC-1D computer program, is described. It is shown that cells with efficiencies over 22 pct. AM0 at 25 C could be fabricated by achieving proper material and process parameters. It is observed that further improvements in cell material and process parameters could lead to experimental cell efficiencies near theoretical limits. The effect of various emitter and base parameters on cell performance was studied.

  20. Precise and fast spatial-frequency analysis using the iterative local Fourier transform.

    PubMed

    Lee, Sukmock; Choi, Heejoo; Kim, Dae Wook

    2016-09-19

    The use of the discrete Fourier transform has decreased since the introduction of the fast Fourier transform (fFT), which is a numerically efficient computing process. This paper presents the iterative local Fourier transform (ilFT), a set of new processing algorithms that iteratively apply the discrete Fourier transform within a local and optimal frequency domain. The new technique achieves 2^10 times higher frequency resolution than the fFT within a comparable computation time. The method's superb computing efficiency, high resolution, spectrum zoom-in capability, and overall performance are evaluated and compared to other advanced high-resolution Fourier transform techniques, such as the fFT combined with several fitting methods. The effectiveness of the ilFT is demonstrated through the data analysis of a set of Talbot self-images (1280 × 1024 pixels) obtained with an experimental setup using a grating in a diverging beam produced by a coherent point source.
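
    The zoom-in mechanism can be illustrated with a simple iterative local DFT: evaluate explicit Fourier sums on a narrow frequency grid around the current peak, then shrink the grid and repeat. This Python sketch only mimics the general idea; the published ilFT algorithm differs in its details.

      import numpy as np

      def local_dft(x, t, freqs):
          # explicit Fourier sums, one per requested frequency
          return np.array([np.sum(x * np.exp(-2j * np.pi * f * t)) for f in freqs])

      def refine_peak(x, t, f_lo, f_hi, grid=32, iters=20):
          for _ in range(iters):
              freqs = np.linspace(f_lo, f_hi, grid)
              k = np.argmax(np.abs(local_dft(x, t, freqs)))
              span = (f_hi - f_lo) / grid
              f_lo, f_hi = freqs[k] - span, freqs[k] + span  # zoom in on the peak
          return 0.5 * (f_lo + f_hi)

      fs, n, f_true = 1000.0, 1024, 123.4567
      t = np.arange(n) / fs
      x = np.sin(2 * np.pi * f_true * t)
      print(refine_peak(x, t, 0.0, fs / 2))  # ~123.4567, far finer than one FFT bin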

  1. Remote sensing image ship target detection method based on visual attention model

    NASA Astrophysics Data System (ADS)

    Sun, Yuejiao; Lei, Wuhu; Ren, Xiaodong

    2017-11-01

    Traditional methods of detecting ship targets in remote sensing images mostly use a sliding window to search the whole image exhaustively. However, the target usually occupies only a small fraction of the image, so this approach has high computational complexity for large-format visible image data. The bottom-up selective attention mechanism can selectively allocate computing resources according to visual stimuli, thus improving computational efficiency and reducing the difficulty of analysis. With this in mind, a method for ship target detection in remote sensing images based on a visual attention model is proposed in this paper. The experimental results show that the proposed method reduces computational complexity while improving detection accuracy, and improves the detection efficiency of ship targets in remote sensing images.

  2. Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

    NASA Astrophysics Data System (ADS)

    Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph

    2017-10-01

    In high dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation, and performance. It is customary to consider an ℓ1 penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature, for instance the Scaled Lasso, the Square-root Lasso, and Concomitant Lasso estimation, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm combined with safe screening rules, achieving speed by eliminating irrelevant features early.
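
    A bare-bones Python sketch of such a joint solver follows: cyclic coordinate descent with soft-thresholding, alternated with the clipped noise update sigma = max(||residual||/sqrt(n), sigma_0). Safe screening rules and the paper's other refinements are deliberately omitted, so this is only a schematic variant.

      import numpy as np

      def soft_threshold(z, t):
          return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

      def smoothed_concomitant_lasso(x, y, lam, sigma_0=1e-2, iters=100):
          n, d = x.shape
          beta = np.zeros(d)
          col_sq = (x ** 2).sum(axis=0)
          resid = y.copy()
          for _ in range(iters):
              # clipped noise-level update
              sigma = max(np.linalg.norm(resid) / np.sqrt(n), sigma_0)
              for j in range(d):  # one cycle of coordinate descent
                  rho = resid + x[:, j] * beta[j]          # partial residual
                  bj = soft_threshold(x[:, j] @ rho, n * sigma * lam) / col_sq[j]
                  resid = rho - x[:, j] * bj
                  beta[j] = bj
          return beta, sigma

      rng = np.random.default_rng(0)
      x = rng.standard_normal((200, 50))
      beta_true = np.zeros(50)
      beta_true[:3] = [2.0, -1.5, 1.0]
      y = x @ beta_true + 0.5 * rng.standard_normal(200)
      beta, sigma = smoothed_concomitant_lasso(x, y, lam=0.1)
      print(np.round(beta[:5], 2), round(sigma, 3))  # sparse estimate, sigma near 0.5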

  3. A nonlinear relaxation/quasi-Newton algorithm for the compressible Navier-Stokes equations

    NASA Technical Reports Server (NTRS)

    Edwards, Jack R.; Mcrae, D. S.

    1992-01-01

    A highly efficient implicit method for the computation of steady, two-dimensional compressible Navier-Stokes flowfields is presented. The discretization of the governing equations is hybrid in nature, with flux-vector splitting utilized in the streamwise direction and central differences with flux-limited artificial dissipation used for the transverse fluxes. Line Jacobi relaxation is used to provide a suitable initial guess for a new nonlinear iteration strategy based on line Gauss-Seidel sweeps. The applicability of quasi-Newton methods as convergence accelerators for this and other line relaxation algorithms is discussed, and efficient implementations of such techniques are presented. Convergence histories and comparisons with experimental data are presented for supersonic flow over a flat plate and for several high-speed compression corner interactions. Results indicate a marked improvement in computational efficiency over more conventional upwind relaxation strategies, particularly for flowfields containing large pockets of streamwise subsonic flow.

  4. Accelerated simulation of stochastic particle removal processes in particle-resolved aerosol models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curtis, J.H.; Michelotti, M.D.; Riemer, N.

    2016-10-01

    Stochastic particle-resolved methods have proven useful for simulating multi-dimensional systems such as composition-resolved aerosol size distributions. While particle-resolved methods have substantial benefits for highly detailed simulations, these techniques suffer from high computational cost, motivating efforts to improve their algorithmic efficiency. Here we formulate an algorithm for accelerating particle removal processes by aggregating particles of similar size into bins. We present the Binned Algorithm for particle removal processes and analyze its performance with application to the atmospherically relevant process of aerosol dry deposition. We show that the Binned Algorithm can dramatically improve the efficiency of particle removals, particularly for low removal rates, and that computational cost is reduced without introducing additional error. In simulations of aerosol particle removal by dry deposition in atmospherically relevant conditions, we demonstrate about 50-times increase in algorithm efficiency.
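
    The binned idea can be sketched as follows: instead of testing every particle individually, particles are grouped into size bins and a single binomial draw decides how many leave each bin per time step. The bin scheme and rate law below are toy assumptions, not the paper's formulation.

      import numpy as np

      rng = np.random.default_rng(0)

      def binned_removal_step(bins, rate_of, dt):
          """bins maps a size-bin index to a list of particle diameters."""
          for b, particles in bins.items():
              p_remove = 1.0 - np.exp(-rate_of(b) * dt)           # per-particle probability
              n_removed = rng.binomial(len(particles), p_remove)  # one draw per bin
              # particles within a bin are treated as exchangeable
              keep = rng.permutation(len(particles))[n_removed:]
              bins[b] = [particles[i] for i in keep]
          return bins

      diam = rng.lognormal(mean=-1.0, sigma=0.5, size=10000)   # diameters (um)
      bins = {}
      for d in diam:
          bins.setdefault(int(np.floor(np.log10(d) * 10)), []).append(d)
      rate = lambda b: 1e-4 * 10 ** (b / 20)                   # toy deposition rate (1/s)
      bins = binned_removal_step(bins, rate, dt=60.0)
      print(sum(len(v) for v in bins.values()))                # particles remaining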

  5. Distributed collaborative response surface method for mechanical dynamic assembly reliability design

    NASA Astrophysics Data System (ADS)

    Bai, Guangchen; Fei, Chengwei

    2013-11-01

    Because of the randomness of the many factors influencing the dynamic assembly relationships of complex machinery, the reliability analysis of dynamic assembly relationships needs to be carried out from a probabilistic perspective. To improve the accuracy and efficiency of dynamic assembly relationship reliability analysis, a mechanical dynamic assembly reliability (MDAR) theory and a distributed collaborative response surface method (DCRSM) are proposed. The mathematical model of the DCRSM is established based on the quadratic response surface function and verified by the assembly relationship reliability analysis of an aeroengine high pressure turbine (HPT) blade-tip radial running clearance (BTRRC). Through comparison of the DCRSM, the traditional response surface method (RSM), and the Monte Carlo method (MCM), the results show that the DCRSM is not only able to accomplish the computational task, which is impossible for the other methods when the number of simulations exceeds 100,000, but also attains computational precision basically consistent with the MCM while improving on the RSM by 0.40-4.63%; furthermore, the computational efficiency of the DCRSM is about 188 times that of the MCM and 55 times that of the RSM at 10,000 simulations. The DCRSM is demonstrated to be a feasible and effective approach for markedly improving the computational efficiency and accuracy of MDAR analysis. Thus, the proposed research provides a promising theory and method for MDAR design and optimization, and opens a novel research direction of probabilistic analysis for developing high-performance, high-reliability aeroengines.
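
    The response surface idea underlying such methods is easy to sketch in Python: fit a quadratic surrogate to a handful of expensive model evaluations, then run the cheap Monte Carlo reliability estimate on the surrogate. The stand-in model and probability threshold below are purely illustrative.

      import numpy as np

      def expensive_model(x):  # stand-in for a finite element clearance analysis
          return 1.2 + 0.8 * x[0] - 0.5 * x[1] + 0.3 * x[0] * x[1] + 0.1 * x[1] ** 2

      def quadratic_features(x):
          x1, x2 = x[..., 0], x[..., 1]
          return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2], axis=-1)

      rng = np.random.default_rng(0)
      x_train = rng.standard_normal((30, 2))               # 30 "expensive" runs
      y_train = np.array([expensive_model(x) for x in x_train])
      coef, *_ = np.linalg.lstsq(quadratic_features(x_train), y_train, rcond=None)

      x_mc = rng.standard_normal((100_000, 2))             # cheap MC on the surrogate
      y_mc = quadratic_features(x_mc) @ coef
      print("P(response > 3):", np.mean(y_mc > 3.0))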

  6. Improving Design Efficiency for Large-Scale Heterogeneous Circuits

    NASA Astrophysics Data System (ADS)

    Gregerson, Anthony

    Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particle accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms be efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of developing the large-scale, heterogeneous circuits needed to enable large-scale applications in high-energy physics and other important areas.

  7. Connecting Performance Analysis and Visualization to Advance Extreme Scale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bremer, Peer-Timo; Mohr, Bernd; Schulz, Martin

    2015-07-29

    The characterization, modeling, analysis, and tuning of software performance has been a central topic in High Performance Computing (HPC) since its early beginnings. The overall goal is to make HPC software run faster on particular hardware, either through better scheduling, on-node resource utilization, or more efficient distributed communication.

  8. An Efficient Local Correlation Matrix Decomposition Approach for the Localization Implementation of Ensemble-Based Assimilation Methods

    NASA Astrophysics Data System (ADS)

    Zhang, Hongqin; Tian, Xiangjun

    2018-04-01

    Ensemble-based data assimilation methods often use the so-called localization scheme to improve the representation of the ensemble background error covariance (Be). Extensive research has been undertaken to reduce the computational cost of these methods by using localized ensemble samples to localize Be by means of a direct decomposition of the local correlation matrix C. However, the computational cost of directly decomposing the local correlation matrix C is still extremely high due to its high dimension. In this paper, we propose an efficient local correlation matrix decomposition approach based on the concept of alternating directions. This approach is intended to avoid direct decomposition of the correlation matrix. Instead, we first decompose the correlation matrix into 1-D correlation matrices in the three coordinate directions, then construct their empirical orthogonal function decompositions at low resolution. This procedure is followed by a 1-D spline interpolation process to transform the above decompositions to the high-resolution grid. Finally, an efficient correlation matrix decomposition is achieved by combining the directional decompositions through a Kronecker product. We conducted a series of comparison experiments to illustrate the validity and accuracy of the proposed local correlation matrix decomposition approach. The effectiveness of the proposed correlation matrix decomposition approach and its efficient localization implementation of the nonlinear least-squares four-dimensional variational assimilation are further demonstrated by several groups of numerical experiments based on the Advanced Research Weather Research and Forecasting model.
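
    The payoff of the separable form can be demonstrated directly in Python: applying C = Cz ⊗ Cy ⊗ Cx to a 3-D field touches only the three small 1-D matrices, never the assembled correlation matrix. The Gaussian correlation shapes below are illustrative, and the method's EOF truncation and spline interpolation steps are omitted.

      import numpy as np

      def corr_1d(n, length):
          idx = np.arange(n)
          return np.exp(-(((idx[:, None] - idx[None, :]) / length) ** 2))

      nz, ny, nx = 8, 10, 12
      cz, cy, cx = corr_1d(nz, 3.0), corr_1d(ny, 4.0), corr_1d(nx, 5.0)
      field = np.random.default_rng(0).standard_normal((nz, ny, nx))

      out = np.einsum('az,zyx->ayx', cz, field)      # apply the z-direction factor
      out = np.einsum('by,ayx->abx', cy, out)        # then the y-direction factor
      out = np.einsum('cx,abx->abc', cx, out)        # then the x-direction factor

      full = np.kron(np.kron(cz, cy), cx)            # the explicit 960 x 960 matrix
      print(np.allclose(out.ravel(), full @ field.ravel()))  # True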

  9. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens.

    PubMed

    Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin

    2017-06-01

    We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Combining on-chip synthesis of a focused combinatorial library with computational target prediction reveals imidazopyridine GPCR ligands.

    PubMed

    Reutlinger, Michael; Rodrigues, Tiago; Schneider, Petra; Schneider, Gisbert

    2014-01-07

    Using the example of the Ugi three-component reaction we report a fast and efficient microfluidic-assisted entry into the imidazopyridine scaffold, where building block prioritization was coupled to a new computational method for predicting ligand-target associations. We identified an innovative GPCR-modulating combinatorial chemotype featuring ligand-efficient adenosine A1/2B and adrenergic α1A/B receptor antagonists. Our results suggest the tight integration of microfluidics-assisted synthesis with computer-based target prediction as a viable approach to rapidly generate bioactivity-focused combinatorial compound libraries with high success rates. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frank Mueller

    2009-02-05

    MOLAR is a multi-institution research effort that concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS --- forum to address scalable technology for runtime and operating systems --- and HECRTF --- high-end computing revitalization task force --- activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high-availability without single points of failure and without single points of control.

  12. The use of methods of structural optimization at the stage of designing high-rise buildings with steel construction

    NASA Astrophysics Data System (ADS)

    Vasilkin, Andrey

    2018-03-01

    The more design solutions an engineer can synthesize at the search stage of high-rise building design, the more likely it is that the final adopted version will be the most efficient and economical one. However, in modern market conditions, and given the complexity and responsibility of high-rise buildings, the designer does not have the time needed to develop, analyze, and compare any significant number of options. To solve this problem, it is expedient to exploit the high potential of computer-aided design. To implement an automated search for design solutions, it is proposed to develop computing facilities whose application will significantly increase the designer's productivity and reduce the complexity of designing. Methods of structural and parametric optimization were adopted as the basis of these computing facilities. Their efficiency in the synthesis of design solutions is shown, and schemes that illustrate and explain the introduction of structural optimization into the traditional design of steel frames are constructed. To solve the problem of synthesizing and comparing design solutions for steel frames, it is proposed to develop computing facilities that significantly reduce the complexity of search designing, based on the use of methods of structural and parametric optimization.

  13. SCOTCH: Secure Counting Of encrypTed genomiC data using a Hybrid approach.

    PubMed

    Chenghong, Wang; Jiang, Yichen; Mohammed, Noman; Chen, Feng; Jiang, Xiaoqian; Al Aziz, Md Momin; Sadat, Md Nazmus; Wang, Shuang

    2017-01-01

    As genomic data are usually large-scale and highly sensitive, it is essential to enable both efficient and secure analysis, whereby the data owner can securely delegate both computation and storage to an untrusted public cloud. Counting queries over genotypes are a basic function for many downstream applications in biomedical research (e.g., computing allele frequencies, calculating chi-squared statistics, etc.). Previous solutions show promise for secure counting of outsourced data, but efficiency is still a major limitation for real-world applications. In this paper, we propose a novel hybrid solution that combines a rigorous theoretical model (homomorphic encryption) with the latest hardware-based infrastructure (i.e., Software Guard Extensions) to speed up the computation while preserving the privacy of both data owners and data users. Our results demonstrate efficiency using real data from the Personal Genome Project.
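
    The homomorphic half of such a hybrid can be illustrated with additive (Paillier) encryption, here via the python-paillier ("phe") library: encrypted genotype indicators are summed by an untrusted party, and only the final count is decrypted by the key holder. This toy omits the SGX side of the design and is not the SCOTCH implementation.

      from phe import paillier

      public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

      genotypes = [0, 1, 1, 2, 1, 0, 2]    # per-individual genotype codes
      query = 1                            # count individuals with genotype 1

      # data-owner side: encrypt 0/1 indicators under the public key
      enc = [public_key.encrypt(int(g == query)) for g in genotypes]

      # server side: add ciphertexts only, never seeing plaintext values
      enc_count = sum(enc[1:], enc[0])

      print(private_key.decrypt(enc_count))  # 3, visible only to the key holder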

  14. A Power Efficient Exaflop Computer Design for Global Cloud System Resolving Climate Models.

    NASA Astrophysics Data System (ADS)

    Wehner, M. F.; Oliker, L.; Shalf, J.

    2008-12-01

    Exascale computers would allow routine ensemble modeling of the global climate system at the cloud system resolving scale. Power and cost requirements of traditional architecture systems are likely to delay such capability for many years. We present an alternative route to the exascale using embedded processor technology to design a system optimized for ultra high resolution climate modeling. These power efficient processors, used in consumer electronic devices such as mobile phones, portable music players, cameras, etc., can be tailored to the specific needs of scientific computing. We project that a system capable of integrating a kilometer scale climate model a thousand times faster than real time could be designed and built in a five year time scale for US$75M with a power consumption of 3MW. This is cheaper, more power efficient and sooner than any other existing technology.

  15. SCOTCH: Secure Counting Of encrypTed genomiC data using a Hybrid approach

    PubMed Central

    Chenghong, Wang; Jiang, Yichen; Mohammed, Noman; Chen, Feng; Jiang, Xiaoqian; Al Aziz, Md Momin; Sadat, Md Nazmus; Wang, Shuang

    2017-01-01

    As genomic data are usually large-scale and highly sensitive, it is essential to enable both efficient and secure analysis, whereby the data owner can securely delegate both computation and storage to an untrusted public cloud. Counting queries over genotypes are a basic function for many downstream applications in biomedical research (e.g., computing allele frequencies, calculating chi-squared statistics, etc.). Previous solutions show promise for secure counting of outsourced data, but efficiency is still a major limitation for real-world applications. In this paper, we propose a novel hybrid solution that combines a rigorous theoretical model (homomorphic encryption) with the latest hardware-based infrastructure (i.e., Software Guard Extensions) to speed up the computation while preserving the privacy of both data owners and data users. Our results demonstrate efficiency using real data from the Personal Genome Project. PMID:29854245

  16. The TeraShake Computational Platform for Large-Scale Earthquake Simulations

    NASA Astrophysics Data System (ADS)

    Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

    Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes at ever larger scales. To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component of the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases, including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has shown highly efficient strong scaling on over 40K processors on IBM's BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.

  17. RAMA: A file system for massively parallel computers

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.

  18. Use of nonlinear design optimization techniques in the comparison of battery discharger topologies for the space platform

    NASA Technical Reports Server (NTRS)

    Sable, Dan M.; Cho, Bo H.; Lee, Fred C.

    1990-01-01

    A detailed comparison of a boost converter, a voltage-fed, autotransformer converter, and a multimodule boost converter, designed specifically for the space platform battery discharger, is performed. Computer-based nonlinear optimization techniques are used to facilitate an objective comparison. The multimodule boost converter is shown to be the optimum topology at all efficiencies. The margin is greatest at 97 percent efficiency. The multimodule, multiphase boost converter combines the advantages of high efficiency, light weight, and ample margin on the component stresses, thus ensuring high reliability.

  19. A three-dimensional finite-element thermal/mechanical analytical technique for high-performance traveling wave tubes

    NASA Technical Reports Server (NTRS)

    Bartos, Karen F.; Fite, E. Brian; Shalkhauser, Kurt A.; Sharp, G. Richard

    1991-01-01

    Current research in high-efficiency, high-performance traveling wave tubes (TWT's) has led to the development of novel thermal/mechanical computer models for use with helical slow-wave structures. A three-dimensional, finite element computer model and analytical technique were used to study the structural integrity and thermal operation of a high-efficiency, diamond-rod, K-band TWT designed for use in advanced space communications systems. This analysis focused on the slow-wave circuit in the radiofrequency section of the TWT, where an inherent localized heating problem existed and where failures were observed during the cold compression, or 'coining', fabrication process, a technique that nonetheless shows great potential for future TWT development efforts. For this analysis, a three-dimensional, finite element model was used along with MARC, a commercially available finite element code, to simulate the fabrication of a diamond-rod TWT. This analysis was conducted by using component and material specifications consistent with actual TWT fabrication and was verified against empirical data. The analysis is nonlinear owing to material plasticity introduced by the forming process and also to geometric nonlinearities presented by the component assembly configuration. The computer model was developed by using the high-efficiency, K-band TWT design but is general enough to permit similar analyses to be performed on a wide variety of TWT designs and styles. The results of the TWT operating condition and structural failure mode analysis, as well as a comparison of analytical results to test data, are presented.

  20. A three-dimensional finite-element thermal/mechanical analytical technique for high-performance traveling wave tubes

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Kurt A.; Bartos, Karen F.; Fite, E. B.; Sharp, G. R.

    1992-01-01

    Current research in high-efficiency, high-performance traveling wave tubes (TWT's) has led to the development of novel thermal/mechanical computer models for use with helical slow-wave structures. A three-dimensional, finite element computer model and analytical technique were used to study the structural integrity and thermal operation of a high-efficiency, diamond-rod, K-band TWT designed for use in advanced space communications systems. This analysis focused on the slow-wave circuit in the radiofrequency section of the TWT, where an inherent localized heating problem existed and where failures were observed during the cold compression, or 'coining', fabrication process, a technique that nonetheless shows great potential for future TWT development efforts. For this analysis, a three-dimensional, finite element model was used along with MARC, a commercially available finite element code, to simulate the fabrication of a diamond-rod TWT. This analysis was conducted by using component and material specifications consistent with actual TWT fabrication and was verified against empirical data. The analysis is nonlinear owing to material plasticity introduced by the forming process and also to geometric nonlinearities presented by the component assembly configuration. The computer model was developed by using the high-efficiency, K-band TWT design but is general enough to permit similar analyses to be performed on a wide variety of TWT designs and styles. The results of the TWT operating condition and structural failure mode analysis, as well as a comparison of analytical results to test data, are presented.

  1. A computationally efficient ductile damage model accounting for nucleation and micro-inertia at high triaxialities

    DOE PAGES

    Versino, Daniele; Bronkhorst, Curt Allan

    2018-01-31

    The computational formulation of a micro-mechanical material model for the dynamic failure of ductile metals is presented in this paper. The statistical nature of porosity initiation is accounted for by introducing an arbitrary probability density function which describes the pore nucleation pressures. Each micropore within the representative volume element is modeled as a thick spherical shell made of plastically incompressible material. The treatment of porosity by a distribution of thick-walled spheres also allows for the inclusion of micro-inertia effects under conditions of shock and dynamic loading. The second order ordinary differential equation governing the microscopic porosity evolution is solved with a robust implicit procedure. A new Chebyshev collocation method is employed to approximate the porosity distribution and remapping is used to optimize memory usage. The adaptive approximation of the porosity distribution leads to a reduction of computational time and memory usage of up to two orders of magnitude. Moreover, the proposed model affords consistent performance: changing the nucleation pressure probability density function and/or the applied strain rate does not reduce accuracy or computational efficiency of the material model. The numerical performance of the model and algorithms presented is tested against three problems for high density tantalum: single void, one-dimensional uniaxial strain, and two-dimensional plate impact. The results using the integration and algorithmic advances suggest a significant improvement in computational efficiency and accuracy over previous treatments for dynamic loading conditions.
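
    The Chebyshev collocation idea above is easy to illustrate. The sketch below, in Python with NumPy, samples a pore-nucleation pressure density at Chebyshev nodes and carries it as a short coefficient vector instead of a fine tabulation; the Gaussian density, interval, and node count are placeholder assumptions, not the paper's choices.

```python
# Minimal sketch of Chebyshev collocation for a nucleation-pressure
# density: interpolate at Chebyshev nodes, keep only the coefficients.
import numpy as np
from numpy.polynomial import chebyshev as C

p_min, p_max, n = 1.0, 5.0, 16           # assumed pressure range and node count

def density(p):                           # placeholder PDF (not the paper's)
    return np.exp(-0.5 * ((p - 3.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

# Chebyshev nodes on [-1, 1], mapped to the physical pressure interval
x = np.cos(np.pi * (np.arange(n) + 0.5) / n)
p_nodes = 0.5 * (p_max + p_min) + 0.5 * (p_max - p_min) * x

# Degree n-1 interpolant through the sampled density
coeffs = C.chebfit(x, density(p_nodes), deg=n - 1)

# The compact representation can be evaluated anywhere in the interval
p = np.linspace(p_min, p_max, 200)
approx = C.chebval(2 * (p - p_min) / (p_max - p_min) - 1, coeffs)
print("max abs interpolation error:", np.max(np.abs(approx - density(p))))
```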

  2. A low-power and high-quality implementation of the discrete cosine transformation

    NASA Astrophysics Data System (ADS)

    Heyne, B.; Götze, J.

    2007-06-01

    In this paper a computationally efficient and high-quality preserving DCT architecture is presented. It is obtained by optimizing the Loeffler DCT based on the Cordic algorithm. The computational complexity is reduced from 11 multiply and 29 add operations (Loeffler DCT) to 38 add and 16 shift operations (which is similar to the complexity of the binDCT). The experimental results show that the proposed DCT algorithm not only reduces the computational complexity significantly, but also retains the good transformation quality of the Loeffler DCT. Therefore, the proposed Cordic based Loeffler DCT is especially suited for low-power and high-quality CODECs in battery-based systems.
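
    The reduced operation count comes from realizing the plane rotations in the Loeffler flow graph with CORDIC, i.e. with shifts and adds alone. The sketch below shows a generic CORDIC rotation in Python, in floating point for clarity; the fixed-point datapath and iteration count of an actual CODEC are not specified in the abstract.

```python
# Generic CORDIC rotation: only add/subtract and scaling by 2**-i
# (arithmetic shifts in hardware), plus one constant gain correction.
import math

def cordic_rotate(x, y, angle, iters=16):
    """Rotate (x, y) by `angle` radians with shift-and-add iterations."""
    k = math.prod(1 / math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(iters))
    z = angle
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0      # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x * k, y * k                  # constant gain correction

# Rotating (1, 0) by 30 degrees should give about (0.866, 0.5)
print(cordic_rotate(1.0, 0.0, math.radians(30)))
```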

  3. High Performance Processors for Space Environments: A Subproject of the NASA Exploration Missions Systems Directorate "Radiation Hardened Electronics for Space Environments" Technology Development Program

    NASA Technical Reports Server (NTRS)

    Johnson, M.; Label, K.; McCabe, J.; Powell, W.; Bolotin, G.; Kolawa, E.; Ng, T.; Hyde, D.

    2007-01-01

    Implementation of challenging Exploration Systems Missions Directorate objectives and strategies can be constrained by onboard computing capabilities and power efficiencies. The Radiation Hardened Electronics for Space Environments (RHESE) High Performance Processors for Space Environments project will address this challenge by significantly advancing the sustained throughput and processing efficiency of high-performance radiation-hardened processors, targeting delivery of products by the end of FY12.

  4. Low-cost, high-performance and efficiency computational photometer design

    NASA Astrophysics Data System (ADS)

    Siewert, Sam B.; Shihadeh, Jeries; Myers, Randall; Khandhar, Jay; Ivanov, Vitaly

    2014-05-01

    Researchers at the University of Alaska Anchorage and University of Colorado Boulder have built a low-cost, high-performance and high-efficiency drop-in-place Computational Photometer (CP) to test in field applications ranging from port security and safety monitoring to environmental compliance monitoring and surveying. The CP integrates off-the-shelf visible spectrum cameras with near to long wavelength infrared detectors and high resolution digital snapshots in a single device. The proof of concept combines three or more detectors into a single multichannel imaging system that can time correlate read-out, capture, and image process all of the channels concurrently with high performance and energy efficiency. The dual-channel continuous read-out is combined with a third high definition digital snapshot capability and has been designed using an FPGA (Field Programmable Gate Array) to capture, decimate, down-convert, re-encode, and transform images from two standard definition CCD (Charge Coupled Device) cameras at 30 Hz. The continuous stereo vision can be time correlated to megapixel high definition snapshots. This proof of concept has been fabricated as a four-layer PCB (Printed Circuit Board) suitable for use in education and research for low-cost, high-efficiency field monitoring applications that need multispectral and three dimensional imaging capabilities. Initial testing is in progress and includes field testing in ports, potential test flights in unmanned aerial systems, and future planned missions to image harsh environments in the arctic including volcanic plumes, ice formation, and arctic marine life.

  5. Development of a Web Based Simulating System for Earthquake Modeling on the Grid

    NASA Astrophysics Data System (ADS)

    Seber, D.; Youn, C.; Kaiser, T.

    2007-12-01

    Existing cyberinfrastructure-based information, data, and computational networks now allow development of state-of-the-art, user-friendly simulation environments that democratize access to high-end computational environments and provide new research opportunities for many research and educational communities. Within the Geosciences cyberinfrastructure network, GEON, we have developed the SYNSEIS (SYNthetic SEISmogram) toolkit to enable efficient computations of 2D and 3D seismic waveforms for a variety of research purposes, especially for helping to analyze EarthScope's USArray seismic data in a speedy and efficient environment. The underlying simulation software in SYNSEIS is a finite difference code, E3D, developed by LLNL (S. Larsen). The code is embedded within the SYNSEIS portlet environment and it is used by our toolkit to simulate seismic waveforms of earthquakes at regional distances (<1000 km). Architecturally, SYNSEIS uses both Web Service and Grid computing resources in a portal-based work environment and has a built-in access mechanism to connect to national supercomputer centers as well as to a dedicated, small-scale compute cluster for its runs. Even though Grid computing is well-established in many computing communities, its use among domain scientists is still not trivial because of the multiple levels of complexity encountered. We grid-enabled E3D using our own XML input dialect, with inputs that include geological models accessible through standard Web services within the GEON network. The XML inputs for this application contain structural geometries, source parameters, seismic velocity, density, attenuation values, number of time steps to compute, and number of stations. By enabling portal-based access to such a computational environment, coupled with its dynamic user interface, we enable a large user community to take advantage of such high-end calculations in their research and educational activities. Our system can be used to promote an efficient and effective modeling environment that helps scientists as well as educators in their daily activities and speeds up the scientific discovery process.
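
    As an illustration of the kind of XML run description the toolkit assembles, the sketch below composes one with Python's standard library. Every element and attribute name is hypothetical: the abstract lists the ingredients (geometry, source parameters, velocity, density, attenuation, time steps, stations) but not the actual SYNSEIS dialect.

```python
# Hypothetical XML input for a regional E3D run; tag and attribute
# names are illustrative, not the real SYNSEIS schema.
import xml.etree.ElementTree as ET

run = ET.Element("e3d_run")
ET.SubElement(run, "grid", nx="600", nz="300", dh_km="0.5")
ET.SubElement(run, "layer", vp="5.8", vs="3.4", rho="2.7", qp="600")
ET.SubElement(run, "source", type="point", lat="37.00", lon="-121.50",
              depth_km="8.0", moment="1e17")
ET.SubElement(run, "timesteps", count="4000", dt_s="0.01")
for name, dist in [("ST01", 50.0), ("ST02", 120.0)]:
    ET.SubElement(run, "station", name=name, distance_km=str(dist))

print(ET.tostring(run, encoding="unicode"))
```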

  6. A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.

    PubMed

    De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc

    2010-09-01

    In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several gigavoxels), this computational burden has prevented their breakthrough in practice. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphics processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.

  7. Heralding efficiency and correlated-mode coupling of near-IR fiber-coupled photon pairs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dixon, P. Ben; Rosenberg, Danna; Stelmakh, Veronika

    We report on a systematic experimental study of heralding efficiency and generation rate of telecom-band infrared photon pairs generated by spontaneous parametric down-conversion and coupled to single mode optical fibers. We define the correlated-mode coupling efficiency--an inherent source efficiency--and explain its relation to heralding efficiency. For our experiment, we developed a reconfigurable computer controlled pump-beam and collection-mode optical apparatus which we used to measure the generation rate and correlated-mode coupling efficiency. The use of low-noise, high-efficiency superconducting-nanowire single-photon-detectors in this setup allowed us to explore focus configurations with low overall photon flux. The measured data agree well with theory and we demonstrated a correlated-mode coupling efficiency of 97%±2%, which is the highest efficiency yet achieved for this type of system. These results confirm theoretical treatments and demonstrate that very high overall heralding efficiencies can, in principle, be achieved in quantum optical systems. We expect that these results and techniques will be widely incorporated into future systems that require, or benefit from, a high heralding efficiency.

  8. Heralding efficiency and correlated-mode coupling of near-IR fiber-coupled photon pairs

    DOE PAGES

    Dixon, P. Ben; Rosenberg, Danna; Stelmakh, Veronika; ...

    2014-10-06

    We report on a systematic experimental study of heralding efficiency and generation rate of telecom-band infrared photon pairs generated by spontaneous parametric down-conversion and coupled to single mode optical fibers. We define the correlated-mode coupling efficiency--an inherent source efficiency--and explain its relation to heralding efficiency. For our experiment, we developed a reconfigurable computer controlled pump-beam and collection-mode optical apparatus which we used to measure the generation rate and correlated-mode coupling efficiency. The use of low-noise, high-efficiency superconducting-nanowire single-photon-detectors in this setup allowed us to explore focus configurations with low overall photon flux. The measured data agree well with theory and we demonstrated a correlated-mode coupling efficiency of 97%±2%, which is the highest efficiency yet achieved for this type of system. These results confirm theoretical treatments and demonstrate that very high overall heralding efficiencies can, in principle, be achieved in quantum optical systems. We expect that these results and techniques will be widely incorporated into future systems that require, or benefit from, a high heralding efficiency.
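
    For reference, the standard bookkeeping connecting the two efficiencies discussed above (a textbook relation, not a result of this paper) is:

```latex
% Raw heralding efficiency: coincidences over heralding singles.
% Dividing out the heralded arm's detector efficiency isolates the
% source's correlated-mode coupling efficiency.
\eta_{\mathrm{herald}} = \frac{R_{\mathrm{coinc}}}{R_{\mathrm{singles}}},
\qquad
\eta_{c} = \frac{\eta_{\mathrm{herald}}}{\eta_{\mathrm{det}}}
```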

  9. Thermodynamics and Transport Phenomena in High Temperature Steam Electrolysis Cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    James E. O'Brien

    2012-03-01

    Hydrogen can be produced from water splitting with relatively high efficiency using high temperature electrolysis. This technology makes use of solid-oxide cells, running in the electrolysis mode to produce hydrogen from steam, while consuming electricity and high temperature process heat. The overall thermal-to-hydrogen efficiency for high temperature electrolysis can be as high as 50%, which is about double the overall efficiency of conventional low-temperature electrolysis. Current large-scale hydrogen production is based almost exclusively on steam reforming of methane, a method that consumes a precious fossil fuel while emitting carbon dioxide to the atmosphere. An overview of high temperature electrolysis technology will be presented, including basic thermodynamics, experimental methods, heat and mass transfer phenomena, and computational fluid dynamics modeling.
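
    The thermodynamic split that underlies the efficiency advantage is the standard one:

```latex
% Total energy demand of water splitting: electrical work supplies
% \Delta G, process heat supplies T\Delta S. As temperature rises,
% \Delta G falls, so cheaper heat covers a larger share of \Delta H.
\Delta H = \Delta G + T\,\Delta S
```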

  10. Parallel aeroelastic computations for wing and wing-body configurations

    NASA Technical Reports Server (NTRS)

    Byun, Chansup

    1994-01-01

    The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical in the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.

  11. Investigation to realize a computationally efficient implementation of the high-order instantaneous-moments-based fringe analysis method

    NASA Astrophysics Data System (ADS)

    Gorthi, Sai Siva; Rajshekhar, Gannavarpu; Rastogi, Pramod

    2010-06-01

    Recently, a high-order instantaneous moments (HIM)-operator-based method was proposed for accurate phase estimation in digital holographic interferometry. The method relies on piece-wise polynomial approximation of phase and subsequent evaluation of the polynomial coefficients from the HIM operator using single-tone frequency estimation. The work presents a comparative analysis of the performance of different single-tone frequency estimation techniques, like Fourier transform followed by optimization, estimation of signal parameters by rotational invariance technique (ESPRIT), multiple signal classification (MUSIC), and iterative frequency estimation by interpolation on Fourier coefficients (IFEIF) in HIM-operator-based methods for phase estimation. Simulation and experimental results demonstrate the potential of the IFEIF technique with respect to computational efficiency and estimation accuracy.
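
    The single-tone estimation step at the core of the method can be sketched compactly. The Python fragment below refines an FFT peak by iterative interpolation on Fourier coefficients; it follows the well-known Aboutanios-Mulgrew update, which captures the spirit of the IFEIF technique but may differ in detail from the variant benchmarked in the paper.

```python
# Single-tone frequency estimation: coarse FFT peak, then iterative
# refinement from DFT samples half a bin either side of the estimate.
import numpy as np

def estimate_tone(x, iters=3):
    n = x.size
    m = int(np.argmax(np.abs(np.fft.fft(x))))   # coarse peak bin
    t = np.arange(n)
    delta = 0.0
    for _ in range(iters):
        xp = np.sum(x * np.exp(-2j * np.pi * (m + delta + 0.5) * t / n))
        xm = np.sum(x * np.exp(-2j * np.pi * (m + delta - 0.5) * t / n))
        delta += 0.5 * np.real((xp + xm) / (xp - xm))
    return (m + delta) / n                      # cycles per sample

f_true = 0.1234567
sig = np.exp(2j * np.pi * f_true * np.arange(256))
print(estimate_tone(sig), "vs", f_true)
```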

  12. Load Balancing Strategies for Multi-Block Overset Grid Applications

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
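
    The abstract does not spell out the three strategies, but the shape of the problem is easy to convey: assign separately generated grids, weighted by their cell counts, to processors so that the largest per-processor load stays small. A hedged sketch of one obvious greedy heuristic follows; the paper's actual strategies may differ.

```python
# Greedy largest-first assignment of overset grid blocks (weighted by
# cell count) to the currently least-loaded processor.
import heapq

def greedy_balance(block_weights, nprocs):
    heap = [(0.0, p, []) for p in range(nprocs)]     # (load, proc, blocks)
    heapq.heapify(heap)
    for blk, w in sorted(block_weights.items(), key=lambda kv: -kv[1]):
        load, p, blocks = heapq.heappop(heap)        # least-loaded processor
        blocks.append(blk)
        heapq.heappush(heap, (load + w, p, blocks))
    return sorted(heap, key=lambda e: e[1])

grids = {"wing": 2.1e6, "fuselage": 1.4e6, "flap": 6.0e5, "tail": 4.5e5}
for load, proc, blocks in greedy_balance(grids, 2):
    print(f"proc {proc}: {blocks} ({load:.2e} cells)")
```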

  13. Carbon nanotube computer.

    PubMed

    Shulaker, Max M; Hills, Gage; Patil, Nishant; Wei, Hai; Chen, Hong-Yu; Wong, H-S Philip; Mitra, Subhasish

    2013-09-26

    The miniaturization of electronic devices has been the principal driving force behind the semiconductor industry, and has brought about major improvements in computational power and energy efficiency. Although advances with silicon-based electronics continue to be made, alternative technologies are being explored. Digital circuits based on transistors fabricated from carbon nanotubes (CNTs) have the potential to outperform silicon by improving the energy-delay product, a metric of energy efficiency, by more than an order of magnitude. Hence, CNTs are an exciting complement to existing semiconductor technologies. Owing to substantial fundamental imperfections inherent in CNTs, however, only very basic circuit blocks have been demonstrated. Here we show how these imperfections can be overcome, and demonstrate the first computer built entirely using CNT-based transistors. The CNT computer runs an operating system that is capable of multitasking: as a demonstration, we perform counting and integer-sorting simultaneously. In addition, we implement 20 different instructions from the commercial MIPS instruction set to demonstrate the generality of our CNT computer. This experimental demonstration is the most complex carbon-based electronic system yet realized. It is a considerable advance because CNTs are prominent among a variety of emerging technologies that are being considered for the next generation of highly energy-efficient electronic systems.

  14. An energy-efficient failure detector for vehicular cloud computing.

    PubMed

    Liu, Jiaxi; Wu, Zhibo; Dong, Jian; Wu, Jin; Wen, Dongxin

    2018-01-01

    Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, many RSUs are deployed along the road to improve connectivity. Many of them are equipped with solar batteries due to the unavailability or excess expense of wired electrical power, so it is important to reduce the battery consumption of RSUs. However, existing failure detection algorithms are not designed to reduce the battery consumption of RSUs. To solve this problem, a new energy-efficient failure detector, 2E-FD, is proposed specifically for vehicular cloud computing. 2E-FD not only provides acceptable failure detection service, but also reduces the battery consumption of RSUs. Through comparative experiments, the results show that our failure detector performs better in terms of speed, accuracy, and battery consumption.

  15. An energy-efficient failure detector for vehicular cloud computing

    PubMed Central

    Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Wen, Dongxin

    2018-01-01

    Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, many RSUs are deployed along the road to improve connectivity. Many of them are equipped with solar batteries due to the unavailability or excess expense of wired electrical power, so it is important to reduce the battery consumption of RSUs. However, existing failure detection algorithms are not designed to reduce the battery consumption of RSUs. To solve this problem, a new energy-efficient failure detector, 2E-FD, is proposed specifically for vehicular cloud computing. 2E-FD not only provides acceptable failure detection service, but also reduces the battery consumption of RSUs. Through comparative experiments, the results show that our failure detector performs better in terms of speed, accuracy, and battery consumption. PMID:29352282
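
    The 2E-FD algorithm itself is not specified in the abstract, so the sketch below shows only the generic heartbeat-style skeleton such detectors share: a peer is suspected once its next heartbeat misses an adaptively estimated arrival window. All parameters are illustrative.

```python
# Generic heartbeat failure detector: suspect a peer when the time
# since its last heartbeat exceeds the estimated interval plus margin.
class HeartbeatDetector:
    def __init__(self, margin=0.05):
        self.last, self.mean, self.margin = None, None, margin

    def heartbeat(self, now):
        if self.last is not None:
            gap = now - self.last
            # exponential moving average of inter-arrival times
            self.mean = gap if self.mean is None else 0.9 * self.mean + 0.1 * gap
        self.last = now

    def suspect(self, now):
        if self.last is None or self.mean is None:
            return False
        return now - self.last > self.mean + self.margin

fd = HeartbeatDetector()
for t in (0.0, 0.1, 0.2, 0.3):
    fd.heartbeat(t)
print(fd.suspect(0.36), fd.suspect(0.5))   # False, then True
```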

  16. An accurate, compact and computationally efficient representation of orbitals for quantum Monte Carlo calculations

    NASA Astrophysics Data System (ADS)

    Luo, Ye; Esler, Kenneth; Kent, Paul; Shulenburger, Luke

    Quantum Monte Carlo (QMC) calculations of giant molecules and of surface and defect properties of solids have recently become feasible due to drastically expanding computational resources. However, with the most computationally efficient basis set, B-splines, these calculations are severely restricted by the memory capacity of compute nodes. The B-spline coefficients are shared on a node but not distributed among nodes, to ensure fast evaluation. A hybrid representation which incorporates atomic orbitals near the ions and B-spline ones in the interstitial regions offers a more accurate and less memory-demanding description of the orbitals, because they are naturally more atomic-like near ions and much smoother in between, thus allowing coarser B-spline grids. We demonstrate the advantage of the hybrid representation over pure B-spline and Gaussian basis sets and also show significant speed-ups, for example in computing the non-local pseudopotentials with our new scheme. Moreover, we discuss a new algorithm for atomic orbital initialization, which used to require an extra workflow step taking a few days. With this work, the highly efficient hybrid representation paves the way to simulating large, even inhomogeneous, systems using QMC. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Computational Materials Sciences Program.

  17. Computationally efficient multibody simulations

    NASA Technical Reports Server (NTRS)

    Ramakrishnan, Jayant; Kumar, Manoj

    1994-01-01

    Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.

  18. Development of a Very Dense Liquid Cooled Compute Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hughes, Phillip N.; Lipp, Robert J.

    2013-12-10

    The objective of this project was to design and develop a prototype very energy efficient high density compute platform with 100% pumped refrigerant liquid cooling using commodity components and high volume manufacturing techniques. Testing at SLAC has indicated that we achieved a DCIE of 0.93 against our original goal of 0.85. This number includes both cooling and power supply and was achieved employing some of the highest wattage processors available.
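
    The DCIE metric quoted above has a standard definition, the reciprocal of PUE:

```latex
% IT equipment power as a fraction of total facility power. A DCiE of
% 0.93 means only about 7% of facility power goes to cooling, power
% conversion, and other overhead.
\mathrm{DCiE} = \frac{P_{\mathrm{IT}}}{P_{\mathrm{total}}} = \frac{1}{\mathrm{PUE}}
```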

  19. Efficient modeling of interconnects and capacitive discontinuities in high-speed digital circuits. Thesis

    NASA Technical Reports Server (NTRS)

    Oh, K. S.; Schutt-Aine, J.

    1995-01-01

    Modeling of interconnects and associated discontinuities, spurred by the recent advances in high-speed digital circuits, has gained considerable interest over the last decade, although the theoretical bases for analyzing these structures were well-established as early as the 1960s. Ongoing research at the present time is focused on devising methods which can be applied to more general geometries than the ones considered in earlier days and, at the same time, improving the computational efficiency and accuracy of these methods. In this thesis, numerically efficient methods to compute the transmission line parameters of a multiconductor system and the equivalent capacitances of various strip discontinuities are presented based on the quasi-static approximation. The presented techniques are applicable to conductors embedded in an arbitrary number of dielectric layers with two possible locations of ground planes at the top and bottom of the dielectric layers. The cross-sections of conductors can be arbitrary as long as they can be described with polygons. An integral equation approach in conjunction with the collocation method is used in the presented methods. A closed-form Green's function is derived based on weighted real images, thus avoiding the nested infinite summations of the exact Green's function; this closed-form Green's function is therefore numerically more efficient than the exact Green's function. All elements associated with the moment matrix are computed using the closed-form formulas. Various numerical examples are considered to verify the presented methods, and a comparison of the computed results with other published results shows good agreement.
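
    The integral-equation-plus-collocation machinery described above can be illustrated on the textbook problem of a thin conducting strip in free space, using the 2D logarithmic kernel in place of the thesis's layered-media, weighted-real-image Green's function. A minimal NumPy sketch:

```python
# Method-of-moments / collocation sketch: a thin strip is split into
# segments of uniform line charge; matching the potential at segment
# centers gives a dense "moment matrix" system for the charges.
# (2D log potential; the unit reference distance makes the absolute
# value illustrative only.)
import numpy as np

eps0 = 8.854e-12
w, n = 1e-3, 80                        # strip width (m), segment count
dx = w / n
xc = (np.arange(n) + 0.5) * dx         # collocation points

A = np.empty((n, n))
for i in range(n):
    for j in range(n):
        if i == j:                     # analytic self term of the log kernel
            A[i, j] = dx * (1.0 - np.log(dx / 2.0)) / (2 * np.pi * eps0)
        else:
            A[i, j] = -dx * np.log(abs(xc[i] - xc[j])) / (2 * np.pi * eps0)

q = np.linalg.solve(A, np.ones(n))     # charges for a strip held at 1 V
print("charge per unit length for 1 V:", q.sum(), "C/m")
```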

  20. gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing.

    PubMed

    Olejnik, Michael; Steuwer, Michel; Gorlatch, Sergei; Heider, Dominik

    2014-11-15

    Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in recent years. However, albeit being highly accurate, these computational models lack the computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model, named gCUP, parallelized and optimized for GPUs, is highly accurate and can classify >175,000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. The source code can be downloaded at http://www.heiderlab.de. Contact: d.heider@wz-straubing.de.

  1. Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

    DOE PAGES

    Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

    2017-08-29

    Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fair rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.

  2. Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

    Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fair rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.
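
    For context, the classical baseline that these fat-tree-specific algorithms accelerate is progressive filling: the rates of all unfrozen flows rise together, and whenever a link saturates, the flows crossing it freeze at that bottleneck share. A generic Python sketch (not the paper's algorithm):

```python
# Progressive filling ("water filling") for max-min fair rates.
def max_min_fair(capacity, routes):
    """capacity: {link: capacity}; routes: {flow: set of links crossed}."""
    cap = dict(capacity)
    rate = {f: 0.0 for f in routes}
    active = set(routes)
    while active:
        nflows = {l: sum(1 for f in active if l in routes[f]) for l in cap}
        used = [l for l in cap if nflows[l] > 0]
        inc = min(cap[l] / nflows[l] for l in used)   # next equal-share step
        for f in active:
            rate[f] += inc
        for l in used:
            cap[l] -= inc * nflows[l]
        saturated = {l for l in used if cap[l] < 1e-12}
        # flows crossing a saturated link freeze at their current rate
        active = {f for f in active if not (routes[f] & saturated)}
    return rate

caps = {"A": 10.0, "B": 4.0}
routes = {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}}
print(max_min_fair(caps, routes))   # f2 and f3 bottleneck at 2, f1 gets 8
```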

  3. Experimental and Computational Sonic Boom Assessment of Lockheed-Martin N+2 Low Boom Models

    NASA Technical Reports Server (NTRS)

    Cliff, Susan E.; Durston, Donald A.; Elmiligui, Alaa A.; Walker, Eric L.; Carter, Melissa B.

    2015-01-01

    Flight at speeds greater than the speed of sound is not permitted over land, primarily because of the noise and structural damage caused by sonic boom pressure waves of supersonic aircraft. Mitigation of sonic boom is a key focus area of the High Speed Project under NASA's Fundamental Aeronautics Program. The project is focusing on technologies to enable future civilian aircraft to fly efficiently with reduced sonic boom, engine and aircraft noise, and emissions. A major objective of the project is to improve both computational and experimental capabilities for design of low-boom, high-efficiency aircraft. NASA and industry partners are developing improved wind tunnel testing techniques and new pressure instrumentation to measure the weak sonic boom pressure signatures of modern vehicle concepts. In parallel, computational methods are being developed to provide rapid design and analysis of supersonic aircraft with improved meshing techniques that provide efficient, robust, and accurate on- and off-body pressures at several body lengths from vehicles with very low sonic boom overpressures. The maturity of these critical parallel efforts is necessary before low-boom flight can be demonstrated and commercial supersonic flight can be realized.

  4. Using quantum chemistry muscle to flex massive systems: How to respond to something perturbing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertoni, Colleen

    Computational chemistry uses the theoretical advances of quantum mechanics and the algorithmic and hardware advances of computer science to give insight into chemical problems. It is currently possible to do highly accurate quantum chemistry calculations, but the most accurate methods are very computationally expensive. Thus it is only feasible to do highly accurate calculations on small molecules, since typically more computationally efficient methods are also less accurate. The overall goal of my dissertation work has been to try to decrease the computational expense of calculations without decreasing the accuracy. In particular, my dissertation work focuses on fragmentation methods, intermolecular interaction methods, analytic gradients, and taking advantage of new hardware.

  5. Metasurface holograms reaching 80% efficiency.

    PubMed

    Zheng, Guoxing; Mühlenbernd, Holger; Kenney, Mitchell; Li, Guixin; Zentgraf, Thomas; Zhang, Shuang

    2015-04-01

    Surfaces covered by ultrathin plasmonic structures--so-called metasurfaces--have recently been shown to be capable of completely controlling the phase of light, representing a new paradigm for the design of innovative optical elements such as ultrathin flat lenses, directional couplers for surface plasmon polaritons and wave plate vortex beam generation. Among the various types of metasurfaces, geometric metasurfaces, which consist of an array of plasmonic nanorods with spatially varying orientations, have shown superior phase control due to the geometric nature of their phase profile. Metasurfaces have recently been used to make computer-generated holograms, but the hologram efficiency remained too low at visible wavelengths for practical purposes. Here, we report the design and realization of a geometric metasurface hologram reaching diffraction efficiencies of 80% at 825 nm and a broad bandwidth between 630 nm and 1,050 nm. The 16-level-phase computer-generated hologram demonstrated here combines the advantages of a geometric metasurface for the superior control of the phase profile and of reflectarrays for achieving high polarization conversion efficiency. Specifically, the design of the hologram integrates a ground metal plane with a geometric metasurface that enhances the conversion efficiency between the two circular polarization states, leading to high diffraction efficiency without complicating the fabrication process. Because of these advantages, our strategy could be viable for various practical holographic applications.

  6. Power and Efficiency Optimized in Traveling-Wave Tubes Over a Broad Frequency Bandwidth

    NASA Technical Reports Server (NTRS)

    Wilson, Jeffrey D.

    2001-01-01

    A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWT's are critical components in deep space probes, communication satellites, and high-power radar systems. Power conversion efficiency is of paramount importance for TWT's employed in deep space probes and communication satellites. A previous effort was very successful in increasing efficiency and power at a single frequency (ref. 1). Such an algorithm is sufficient for narrow bandwidth designs, but for optimal designs in applications that require high radiofrequency power over a wide bandwidth, such as high-density communications or high-resolution radar, the variation of the circuit response with respect to frequency must be considered. This work at the NASA Glenn Research Center is the first to develop techniques for optimizing TWT efficiency and output power over a broad frequency bandwidth (ref. 2). The techniques are based on simulated annealing, which has the advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 3). Two new broadband simulated annealing algorithms were developed that optimize (1) minimum saturated power efficiency over a frequency bandwidth and (2) simultaneous bandwidth and minimum power efficiency over the frequency band with constant input power. The algorithms were incorporated into the NASA coupled-cavity TWT computer model (ref. 4) and used to design optimal phase velocity tapers using the 59- to 64-GHz Hughes 961HA coupled-cavity TWT as a baseline model. In comparison to the baseline design, the computational results of the first broad-band design algorithm show an improvement of 73.9 percent in minimum saturated efficiency (see the top graph). The second broadband design algorithm (see the bottom graph) improves minimum radiofrequency efficiency with constant input power drive by a factor of 2.7 at the high band edge (64 GHz) and increases simultaneous bandwidth by 500 MHz.
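
    Simulated annealing itself is compact to sketch; what the Glenn work contributes is the broadband TWT objective and its coupling to the circuit model. Below, a generic annealing loop in Python with a toy quadratic standing in for the coupled-cavity efficiency model:

```python
# Generic simulated annealing: accept any improvement, accept uphill
# moves with Boltzmann probability, cool geometrically.
import math, random

def anneal(cost, x0, step=0.1, t0=1.0, cooling=0.995, iters=5000):
    x, best = list(x0), list(x0)
    t = t0
    for _ in range(iters):
        cand = [xi + random.uniform(-step, step) for xi in x]
        d = cost(cand) - cost(x)
        if d < 0 or random.random() < math.exp(-d / t):
            x = cand
            if cost(x) < cost(best):
                best = list(x)
        t *= cooling
    return best

toy = lambda v: sum((vi - 1.0) ** 2 for vi in v)   # stand-in objective
print(anneal(toy, [4.0, -3.0]))                    # converges near [1, 1]
```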

  7. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    USDA-ARS?s Scientific Manuscript database

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  8. Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems

    PubMed Central

    Abu-Nimeh, Faisal T.; Choong, Woon-Seng

    2017-01-01

    Link efficiency, data integrity, and continuity for high-throughput and real-time systems are crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an open-source data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card's ring buffers, bypassing the operating system kernel's networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity. PMID:28630948

  9. Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems.

    PubMed

    Abu-Nimeh, Faisal T; Choong, Woon-Seng

    2017-03-01

    Link efficiency, data integrity, and continuity for high-throughput and real-time systems are crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an open-source data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card's ring buffers, bypassing the operating system kernel's networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity.
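
    A back-of-envelope check makes a figure of this magnitude plausible if one assumes 9000-byte jumbo frames; the abstract does not state the frame size, so this is only a consistency check, not the paper's accounting:

```python
# UDP payload over total wire occupancy for an assumed 9000-byte
# jumbo frame; headers and gaps per standard Ethernet framing.
mtu = 9000                      # IP packet size (assumed jumbo frame)
udp_ip_hdr = 20 + 8             # IPv4 + UDP headers
eth_overhead = 14 + 4 + 8 + 12  # Ethernet header + FCS + preamble + gap

payload = mtu - udp_ip_hdr
on_wire = mtu + eth_overhead
print(f"{payload / on_wire:.2%}")   # about 99.3%, near the quoted 99.26%
```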

  10. Nonlinear Detection, Estimation, and Control for Free-Space Optical Communication

    DTIC Science & Technology

    2008-08-17

    In free-space optical communication, the intensity of a laser beam is modulated by a message, and the beam propagates through the atmosphere to a receiver that recovers the original message. The promising features of this communication scheme, such as high bandwidth, power efficiency, and security, render it a viable means for high-data-rate point-to-point communication. In this dissertation, we adopt a...

  11. A high performance scientific cloud computing environment for materials simulations

    NASA Astrophysics Data System (ADS)

    Jorissen, K.; Vila, F. D.; Rehr, J. J.

    2012-09-01

    We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.

  12. Large-Scale Computation of Nuclear Magnetic Resonance Shifts for Paramagnetic Solids Using CP2K.

    PubMed

    Mondal, Arobendo; Gaultois, Michael W; Pell, Andrew J; Iannuzzi, Marcella; Grey, Clare P; Hutter, Jürg; Kaupp, Martin

    2018-01-09

    Large-scale computations of nuclear magnetic resonance (NMR) shifts for extended paramagnetic solids (pNMR) are reported using the highly efficient Gaussian-augmented plane-wave implementation of the CP2K code. Combining hyperfine couplings obtained with hybrid functionals with g-tensors and orbital shieldings computed using gradient-corrected functionals, contact, pseudocontact, and orbital-shift contributions to pNMR shifts are accessible. Due to the efficient and highly parallel performance of CP2K, a wide variety of materials with large unit cells can be studied with extended Gaussian basis sets. Validation of various approaches for the different contributions to pNMR shifts is done first for molecules in a large supercell in comparison with typical quantum-chemical codes. This is then extended to a detailed study of g-tensors for extended solid transition-metal fluorides and for a series of complex lithium vanadium phosphates. Finally, lithium pNMR shifts are computed for Li3V2(PO4)3, for which detailed experimental data are available. This has allowed an in-depth study of different approaches (e.g., full periodic versus incremental cluster computations of g-tensors and different functionals and basis sets for hyperfine computations) as well as a thorough analysis of the different contributions to the pNMR shifts. This study paves the way for a more widespread computational treatment of NMR shifts for paramagnetic materials.
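
    The decomposition used throughout the study is:

```latex
% Observed pNMR shift: orbital (temperature-independent) term plus the
% hyperfine-driven Fermi-contact and pseudocontact terms, the latter two
% built here from computed hyperfine couplings and g-tensors.
\delta_{\mathrm{pNMR}} = \delta_{\mathrm{orb}} + \delta_{\mathrm{con}} + \delta_{\mathrm{pc}}
```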

  13. Local Orthogonal Cutting Method for Computing Medial Curves and Its Biomedical Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiao, Xiangmin; Einstein, Daniel R.; Dyedov, Volodymyr

    2010-03-24

    Medial curves have a wide range of applications in geometric modeling and analysis (such as shape matching) and biomedical engineering (such as morphometry and computer assisted surgery). The computation of medial curves poses significant challenges, both in terms of theoretical analysis and practical efficiency and reliability. In this paper, we propose a definition and analysis of medial curves and also describe an efficient and robust method for computing medial curves. Our approach is based on three key concepts: a local orthogonal decomposition of objects into substructures, a differential geometry concept called the interior center of curvature (ICC), and integrated stability and consistency tests. These concepts lend themselves to robust numerical techniques including eigenvalue analysis, weighted least squares approximations, and numerical minimization, resulting in an algorithm that is efficient and noise resistant. We illustrate the effectiveness and robustness of our approach with some highly complex, large-scale, noisy biomedical geometries derived from medical images, including lung airways and blood vessels. We also present comparisons of our method with some existing methods.

  14. A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, K; Seymour, R; Wang, W

    2009-02-17

    A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on a hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops·day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microsecond).

  15. Development of the Tensoral Computer Language

    NASA Technical Reports Server (NTRS)

    Ferziger, Joel; Dresselhaus, Eliot

    1996-01-01

    The research scientist or engineer wishing to perform large scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified, optimized, and scheduled by the compiler, eventually resulting in efficient machine code to implement them.

  16. Model Reduction of Computational Aerothermodynamics for Multi-Discipline Analysis in High Speed Flows

    NASA Astrophysics Data System (ADS)

    Crowell, Andrew Rippetoe

    This dissertation describes model reduction techniques for the computation of aerodynamic heat flux and pressure loads for multi-disciplinary analysis of hypersonic vehicles. NASA and the Department of Defense have expressed renewed interest in the development of responsive, reusable hypersonic cruise vehicles capable of sustained high-speed flight and access to space. However, an extensive set of technical challenges have obstructed the development of such vehicles. These technical challenges are partially due to both the inability to accurately test scaled vehicles in wind tunnels and to the time intensive nature of high-fidelity computational modeling, particularly for the fluid using Computational Fluid Dynamics (CFD). The aim of this dissertation is to develop efficient and accurate models for the aerodynamic heat flux and pressure loads to replace the need for computationally expensive, high-fidelity CFD during coupled analysis. Furthermore, aerodynamic heating and pressure loads are systematically evaluated for a number of different operating conditions, including: simple two-dimensional flow over flat surfaces up to three-dimensional flows over deformed surfaces with shock-shock interaction and shock-boundary layer interaction. An additional focus of this dissertation is on the implementation and computation of results using the developed aerodynamic heating and pressure models in complex fluid-thermal-structural simulations. Model reduction is achieved using a two-pronged approach. One prong focuses on developing analytical corrections to isothermal, steady-state CFD flow solutions in order to capture flow effects associated with transient spatially-varying surface temperatures and surface pressures (e.g., surface deformation, surface vibration, shock impingements, etc.). The second prong is focused on minimizing the computational expense of computing the steady-state CFD solutions by developing an efficient surrogate CFD model. The developed two-pronged approach is found to exhibit balanced performance in terms of accuracy and computational expense, relative to several existing approaches. This approach enables CFD-based loads to be implemented into long duration fluid-thermal-structural simulations.

  17. Blazing Signature Filter: a library for fast pairwise similarity comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan

    Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate the ability to scale to billions of pairwise comparisons and the usefulness of this approach.
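
    The core trick, binary signatures compared with bit operators, can be sketched in a few lines; the encoding and threshold below are illustrative, not the BSF's exact scheme:

```python
# Bit-signature pre-filter: encode each profile as a bitmask
# (1 = feature above its column median), then use cheap popcounts to
# discard pairs before any expensive similarity metric runs.
import numpy as np

def signatures(data):
    """data: (n_items, n_features) array -> list of int bitmasks."""
    bits = data > np.median(data, axis=0)
    return [int("".join("1" if b else "0" for b in row), 2) for row in bits]

def candidate_pairs(sigs, min_shared=3):
    pairs = []
    for i in range(len(sigs)):
        for j in range(i + 1, len(sigs)):
            # int.bit_count() is the popcount (Python 3.10+)
            if (sigs[i] & sigs[j]).bit_count() >= min_shared:
                pairs.append((i, j))
    return pairs

rng = np.random.default_rng(0)
print(candidate_pairs(signatures(rng.normal(size=(6, 16)))))
```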

  18. Fast and Accurate Poisson Denoising With Trainable Nonlinear Diffusion.

    PubMed

    Feng, Wensen; Qiao, Peng; Chen, Yunjin

    2018-06-01

    The degradation of the acquired signal by Poisson noise is a common problem for various imaging applications, such as medical imaging, night vision, and microscopy. Up to now, many state-of-the-art Poisson denoising techniques mainly concentrate on achieving utmost performance, with little consideration for computational efficiency. Therefore, in this paper we aim to propose an efficient Poisson denoising model with both high computational efficiency and recovery quality. To this end, we exploit the newly developed trainable nonlinear reaction diffusion (TNRD) model, which has proven to be an extremely fast image restoration approach with performance surpassing recent state-of-the-art methods. However, the straightforward direct gradient descent employed in the original TNRD-based denoising task is not applicable in this paper. To solve this problem, we resort to the proximal gradient descent method. We retrain the model parameters, including the linear filters and influence functions, by taking into account the Poisson noise statistics, and end up with a well-trained nonlinear diffusion model specialized for Poisson denoising. The trained model provides strongly competitive results against state-of-the-art approaches, meanwhile bearing the properties of simple structure and high efficiency. Furthermore, our proposed model comes along with an additional advantage, in that the diffusion process is well-suited for parallel computation on graphics processing units (GPUs). For typical image sizes, our GPU implementation takes less than 0.1 s to produce state-of-the-art Poisson denoising performance.
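
    The proximal step that replaces direct gradient descent has a convenient closed form for the Poisson data term; this is a standard identity, not something specific to the trained diffusion model:

```python
# Closed-form proximal operator of f(u) = sum(u - y*log(u)), the
# Poisson negative log-likelihood, applied elementwise.
import numpy as np

def prox_poisson(v, y, tau):
    """argmin_u 0.5*(u - v)**2 + tau*(u - y*log(u)), elementwise."""
    return 0.5 * ((v - tau) + np.sqrt((v - tau) ** 2 + 4.0 * tau * y))

y = np.array([3.0, 10.0, 0.0])   # noisy Poisson counts
v = np.array([2.0, 12.0, 1.0])   # estimate after a gradient step
print(prox_poisson(v, y, tau=0.5))
```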

  19. Sparse Regression as a Sparse Eigenvalue Problem

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai

    2008-01-01

    We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic x10^3 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n^4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization.

  20. Exploring Cloud Computing for Large-scale Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Guang; Han, Binh; Yin, Jian

    This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.

  1. Parallel, adaptive finite element methods for conservation laws

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.

    1994-01-01

    We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.

  2. Efficient tiled calculation of over-10-gigapixel holograms using ray-wavefront conversion.

    PubMed

    Igarashi, Shunsuke; Nakamura, Tomoya; Matsushima, Kyoji; Yamaguchi, Masahiro

    2018-04-16

    In the calculation of large-scale computer-generated holograms, an approach called "tiling," which divides the hologram plane into small rectangles, is often employed due to limitations on computational memory. However, the total computational cost increases severely with the number of divisions. In this paper, we propose an efficient method for calculating tiled large-scale holograms using ray-wavefront conversion. In experiments, the effectiveness of the proposed method was verified by comparing its calculation cost with that of the previous method. Additionally, a hologram of 128K × 128K pixels was calculated and fabricated by a laser-lithography system, and a high-quality 105 mm × 105 mm 3D image including complicated reflection and translucency was optically reconstructed.

  3. Open source acceleration of wave optics simulations on energy efficient high-performance computing platforms

    NASA Astrophysics Data System (ADS)

    Beck, Jeffrey; Bos, Jeremy P.

    2017-05-01

    We compare several modifications to the open-source wave optics package, WavePy, intended to improve execution time. Specifically, we compare the relative performance of the Intel MKL, a CPU-based OpenCV distribution, and a GPU-based version. Performance is compared between distributions both on the same compute platform and between a fully-featured computing workstation and the NVIDIA Jetson TX1 platform. Comparisons are drawn in terms of both execution time and power consumption. We have found that substituting in the Fast Fourier Transform operation from OpenCV provides a marked improvement on all platforms. In addition, we show that embedded platforms offer some possibility for extensive improvement in terms of efficiency compared to a fully featured workstation.
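
    A rough sketch of how such backend substitutions are compared: time the 2D FFT, the kernel that dominates split-step wave-optics propagation, under interchangeable implementations. This assumes NumPy and, optionally, SciPy as two stand-in backends; it is not WavePy's actual benchmark harness, and the array size is arbitrary.

    ```python
    import time
    import numpy as np

    def time_fft2(fft2, field, n_rep=5):
        """Average wall-time of one 2-D FFT over n_rep repetitions."""
        t0 = time.perf_counter()
        for _ in range(n_rep):
            fft2(field)
        return (time.perf_counter() - t0) / n_rep

    rng = np.random.default_rng(0)
    field = rng.standard_normal((2048, 2048)) + 1j * rng.standard_normal((2048, 2048))

    print("numpy.fft.fft2:", time_fft2(np.fft.fft2, field))
    try:
        from scipy import fft as sfft          # alternative backend, if installed
        print("scipy.fft.fft2:", time_fft2(sfft.fft2, field))
    except ImportError:
        pass
    ```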

  4. Modeling bioluminescent photon transport in tissue based on Radiosity-diffusion model

    NASA Astrophysics Data System (ADS)

    Sun, Li; Wang, Pu; Tian, Jie; Zhang, Bo; Han, Dong; Yang, Xin

    2010-03-01

    Bioluminescence tomography (BLT) is one of the most important non-invasive optical molecular imaging modalities. The model for bioluminescent photon propagation plays a significant role in bioluminescence tomography studies. Due to its high computational efficiency, the diffusion approximation (DA) is generally applied in bioluminescence tomography. But the diffusion equation is valid only in highly scattering and weakly absorbing regions and fails in non-scattering or low-scattering tissues, such as a cyst in the breast, the cerebrospinal fluid (CSF) layer of the brain and the synovial fluid layer in the joints. A hybrid Radiosity-diffusion model is proposed in this paper for dealing with non-scattering regions within diffusing domains. This hybrid method incorporates a priori information about the geometry of non-scattering regions, which can be acquired by magnetic resonance imaging (MRI) or x-ray computed tomography (CT). The model is then implemented using a finite element method (FEM) to ensure high computational efficiency. Finally, we demonstrate that the method is comparable with the Monte Carlo (MC) method, which is regarded as the 'gold standard' for photon transport simulation.

  5. Calculations of 3D compressible flows using an efficient low diffusion upwind scheme

    NASA Astrophysics Data System (ADS)

    Hu, Zongjun; Zha, Gecheng

    2005-01-01

    A newly suggested E-CUSP upwind scheme is employed for the first time to calculate 3D flows of propulsion systems. The E-CUSP scheme contains the total energy in the convective vector and is fully consistent with the characteristic directions. The scheme is shown to have low diffusion and high CPU efficiency. The computed cases in this paper include a transonic nozzle with circular-to-rectangular cross-section, a transonic duct with shock wave/turbulent boundary layer interaction, and a subsonic 3D compressor cascade. The computed results agree well with the experiments. The new scheme proves to be accurate, efficient and robust for the 3D flow calculations in this paper.

  6. Making Spatial Statistics Service Accessible On Cloud Platform

    NASA Astrophysics Data System (ADS)

    Mu, X.; Wu, J.; Li, T.; Zhong, Y.; Gao, X.

    2014-04-01

    Web services can bring together applications running on diverse platforms; users can access and share various data, information and models more effectively and conveniently from a web service platform. Cloud computing emerges as a paradigm of Internet computing in which dynamic, scalable and often virtualized resources are provided as services. With the rampant growth of massive data and the restrictions of the network, traditional web service platforms face prominent problems in development, such as computational efficiency, maintenance cost and data security. In this paper, we offer a spatial statistics service based on the Microsoft cloud. An experiment was carried out to evaluate the availability and efficiency of this service. The results show that this spatial statistics service is conveniently accessible to the public, with high processing efficiency.

  7. Artificial Neuron Based on Integrated Semiconductor Quantum Dot Mode-Locked Lasers

    NASA Astrophysics Data System (ADS)

    Mesaritakis, Charis; Kapsalis, Alexandros; Bogris, Adonis; Syvridis, Dimitris

    2016-12-01

    Neuro-inspired implementations have attracted strong interest as a power efficient and robust alternative to the digital model of computation with a broad range of applications. Especially, neuro-mimetic systems able to produce and process spike-encoding schemes can offer merits like high noise-resiliency and increased computational efficiency. Towards this direction, integrated photonics can be an auspicious platform due to its multi-GHz bandwidth, its high wall-plug efficiency and the strong similarity of its dynamics under excitation with biological spiking neurons. Here, we propose an integrated all-optical neuron based on an InAs/InGaAs semiconductor quantum-dot passively mode-locked laser. The multi-band emission capabilities of these lasers allow, through waveband switching, the emulation of the excitation and inhibition modes of operation. Frequency-response effects, similar to biological neural circuits, are observed just as in a typical two-section excitable laser. The demonstrated optical building block can pave the way for high-speed photonic integrated systems able to address tasks ranging from pattern recognition to cognitive spectrum management and multi-sensory data processing.

  8. Artificial Neuron Based on Integrated Semiconductor Quantum Dot Mode-Locked Lasers

    PubMed Central

    Mesaritakis, Charis; Kapsalis, Alexandros; Bogris, Adonis; Syvridis, Dimitris

    2016-01-01

    Neuro-inspired implementations have attracted strong interest as a power efficient and robust alternative to the digital model of computation with a broad range of applications. Especially, neuro-mimetic systems able to produce and process spike-encoding schemes can offer merits like high noise-resiliency and increased computational efficiency. Towards this direction, integrated photonics can be an auspicious platform due to its multi-GHz bandwidth, its high wall-plug efficiency and the strong similarity of its dynamics under excitation with biological spiking neurons. Here, we propose an integrated all-optical neuron based on an InAs/InGaAs semiconductor quantum-dot passively mode-locked laser. The multi-band emission capabilities of these lasers allow, through waveband switching, the emulation of the excitation and inhibition modes of operation. Frequency-response effects, similar to biological neural circuits, are observed just as in a typical two-section excitable laser. The demonstrated optical building block can pave the way for high-speed photonic integrated systems able to address tasks ranging from pattern recognition to cognitive spectrum management and multi-sensory data processing. PMID:27991574

  9. Total variation-based neutron computed tomography

    NASA Astrophysics Data System (ADS)

    Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick

    2018-05-01

    We solve the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We demonstrate the effectiveness of the algorithm in the severely low-angular-sampling case using synthetic test problems as well as data obtained from a high-flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles is used.
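
    While split Bregman itself takes more machinery, the total-variation penalty at the heart of the formulation can be illustrated with a simple smoothed-TV gradient descent on a denoising problem. The sketch below assumes NumPy and periodic boundaries; the step size, weight and smoothing constant are illustrative, and this is not the authors' reconstruction code.

    ```python
    import numpy as np

    def tv_denoise(img, lam=0.2, tau=0.1, n_iter=300, eps=1e-6):
        """Gradient descent on 0.5*||u - img||^2 + lam * sum |grad u|,
        with the TV term smoothed by eps to keep it differentiable."""
        u = img.astype(float).copy()
        for _ in range(n_iter):
            ux = np.roll(u, -1, axis=1) - u            # forward differences
            uy = np.roll(u, -1, axis=0) - u
            mag = np.sqrt(ux**2 + uy**2 + eps)
            px, py = ux / mag, uy / mag                # normalized gradient field
            div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
            u -= tau * ((u - img) - lam * div)         # descend the TV energy
        return u

    noisy = np.random.default_rng(0).normal(0.0, 0.3, (64, 64))
    clean = tv_denoise(noisy)                          # flattens pure-noise input
    ```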

  10. Rational design of an enzyme mutant for anti-cocaine therapeutics

    NASA Astrophysics Data System (ADS)

    Zheng, Fang; Zhan, Chang-Guo

    2008-09-01

    (-)-Cocaine is a widely abused drug and there is no available anti-cocaine therapeutic. The disastrous medical and social consequences of cocaine addiction have made the development of an effective pharmacological treatment a high priority. An ideal anti-cocaine medication would accelerate (-)-cocaine metabolism, producing biologically inactive metabolites. The main metabolic pathway of cocaine in the body is hydrolysis at its benzoyl ester group. Reviewed in this article is the state-of-the-art computational design of high-activity mutants of human butyrylcholinesterase (BChE) against (-)-cocaine. The computational design of BChE mutants has been based not only on the structure of the enzyme, but also on the detailed catalytic mechanisms for BChE-catalyzed hydrolysis of (-)-cocaine and (+)-cocaine. Computational studies of the detailed catalytic mechanisms and the structure-and-mechanism-based computational design have been carried out through the combined use of a variety of state-of-the-art molecular modeling techniques. By using the computational insights into the catalytic mechanisms, a recently developed unique computational design strategy based on simulation of the rate-determining transition state has been employed to design high-activity mutants of human BChE for hydrolysis of (-)-cocaine, leading to the exciting discovery of BChE mutants with considerably improved catalytic efficiency against (-)-cocaine. One of the discovered BChE mutants (i.e., A199S/S287G/A328W/Y332G) has a ~456-fold improved catalytic efficiency against (-)-cocaine. The encouraging outcome of the computational design and discovery effort demonstrates that this unique computational design approach based on transition-state simulation is promising for rational enzyme redesign and drug discovery.

  11. Real-time computer treatment of THz passive device images with the high image quality

    NASA Astrophysics Data System (ADS)

    Trofimov, Vyacheslav A.; Trofimov, Vladislav V.

    2012-06-01

    We demonstrate a real-time computer code that significantly improves the quality of images captured by a passive THz imaging system. The code is not only designed for passive THz devices: it can be applied to any such device, and to active THz imaging systems as well. We applied our code to computer processing of images captured by four passive THz imaging devices manufactured by different companies. It should be stressed that computer processing of images produced by different companies usually requires different spatial filters. The performance of the current version of the computer code is greater than one image per second for a THz image having more than 5000 pixels and 24-bit number representation. Processing a single THz image produces about 20 images simultaneously, corresponding to the various spatial filters. The computer code allows the number of pixels in processed images to be increased without noticeable reduction of image quality. The performance of the computer code can be increased many times over by using parallel algorithms for processing the image. We developed original spatial filters which allow one to see objects with sizes less than 2 cm. The imagery is produced by passive THz imaging devices which captured the images of objects hidden under opaque clothes. For images with high noise we developed an approach which suppresses the noise after computer processing, yielding a good-quality image. To illustrate the efficiency of the developed approach, we demonstrate the detection of a liquid explosive, an ordinary explosive, a knife, a pistol, a metal plate, a CD, ceramics, chocolate and other objects hidden under opaque clothes. The results demonstrate the high efficiency of our approach for the detection of hidden objects and are a very promising solution for the security problem.
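
    The idea of turning one captured frame into roughly 20 differently filtered versions can be sketched with a small spatial filter bank built on SciPy's ndimage module. The specific filters, their parameters, and the random stand-in frame below are illustrative assumptions, not the original filters developed in the paper.

    ```python
    import numpy as np
    from scipy import ndimage

    def filter_bank(image):
        """Apply several spatial filters to one frame, returning all results."""
        blur2 = ndimage.gaussian_filter(image, sigma=2.0)
        return {
            "gauss_1":  ndimage.gaussian_filter(image, sigma=1.0),
            "gauss_3":  ndimage.gaussian_filter(image, sigma=3.0),
            "median_5": ndimage.median_filter(image, size=5),
            "laplace":  ndimage.laplace(blur2),            # edge enhancement
            "unsharp":  image + 1.5 * (image - blur2),     # detail boost
        }

    frame = np.random.default_rng(0).random((80, 64))      # stand-in THz frame
    outputs = filter_bank(frame)                           # one image per filter
    ```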

  12. Hybrid RANS-LES using high order numerical methods

    NASA Astrophysics Data System (ADS)

    Henry de Frahan, Marc; Yellapantula, Shashank; Vijayakumar, Ganesh; Knaus, Robert; Sprague, Michael

    2017-11-01

    Understanding the impact of wind turbine wake dynamics on downstream turbines is particularly important for the design of efficient wind farms. Due to their tractable computational cost, hybrid RANS/LES models are an attractive framework for simulating separated flows such as the wake dynamics behind a wind turbine. High-order numerical methods can be computationally efficient and provide increased accuracy in simulating complex flows. In the context of LES, high-order numerical methods have shown some success in predictions of turbulent flows. However, the specifics of hybrid RANS-LES models, including the transition region between both modeling frameworks, pose unique challenges for high-order numerical methods. In this work, we study the effect of increasing the order of accuracy of the numerical scheme in simulations of canonical turbulent flows using RANS, LES, and hybrid RANS-LES models. We describe the interactions between filtering, model transition, and order of accuracy and their effect on turbulence quantities such as kinetic energy spectra, boundary layer evolution, and dissipation rate. This work was funded by the U.S. Department of Energy, Exascale Computing Project, under Contract No. DE-AC36-08-GO28308 with the National Renewable Energy Laboratory.

  13. An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions.

    PubMed

    Kundu, Kousik; Backofen, Rolf

    2017-01-01

    Src homology 2 (SH2) domains are an important subclass of modular protein domains that play an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of interactions mediated by 51 SH2 domains in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.

  14. Progress of High Efficiency Centrifugal Compressor Simulations Using TURBO

    NASA Technical Reports Server (NTRS)

    Kulkarni, Sameer; Beach, Timothy A.

    2017-01-01

    Three-dimensional, time-accurate, and phase-lagged computational fluid dynamics (CFD) simulations of the High Efficiency Centrifugal Compressor (HECC) stage were generated using the TURBO solver. Changes to the TURBO Parallel Version 4 source code were made in order to properly model the no-slip boundary condition along the spinning hub region for centrifugal impellers. A startup procedure was developed to generate a converged flow field in TURBO. This procedure initialized computations on a coarsened mesh generated by the Turbomachinery Gridding System (TGS) and relied on a method of systematically increasing wheel speed and backpressure. Baseline design-speed TURBO results generally overpredicted total pressure ratio, adiabatic efficiency, and the choking flow rate of the HECC stage as compared with the design-intent CFD results of Code Leo. Including diffuser fillet geometry in the TURBO computation resulted in a 0.6 percent reduction in the choking flow rate and led to a better match with design-intent CFD. Diffuser fillets reduced annulus cross-sectional area but also reduced corner separation, and thus blockage, in the diffuser passage. It was found that the TURBO computations are somewhat insensitive to inlet total pressure changing from the TURBO default inlet pressure of 14.7 pounds per square inch (101.35 kilopascals) down to 11.0 pounds per square inch (75.83 kilopascals), the inlet pressure of the component test. Off-design tip clearance was modeled in TURBO in two computations: one in which the blade tip geometry was trimmed by 12 mils (0.3048 millimeters), and another in which the hub flow path was moved to reflect a 12-mil axial shift in the impeller hub, creating a step at the hub. The one-dimensional results of these two computations indicate non-negligible differences between the two modeling approaches.

  15. Beyond Moore’s technologies: operation principles of a superconductor alternative

    PubMed Central

    Klenov, Nikolay V; Bakurskiy, Sergey V; Kupriyanov, Mikhail Yu; Gudkov, Alexander L; Sidorenko, Anatoli S

    2017-01-01

    The predictions of Moore’s law are considered by experts to be valid until 2020, giving rise to “post-Moore’s” technologies afterwards. Energy efficiency is one of the major challenges in high-performance computing that must be addressed. Superconductor digital technology is a promising post-Moore’s alternative for the development of supercomputers. In this paper, we consider the operation principles of energy-efficient superconductor logic and memory circuits, with a short retrospective review of their evolution. We analyze their shortcomings with respect to computer circuit design. Possible directions for further research are outlined. PMID:29354341

  16. Computer-aided design studies of the homopolar linear synchronous motor

    NASA Astrophysics Data System (ADS)

    Dawson, G. E.; Eastham, A. R.; Ong, R.

    1984-09-01

    The linear induction motor (LIM), as an urban transit drive, can provide good grade-climbing capabilities and propulsion/braking performance that is independent of steel wheel-rail adhesion. In view of its 10-12 mm airgap, the LIM is characterized by a low power factor-efficiency product of order 0.4. A synchronous machine offers high efficiency and controllable power factor. An assessment of the linear homopolar configuration of this machine is presented as an alternative to the LIM. Computer-aided design studies using the finite element technique have been conducted to identify a suitable machine design for urban transit propulsion.

  17. Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

    NASA Technical Reports Server (NTRS)

    Moitra, Stuti; Gatski, Thomas B.

    1997-01-01

    A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.

  18. Monitoring techniques and alarm procedures for CMS services and sites in WLCG

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Molina-Perez, J.; Bonacorsi, D.; Gutsche, O.

    2012-01-01

    The CMS offline computing system is composed of roughly 80 sites (including most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS: the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range; the latter measures the quality of service and warns managers when service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator at CERN. This dedicated 24/7 computing shift personnel contributes to detecting and reacting in a timely manner to any unexpected error, and hence ensures that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS Computing facilities and infrastructure to operate at high reliability levels.

  19. Hierarchical layered and semantic-based image segmentation using ergodicity map

    NASA Astrophysics Data System (ADS)

    Yadegar, Jacob; Liu, Xiaoqing

    2010-04-01

    Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space-filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano-Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogeneous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include basic-level objects (i.e. sky/land/water) and deeper-level objects in the sky/land/water surfaces. Experimental results demonstrate that the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.

  20. Nonperturbative methods in HZE ion transport

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Badavi, Francis F.; Costen, Robert C.; Shinn, Judy L.

    1993-01-01

    A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport. The code is established to operate on the Langley Research Center nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code is highly efficient and compares well with the perturbation approximations.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grote, D. P.

    Forthon generates links between Fortran and Python. Python is a high level, object oriented, interactive and scripting language that allows a flexible and versatile interface to computational tools. The Forthon package generates the necessary wrapping code which allows access to the Fortran database and to the Fortran subroutines and functions. This provides a development package where the computationally intensive parts of a code can be written in efficient Fortran, and the high level controlling code can be written in the much more versatile Python language.

  2. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

    PubMed Central

    Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

    2014-01-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545

  3. Distributed Computation of the knn Graph for Large High-Dimensional Point Sets

    PubMed Central

    Plaku, Erion; Kavraki, Lydia E.

    2009-01-01

    High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
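
    For scale, the single-machine brute-force computation being distributed here looks like the NumPy sketch below; it is O(n²d) in time and O(n²) in memory, which is exactly what exceeds one machine's resources as n grows. The function name and test data are illustrative.

    ```python
    import numpy as np

    def knn_graph(points, k):
        """Brute-force knn graph under the Euclidean metric: connect each
        point to its k closest points (row i lists i's neighbours)."""
        diff = points[:, None, :] - points[None, :, :]
        d2 = np.einsum("ijk,ijk->ij", diff, diff)      # squared distances
        np.fill_diagonal(d2, np.inf)                   # exclude self-loops
        return np.argsort(d2, axis=1)[:, :k]

    pts = np.random.default_rng(1).random((500, 8))    # 500 points in 8-D
    neighbours = knn_graph(pts, k=5)
    ```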

  4. Benchmark Lisp And Ada Programs

    NASA Technical Reports Server (NTRS)

    Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.

    1992-01-01

    Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing efficiency of computer processing via Lisp vs. Ada; comparing efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests the efficiency with which a computer executes routines in each language. Available for any computer equipped with a validated Ada compiler and/or Common Lisp system.

  5. Implementing Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle Particle-Mesh

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J

    The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators in the LAMMPS molecular dynamics software for distributed-memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.

  6. Lecture notes in economics and mathematical system. Volume 150: Supercritical wing sections 3

    NASA Technical Reports Server (NTRS)

    Bauer, F.; Garabedian, P.; Korn, D.

    1977-01-01

    Application of computational fluid dynamics to the design and analysis of supercritical wing sections is discussed. Computer programs used to study the flight of modern aircraft at high subsonic speeds are listed and described. The cascades of shockless transonic airfoils that are expected to increase the efficiency of compressors and turbines are included.

  7. Effects of Computer Skill on Mouse Move and Click Performance

    ERIC Educational Resources Information Center

    Panagiotakopoulos, Chris; Sarris, Menelaos

    2008-01-01

    This study focuses on the use of computers in the field of education. It reports a series of experimental mouse move and click tasks on constant and moving stimuli. These experiments attempt to explore the efficiency with which individuals of different skill level and age group perform using a mouse. Differences in performance between high-skill…

  8. Efficient High-Pressure State Equations

    NASA Technical Reports Server (NTRS)

    Harstad, Kenneth G.; Miller, Richard S.; Bellan, Josette

    1997-01-01

    A method is presented for a relatively accurate, noniterative, computationally efficient calculation of high-pressure fluid-mixture equations of state, especially targeted to gas turbines and rocket engines. Pressures above 1 bar and temperatures above 100 K are addressed. The method is based on curve-fitting an effective reference state relative to departure functions formed using the Peng-Robinson cubic state equation. Fit parameters for H2, O2, N2, propane, methane, n-heptane, and methanol are given.
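
    The departure functions above are built on the Peng-Robinson cubic, whose compressibility-factor form is standard. The sketch below solves that cubic for a pure component; the nitrogen critical constants in the example are textbook values, and this is only the underlying equation of state, not the paper's fitted reference-state method.

    ```python
    import numpy as np

    R = 8.314462618  # J/(mol K)

    def peng_robinson_Z(T, P, Tc, Pc, omega):
        """Real roots of the Peng-Robinson cubic in the compressibility Z:
        Z^3 - (1-B)Z^2 + (A - 3B^2 - 2B)Z - (AB - B^2 - B^3) = 0."""
        kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
        alpha = (1.0 + kappa * (1.0 - np.sqrt(T / Tc))) ** 2
        a = 0.45724 * R**2 * Tc**2 / Pc * alpha
        b = 0.07780 * R * Tc / Pc
        A = a * P / (R * T) ** 2
        B = b * P / (R * T)
        coeffs = [1.0, B - 1.0, A - 3.0 * B**2 - 2.0 * B, -(A * B - B**2 - B**3)]
        roots = np.roots(coeffs)
        return np.sort(roots[np.abs(roots.imag) < 1e-10].real)

    # Nitrogen at 300 K, 100 bar (Tc = 126.2 K, Pc = 33.96e5 Pa, omega = 0.0372)
    print(peng_robinson_Z(300.0, 100e5, 126.2, 33.96e5, 0.0372))
    ```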

  9. Tantalum Sulfide Nanosheets as a Theranostic Nanoplatform for Computed Tomography Imaging-Guided Combinatorial Chemo-Photothermal Therapy.

    PubMed

    Liu, Yanlan; Ji, Xiaoyuan; Liu, Jianhua; Tong, Winnie W L; Askhatova, Diana; Shi, Jinjun

    2017-10-19

    Near-infrared (NIR)-absorbing metal-based nanomaterials have shown tremendous potential for cancer therapy, given their facile and controllable synthesis, efficient photothermal conversion, capability of spatiotemporal-controlled drug delivery, and intrinsic imaging function. Tantalum (Ta) is among the most biocompatible metals and arouses negligible adverse biological responses in either oxidized or reduced forms, and thus Ta-derived nanomaterials represent promising candidates for biomedical applications. However, Ta-based nanomaterials by themselves have not been explored for NIR-mediated photothermal ablation therapy. In this work, we report an innovative Ta-based multifunctional nanoplatform composed of biocompatible tantalum sulfide (TaS2) nanosheets (NSs) for simultaneous NIR hyperthermia, drug delivery, and computed tomography (CT) imaging. The TaS2 NSs exhibit multiple unique features, including (i) efficient NIR light-to-heat conversion with a high photothermal conversion efficiency of 39%, (ii) high drug loading (177% by weight), (iii) controlled drug release triggered by NIR light and moderately acidic pH, (iv) high tumor accumulation via heat-enhanced tumor vascular permeability, (v) complete tumor ablation and negligible side effects, and (vi) CT imaging contrast efficiency comparable to the widely clinically used agent iobitridol. We expect that this multifunctional NS platform can serve as a promising candidate for imaging-guided cancer therapy and selection of cancer patients with high tumor accumulation.

  10. Automated Software Acceleration in Programmable Logic for an Efficient NFFT Algorithm Implementation: A Case Study.

    PubMed

    Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian

    2017-03-28

    Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System, with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the novel Software Develop System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation.

  11. Automated Software Acceleration in Programmable Logic for an Efficient NFFT Algorithm Implementation: A Case Study

    PubMed Central

    Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian

    2017-01-01

    Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System, with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the novel Software Develop System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation. PMID:28350358

  12. Visual saliency-based fast intracoding algorithm for high efficiency video coding

    NASA Astrophysics Data System (ADS)

    Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

    2017-01-01

    Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC, with a quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of a perceptual fast CU size decision algorithm and a fast intraprediction mode decision algorithm. First, based on visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with a step-halving rough mode decision method and an early mode-pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the encoding time of the current HM to about 57%, with only a 0.37% increase in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.

  13. Probabilistic power flow using improved Monte Carlo simulation method with correlated wind sources

    NASA Astrophysics Data System (ADS)

    Bie, Pei; Zhang, Buhan; Li, Hang; Deng, Weisi; Wu, Jiasi

    2017-01-01

    Probabilistic Power Flow (PPF) is a very useful tool for power system steady-state analysis. However, the correlation among different random power injections (like wind power) makes PPF difficult to calculate. Monte Carlo simulation (MCS) and analytical methods are two commonly used approaches to solve PPF. MCS has high accuracy but is very time consuming. Analytical methods like the cumulants method (CM) have high computational efficiency, but calculating the cumulants is not convenient when wind power output does not obey any typical distribution, especially when correlated wind sources are considered. In this paper, an Improved Monte Carlo simulation method (IMCS) is proposed. The joint empirical distribution is applied to model the different wind power outputs. This method combines the advantages of both MCS and the analytical method. It not only has high computational efficiency, but also provides solutions with enough accuracy, which makes it very suitable for on-line analysis.
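
    The joint empirical distribution idea can be sketched as a Gaussian copula: correlate standard normals through a Cholesky factor, map them to uniforms, and push each margin through an empirical inverse CDF built from historical records. The code assumes NumPy and SciPy, and the Weibull "histories" are stand-ins for measured wind power, not the paper's data or its exact IMCS procedure.

    ```python
    import numpy as np
    from scipy.stats import norm

    def correlated_uniforms(corr, n_samples, rng):
        """Gaussian copula: correlate standard normals with a Cholesky
        factor, then map them to uniforms with the normal CDF."""
        L = np.linalg.cholesky(corr)
        z = rng.standard_normal((corr.shape[0], n_samples))
        return norm.cdf(L @ z)          # one row of correlated U(0,1) per source

    rng = np.random.default_rng(0)
    corr = np.array([[1.0, 0.7],
                     [0.7, 1.0]])       # assumed correlation of two wind farms
    u = correlated_uniforms(corr, 10_000, rng)

    # Push each margin through an empirical inverse CDF built from
    # "historical" records (Weibull stand-ins for measured wind power).
    hist1 = rng.weibull(2.0, 5000)
    hist2 = rng.weibull(2.2, 5000)
    p1 = np.quantile(hist1, u[0])       # correlated wind-power samples, farm 1
    p2 = np.quantile(hist2, u[1])       # farm 2
    ```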

  14. Modeling Methodologies for Design and Control of Solid Oxide Fuel Cell APUs

    NASA Astrophysics Data System (ADS)

    Pianese, C.; Sorrentino, M.

    2009-08-01

    Among the existing fuel cell technologies, Solid Oxide Fuel Cells (SOFC) are particularly suitable for both stationary and mobile applications, due to their high energy conversion efficiencies, modularity, high fuel flexibility, low emissions and low noise. Moreover, the high working temperatures enable their use in efficient cogeneration applications. SOFCs are entering a pre-industrial era, and interest in design tools has grown strongly in recent years. Optimal system configuration, component sizing, and control and diagnostic system design require computational tools that meet the conflicting needs of accuracy, affordable computational time, limited experimental effort and flexibility. The paper gives an overview of control-oriented modeling of SOFCs at both the single-cell and stack level. Such an approach provides useful simulation tools for designing and controlling SOFC-APUs intended for a wide range of applications, from automotive to marine and airplane APUs.

  15. User-Defined Data Distributions in High-Level Programming Languages

    NASA Technical Reports Server (NTRS)

    Diaconescu, Roxana E.; Zima, Hans P.

    2006-01-01

    One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.

  16. Thermal and Power Challenges in High Performance Computing Systems

    NASA Astrophysics Data System (ADS)

    Natarajan, Venkat; Deshpande, Anand; Solanki, Sudarshan; Chandrasekhar, Arun

    2009-05-01

    This paper provides an overview of the thermal and power challenges in emerging high performance computing platforms. The advent of new sophisticated applications in highly diverse areas such as health, education, finance, entertainment, etc. is driving the platform and device requirements for future systems. The key ingredients of future platforms are vertically integrated (3D) die-stacked devices which provide the required performance characteristics with the associated form factor advantages. Two of the major challenges to the design of through silicon via (TSV) based 3D stacked technologies are (i) effective thermal management and (ii) efficient power delivery mechanisms. Some of the key challenges that are articulated in this paper include hot-spot superposition and intensification in a 3D stack, design/optimization of thermal through silicon vias (TTSVs), non-uniform power loading of multi-die stacks, efficient on-chip power delivery, minimization of electrical hotspots etc.

  17. Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

    DTIC Science & Technology

    2017-10-04

    Report from the University of North Carolina at Chapel Hill. The project developed algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures.

  18. Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits.

    PubMed

    Ujfalussy, Balázs B; Makara, Judit K; Branco, Tiago; Lengyel, Máté

    2015-12-24

    Cortical neurons integrate thousands of synaptic inputs in their dendrites in highly nonlinear ways. It is unknown how these dendritic nonlinearities in individual cells contribute to computations at the level of neural circuits. Here, we show that dendritic nonlinearities are critical for the efficient integration of synaptic inputs in circuits performing analog computations with spiking neurons. We developed a theory that formalizes how a neuron's dendritic nonlinearity that is optimal for integrating synaptic inputs depends on the statistics of its presynaptic activity patterns. Based on their in vivo presynaptic population statistics (firing rates, membrane potential fluctuations, and correlations due to ensemble dynamics), our theory accurately predicted the responses of two different types of cortical pyramidal cells to patterned stimulation by two-photon glutamate uncaging. These results reveal a new computational principle underlying dendritic integration in cortical neurons by suggesting a functional link between cellular and systems-level properties of cortical circuits.

  19. Mobile clusters of single board computers: an option for providing resources to student projects and researchers.

    PubMed

    Baun, Christian

    2016-01-01

    Clusters usually consist of servers, workstations or personal computers as nodes. But especially for academic purposes like student projects or scientific projects, the cost of purchase and operation can be a challenge. Single board computers cannot compete with the performance or energy-efficiency of higher-value systems, but they are an option for building inexpensive cluster systems. Because of their compact design and modest energy consumption, it is possible to build clusters of single board computers in a way that makes them mobile and easily transported by the users. This paper describes the construction of such a cluster, useful applications and the performance of the single nodes. Furthermore, the cluster's performance and energy-efficiency are analyzed by executing the High Performance Linpack benchmark with different numbers of nodes and different proportions of the system's total main memory utilized.

  20. A sub-1-volt analog metal oxide memristive-based synaptic device with large conductance change for energy-efficient spike-based computing systems

    NASA Astrophysics Data System (ADS)

    Hsieh, Cheng-Chih; Roy, Anupam; Chang, Yao-Feng; Shahrjerdi, Davood; Banerjee, Sanjay K.

    2016-11-01

    Nanoscale metal oxide memristors have potential in the development of brain-inspired computing systems that are scalable and efficient. In such systems, memristors represent the native electronic analogues of the biological synapses. In this work, we show cerium oxide based bilayer memristors that are forming-free, low-voltage (~|0.8 V|), energy-efficient (full on/off switching at ~8 pJ with 20 ns pulses, intermediate-state switching at ~fJ), and reliable. Furthermore, pulse measurements reveal the analog nature of the memristive device; that is, it can directly be programmed to intermediate resistance states. Leveraging this finding, we demonstrate spike-timing-dependent plasticity, a spike-based Hebbian learning rule. In those experiments, the memristor exhibits a marked change in the normalized synaptic strength (>30 times) when the pre- and post-synaptic neural spikes overlap. This demonstration is an important step towards the physical construction of high-density and high-connectivity neural networks.

  1. Status and future prospects of using numerical methods to study complex flows at High Reynolds numbers

    NASA Technical Reports Server (NTRS)

    Maccormack, R. W.

    1978-01-01

    The calculation of flow fields past aircraft configurations at flight Reynolds numbers is considered. Progress in devising accurate and efficient numerical methods, in understanding and modeling the physics of turbulence, and in developing reliable and powerful computer hardware is discussed. Emphasis is placed on efficient solutions to the Navier-Stokes equations.

  2. A Comparison of Wavetable and FM Data Reduction Methods for Resynthesis of Musical Sounds

    NASA Astrophysics Data System (ADS)

    Horner, Andrew

    An ideal music-synthesis technique provides both high-level spectral control and efficient computation. Simple playback of recorded samples lacks spectral control, while additive sine-wave synthesis is inefficient. Wavetable and frequency-modulation synthesis, however, are two popular synthesis techniques that are very efficient and use only a few control parameters.
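
    A minimal wavetable oscillator makes the efficiency point concrete: one stored cycle and one interpolated table lookup per output sample, instead of summing a bank of sine oscillators. This NumPy sketch is an assumption-laden toy (single table, linear interpolation); production wavetable synthesis typically crossfades several tables to shape the spectrum over time.

    ```python
    import numpy as np

    def wavetable_osc(table, freq, duration, sr=44100):
        """Read one stored waveform cycle at a frequency-dependent phase
        increment, with linear interpolation between table entries."""
        n = len(table)
        num = int(duration * sr)
        phase = (np.arange(num) * freq * n / sr) % n
        i0 = phase.astype(int)
        i1 = (i0 + 1) % n
        frac = phase - i0
        return (1.0 - frac) * table[i0] + frac * table[i1]

    # One cycle of a bright waveform: fundamental plus two harmonics.
    t = np.linspace(0.0, 1.0, 2048, endpoint=False)
    table = np.sin(2*np.pi*t) + 0.5*np.sin(4*np.pi*t) + 0.25*np.sin(6*np.pi*t)
    signal = wavetable_osc(table, freq=440.0, duration=1.0)   # 1 s of A4
    ```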

  3. 28 percent efficient GaAs concentrator solar cells

    NASA Technical Reports Server (NTRS)

    Macmillan, H. F.; Hamaker, H. C.; Kaminar, N. R.; Kuryla, M. S.; Ladle Ristow, M.

    1988-01-01

    AlGaAs/GaAs heteroface solar concentrator cells which exhibit efficiencies in excess of 27 percent at high solar concentrations (over 400 suns, AM1.5D, 100 mW/sq cm) have been fabricated with both n/p and p/n configurations. The best n/p cell achieved an efficiency of 28.1 percent around 400 suns, and the best p/n cell achieved an efficiency of 27.5 percent around 1000 suns. The high performance of these GaAs concentrator cells compared to earlier high-efficiency cells was due to improved control of the metal-organic chemical vapor deposition growth conditions and improved cell fabrication procedures (gridline definition and edge passivation). The design parameters of the solar cell structures and optimized grid pattern were determined with a realistic computer modeling program. An evaluation of the device characteristics and a discussion of future GaAs concentrator cell development are presented.

  4. Vectorized algorithms for spiking neural network simulation.

    PubMed

    Brette, Romain; Goodman, Dan F M

    2011-06-01

    High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
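
    The core idea can be sketched in a few lines: keep the whole network state in arrays and make each time step a handful of vector operations, including one matrix slice-and-sum that delivers every spike at once. This is a generic leaky integrate-and-fire sketch in the spirit of such simulators, not Brian's implementation; all parameters and the random coupling matrix are illustrative.

    ```python
    import numpy as np

    def simulate_lif(W, n_steps, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0,
                     i_ext=1.1, seed=0):
        """Leaky integrate-and-fire network updated with whole-vector array
        operations per time step, with no per-neuron Python loop."""
        rng = np.random.default_rng(seed)
        n = W.shape[0]
        v = rng.random(n) * v_th               # random initial potentials
        spikes = []
        for _ in range(n_steps):
            v += dt / tau * (i_ext - v)        # leaky integration toward i_ext
            fired = v >= v_th                  # boolean spike vector
            spikes.append(np.flatnonzero(fired))
            v += W[:, fired].sum(axis=1)       # deliver all spikes in one matrix op
            v[fired] = v_reset                 # reset the neurons that fired
        return spikes

    rng = np.random.default_rng(1)
    W = 0.002 * rng.random((1000, 1000))       # weak random excitatory coupling
    spike_trains = simulate_lif(W, n_steps=500)
    ```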

  5. Bypassing the malfunction junction in warm dense matter simulations

    NASA Astrophysics Data System (ADS)

    Cangi, Attila; Pribram-Jones, Aurora

    2015-03-01

    Simulation of warm dense matter requires computational methods that capture both quantum and classical behavior efficiently under high-temperature and high-density conditions. The state-of-the-art approach to model electrons and ions under those conditions is density functional theory molecular dynamics, but this method's computational cost skyrockets as temperatures and densities increase. We propose finite-temperature potential functional theory as an in-principle-exact alternative that suffers no such drawback. In analogy to the zero-temperature theory developed previously, we derive an orbital-free free energy approximation through a coupling-constant formalism. Our density approximation and its associated free energy approximation demonstrate the method's accuracy and efficiency. A.C. has been partially supported by NSF Grant CHE-1112442. A.P.J. is supported by DOE Grant DE-FG02-97ER25308.

  6. Chirality Transfer in Gold(I)-Catalysed Direct Allylic Etherifications of Unactivated Alcohols: Experimental and Computational Study

    PubMed Central

    Barker, Graeme; Johnson, David G; Young, Paul C; Macgregor, Stuart A; Lee, Ai-Lan

    2015-01-01

    Gold(I)-catalysed direct allylic etherifications have been successfully carried out with chirality transfer to yield enantioenriched, γ-substituted secondary allylic ethers. Our investigations include a full substrate-scope screen to ascertain substituent effects on the regioselectivity, stereoselectivity and efficiency of chirality transfer, as well as control experiments to elucidate the mechanistic subtleties of the chirality-transfer process. Crucially, addition of molecular sieves was found to be necessary to ensure efficient and general chirality transfer. Computational studies suggest that the efficiency of chirality transfer is linked to the aggregation of the alcohol nucleophile around the reactive π-bound Au–allylic ether complex. With a single alcohol nucleophile, a high degree of chirality transfer is predicted. However, if three alcohols are present, alternative proton transfer chain mechanisms that erode the efficiency of chirality transfer become competitive. PMID:26248980

  7. EPPRD: An Efficient Privacy-Preserving Power Requirement and Distribution Aggregation Scheme for a Smart Grid.

    PubMed

    Zhang, Lei; Zhang, Jing

    2017-08-07

    A Smart Grid (SG) facilitates bidirectional demand-response communication between individual users and power providers with high computation and communication performance, but also brings the risk of leaking users' private information. Therefore, improving the efficiency of individual power requirement and distribution to ensure communication reliability while preserving user privacy is a new challenge for SG. To address this issue, we propose an efficient and privacy-preserving power requirement and distribution aggregation scheme (EPPRD) based on a hierarchical communication architecture. In the proposed scheme, an efficient encryption and authentication mechanism is proposed to better fit each individual demand-response situation. Through extensive analysis and experiment, we demonstrate how the EPPRD resists various security threats and preserves user privacy while satisfying the individual requirement in a semi-honest model; it involves less communication overhead and computation time than the existing competing schemes.

  8. Efficient LIDAR Point Cloud Data Managing and Processing in a Hadoop-Based Distributed Framework

    NASA Astrophysics Data System (ADS)

    Wang, C.; Hu, F.; Sha, D.; Han, X.

    2017-10-01

    Light Detection and Ranging (LiDAR) is one of the most promising technologies in surveying and mapping, city management, forestry, object recognition, computer vision engineering and others. However, it is challenging to efficiently store, query and analyze high-resolution 3D LiDAR data due to its volume and complexity. In order to improve the productivity of LiDAR data processing, this study proposes a Hadoop-based framework to efficiently manage and process LiDAR data in a distributed and parallel manner, taking advantage of Hadoop's storage and computing ability. At the same time, the Point Cloud Library (PCL), an open-source project for 2D/3D image and point cloud processing, is integrated with HDFS and MapReduce to run the LiDAR data analysis algorithms provided by PCL in a parallel fashion. The experimental results show that the proposed framework can efficiently manage and process big LiDAR data.
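
    One common pattern behind such frameworks is keying each record by spatial tile so the MapReduce shuffle groups a tile's points onto one reducer. Below is a hypothetical Hadoop Streaming-style mapper; the tile size and the "x y z per line" input format are assumptions, and the paper's actual pipeline integrates PCL rather than this toy binning.

    ```python
    #!/usr/bin/env python3
    """Hadoop Streaming-style mapper: emit tile-key\tpoint for LiDAR records."""
    import sys

    TILE = 100.0   # assumed tile edge length, in the point cloud's units

    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 3:
            continue                        # skip malformed records
        try:
            x, y, z = (float(v) for v in fields[:3])
        except ValueError:
            continue
        key = f"{int(x // TILE)}_{int(y // TILE)}"
        print(f"{key}\t{x} {y} {z}")        # shuffle groups points by tile
    ```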

  9. EPPRD: An Efficient Privacy-Preserving Power Requirement and Distribution Aggregation Scheme for a Smart Grid

    PubMed Central

    Zhang, Lei; Zhang, Jing

    2017-01-01

    A Smart Grid (SG) facilitates bidirectional demand-response communication between individual users and power providers with high computation and communication performance, but also brings the risk of leaking users’ private information. Therefore, improving the efficiency of individual power requirement and distribution to ensure communication reliability while preserving user privacy is a new challenge for SG. To address this issue, we propose an efficient and privacy-preserving power requirement and distribution aggregation scheme (EPPRD) based on a hierarchical communication architecture. In the proposed scheme, an efficient encryption and authentication mechanism is proposed to better fit each individual demand-response situation. Through extensive analysis and experiment, we demonstrate how the EPPRD resists various security threats and preserves user privacy while satisfying the individual requirement in a semi-honest model; it involves less communication overhead and computation time than the existing competing schemes. PMID:28783122

  10. A FFT-based formulation for efficient mechanical fields computation in isotropic and anisotropic periodic discrete dislocation dynamics

    NASA Astrophysics Data System (ADS)

    Bertin, N.; Upadhyay, M. V.; Pradalier, C.; Capolungo, L.

    2015-09-01

    In this paper, we propose a novel full-field approach based on the fast Fourier transform (FFT) technique to compute mechanical fields in periodic discrete dislocation dynamics (DDD) simulations for anisotropic materials: the DDD-FFT approach. By coupling the FFT-based approach to the discrete continuous model, the present approach benefits from the high computational efficiency of the FFT algorithm, while allowing for a discrete representation of dislocation lines. It is demonstrated that the computational time associated with the new DDD-FFT approach is significantly lower than that of current DDD approaches when a large number of dislocation segments is involved, for both isotropic and anisotropic elasticity. Furthermore, for fine Fourier grids, the treatment of anisotropic elasticity comes at a computational cost similar to that of isotropic simulation. Thus, the proposed approach paves the way towards achieving scale transition from DDD to mesoscale plasticity, especially due to the method's ability to incorporate inhomogeneous elasticity.
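
    The computational appeal of the FFT route is that, on a periodic grid, applying a Green's operator reduces to a pointwise multiplication in Fourier space at O(N log N) cost per evaluation. The sketch below shows this pattern on a generic periodic Poisson problem; the actual DDD-FFT method applies an (an)isotropic elastic Green's operator rather than this scalar one.

```python
# Generic FFT-based periodic field solve, lap(u) = -f (illustrative analogue).
import numpy as np

n, L = 64, 2 * np.pi
x = np.linspace(0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
f = np.sin(X) * np.cos(2 * Y)            # source field (zero mean)

k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
KX, KY = np.meshgrid(k, k, indexing="ij")
k2 = KX**2 + KY**2
k2[0, 0] = 1.0                           # avoid 0/0 for the mean mode

u_hat = np.fft.fft2(f) / k2              # Green's operator: a pointwise multiply
u_hat[0, 0] = 0.0                        # pin the undetermined mean to zero
u = np.real(np.fft.ifft2(u_hat))

# Verify: -lap(u) reproduces f (up to round-off).
print(np.allclose(np.real(np.fft.ifft2(k2 * np.fft.fft2(u))), f, atol=1e-10))
```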

  11. High performance transcription factor-DNA docking with GPU computing

    PubMed Central

    2012-01-01

    Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computationally demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process, thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPUs or GPU clusters to the protein-DNA docking problem. PMID:22759575
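
    The search loop being accelerated is a standard Metropolis Monte-Carlo/simulated-annealing iteration. The toy sketch below shows that loop on a one-dimensional placeholder energy; the paper's potential-based protein-DNA energy function and the GPU parallelization of many such chains are not reproduced here.

```python
# Simulated-annealing skeleton with a placeholder energy (illustrative only).
import math
import random

def energy(x):
    """Toy 1-D 'binding energy' landscape standing in for the docking score."""
    return (x - 2.0) ** 2 + math.sin(5.0 * x)

def simulated_annealing(t0=5.0, cooling=0.95, steps=2000):
    x = random.uniform(-10.0, 10.0)        # random initial conformation
    e, t = energy(x), t0
    for _ in range(steps):
        cand = x + random.gauss(0.0, 0.5)  # perturb the conformation
        e_cand = energy(cand)
        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
        t *= cooling                       # anneal the temperature
    return x, e

print(simulated_annealing())
```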

  12. Evolving binary classifiers through parallel computation of multiple fitness cases.

    PubMed

    Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

    2005-06-01

    This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized either explicitly, using parallel computation in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
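
    The "intrinsic parallelism of bitwise operators" can be made concrete: if each bit position of a machine word holds one fitness case, a single AND/OR/XOR evaluates a Boolean classifier on all cases at once. A minimal sketch follows, with made-up inputs and a parity target; the paper's cellular and genetic programs are of course richer than this.

```python
# Bitwise-parallel evaluation of fitness cases (sub-machine-code style sketch).
CASES = 8                # bit k of each word holds the input value in case k
a = 0b10101010           # input a across the 8 cases
b = 0b11001100           # input b
c = 0b11110000           # input c
target = 0b10010110      # desired output: the parity a XOR b XOR c

def fitness(output):
    """Count matched fitness cases via one XNOR and a popcount."""
    matches = ~(output ^ target) & ((1 << CASES) - 1)
    return bin(matches).count("1")

candidate = a ^ b ^ c    # two XORs evaluate the program on all 8 cases at once
print(fitness(candidate))  # 8 -> perfect score
print(fitness(a & b))      # 4 -> a weaker candidate
```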

  13. Design and function of biomimetic multilayer water purification membranes

    PubMed Central

    Ling, Shengjie; Qin, Zhao; Huang, Wenwen; Cao, Sufeng; Kaplan, David L.; Buehler, Markus J.

    2017-01-01

    Multilayer architectures in water purification membranes enable increased water throughput, high filter efficiency, and high molecular loading capacity. However, the preparation of membranes with well-organized multilayer structures, starting from the nanoscale to maximize filtration efficiency, remains a challenge. We report a complete strategy to fully realize a novel biomaterial-based multilayer nanoporous membrane via the integration of computational simulation and experimental fabrication. Our comparative computational simulations, based on coarse-grained models of protein nanofibrils and mineral plates, reveal that the multilayer structure can only form with weak interactions between nanofibrils and mineral plates. We demonstrate experimentally that silk nanofibril (SNF) and hydroxyapatite (HAP) can be used to fabricate highly ordered multilayer membranes with nanoporous features by combining protein self-assembly and in situ biomineralization. The production is optimized to be a simple and highly repeatable process that does not require sophisticated equipment and is suitable for scaled production of low-cost water purification membranes. These membranes not only show ultrafast water penetration but also exhibit broad utility and high efficiency of removal and even reuse (in some cases) of contaminants, including heavy metal ions, dyes, proteins, and other nanoparticles in water. Our biomimetic design and synthesis of these functional SNF/HAP materials have established a paradigm that could lead to the large-scale, low-cost production of multilayer materials with broad spectrum and efficiency for water purification, with applications in wastewater treatment, biomedicine, food industry, and the life sciences. PMID:28435877

  14. Design and function of biomimetic multilayer water purification membranes.

    PubMed

    Ling, Shengjie; Qin, Zhao; Huang, Wenwen; Cao, Sufeng; Kaplan, David L; Buehler, Markus J

    2017-04-01

    Multilayer architectures in water purification membranes enable increased water throughput, high filter efficiency, and high molecular loading capacity. However, the preparation of membranes with well-organized multilayer structures, starting from the nanoscale to maximize filtration efficiency, remains a challenge. We report a complete strategy to fully realize a novel biomaterial-based multilayer nanoporous membrane via the integration of computational simulation and experimental fabrication. Our comparative computational simulations, based on coarse-grained models of protein nanofibrils and mineral plates, reveal that the multilayer structure can only form with weak interactions between nanofibrils and mineral plates. We demonstrate experimentally that silk nanofibril (SNF) and hydroxyapatite (HAP) can be used to fabricate highly ordered multilayer membranes with nanoporous features by combining protein self-assembly and in situ biomineralization. The production is optimized to be a simple and highly repeatable process that does not require sophisticated equipment and is suitable for scaled production of low-cost water purification membranes. These membranes not only show ultrafast water penetration but also exhibit broad utility and high efficiency of removal and even reuse (in some cases) of contaminants, including heavy metal ions, dyes, proteins, and other nanoparticles in water. Our biomimetic design and synthesis of these functional SNF/HAP materials have established a paradigm that could lead to the large-scale, low-cost production of multilayer materials with broad spectrum and efficiency for water purification, with applications in wastewater treatment, biomedicine, food industry, and the life sciences.

  15. Accurate first-principles structures and energies of diversely bonded systems from an efficient density functional

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sun, Jianwei; Remsing, Richard C.; Zhang, Yubo

    2016-06-13

    One atom or molecule binds to another through various types of bond, the strengths of which range from several meV to several eV. Although some computational methods can provide accurate descriptions of all bond types, those methods are not efficient enough for many studies (for example, large systems, ab initio molecular dynamics and high-throughput searches for functional materials). Here, we show that the recently developed non-empirical strongly constrained and appropriately normed (SCAN) meta-generalized gradient approximation (meta-GGA) within the density functional theory framework predicts accurate geometries and energies of diversely bonded molecules and materials (including covalent, metallic, ionic, hydrogen and van der Waals bonds). This represents a significant improvement at comparable efficiency over its predecessors, the GGAs that currently dominate materials computation. Often, SCAN matches or improves on the accuracy of a computationally expensive hybrid functional, at almost-GGA cost. SCAN is therefore expected to have a broad impact on chemistry and materials science.

  16. Accurate first-principles structures and energies of diversely bonded systems from an efficient density functional.

    PubMed

    Sun, Jianwei; Remsing, Richard C; Zhang, Yubo; Sun, Zhaoru; Ruzsinszky, Adrienn; Peng, Haowei; Yang, Zenghui; Paul, Arpita; Waghmare, Umesh; Wu, Xifan; Klein, Michael L; Perdew, John P

    2016-09-01

    One atom or molecule binds to another through various types of bond, the strengths of which range from several meV to several eV. Although some computational methods can provide accurate descriptions of all bond types, those methods are not efficient enough for many studies (for example, large systems, ab initio molecular dynamics and high-throughput searches for functional materials). Here, we show that the recently developed non-empirical strongly constrained and appropriately normed (SCAN) meta-generalized gradient approximation (meta-GGA) within the density functional theory framework predicts accurate geometries and energies of diversely bonded molecules and materials (including covalent, metallic, ionic, hydrogen and van der Waals bonds). This represents a significant improvement at comparable efficiency over its predecessors, the GGAs that currently dominate materials computation. Often, SCAN matches or improves on the accuracy of a computationally expensive hybrid functional, at almost-GGA cost. SCAN is therefore expected to have a broad impact on chemistry and materials science.

  17. Multiresolution molecular mechanics: Implementation and efficiency

    NASA Astrophysics Data System (ADS)

    Biyikli, Emre; To, Albert C.

    2017-01-01

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the implemented software along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and for measurement-based dynamic balancing of the parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. The speed-up of the MMM method is shown to be directly proportional to the reduction in the number of atoms visited in force computation. Finally, an adaptive MMM analysis of a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
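
    Of the efficiency mechanisms listed, measurement-based dynamic load balancing is the easiest to sketch generically: partitions are resized in proportion to each worker's measured throughput so that all ranks finish a step at roughly the same time. The function below is a schematic of that idea only, not the MMM implementation.

```python
# Measurement-based dynamic load balancing (generic schematic).
def rebalance(work, times):
    """work[i]: units assigned to worker i; times[i]: measured wall time.
    Returns a new assignment proportional to measured throughput."""
    rates = [w / t for w, t in zip(work, times)]   # units per second
    total = sum(work)
    return [round(total * r / sum(rates)) for r in rates]

# Worker 1 ran twice as slowly per unit, so it receives half as much work.
print(rebalance(work=[1000, 1000], times=[10.0, 20.0]))  # -> [1333, 667]
```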

  18. On the generalized VIP time integral methodology for transient thermal problems

    NASA Technical Reports Server (NTRS)

    Mei, Youping; Chen, Xiaoqin; Tamma, Kumar K.; Sha, Desong

    1993-01-01

    The paper describes the development and applicability of a generalized VIrtual-Pulse (VIP) time integral method of computation for thermal problems. Unlike past approaches to general heat transfer computations, and motivated by the advent of high-speed computing technology and the importance of parallel computation for the efficient use of computing environments, the developments described in this paper address the need for explicit computational procedures with improved accuracy and stability characteristics. As a consequence, a new and effective VIP methodology that inherits these improved characteristics is described. Illustrative numerical examples are provided to demonstrate the developments and validate the results obtained for thermal problems.
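
    The VIP formulation itself is not reproduced here, but the class of method it improves upon, explicit time stepping for transient heat conduction, can be anchored with a standard sketch: forward Euler in time with a central-difference Laplacian, including the stability limit that motivates work on schemes with better stability characteristics.

```python
# Generic explicit update for 1-D transient heat conduction (NOT the VIP method).
import numpy as np

alpha, dx, dt = 1.0, 0.01, 4e-5           # diffusivity, grid spacing, time step
assert alpha * dt / dx**2 <= 0.5          # explicit stability criterion

T = np.zeros(101)
T[0], T[-1] = 100.0, 0.0                  # fixed-temperature boundaries
for _ in range(1000):
    T[1:-1] += alpha * dt / dx**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2])
print(T[::20])                            # snapshot of the temperature profile
```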

  19. Civil propulsion technology for the next twenty-five years

    NASA Technical Reports Server (NTRS)

    Rosen, Robert; Facey, John R.

    1987-01-01

    The next twenty-five years will see major advances in civil propulsion technology that will result in completely new aircraft systems for domestic, international, commuter, and high-speed transports. These aircraft will incorporate advanced aerodynamic, structural, and avionic technologies, resulting in major new system capabilities and economic improvements. Propulsion technologies will include high-speed turboprops in the near term, very high bypass ratio turbofans, high-efficiency small engines, and advanced cycles utilizing high-temperature materials for high-speed propulsion. Key fundamental enabling technologies include increased temperature capability and advanced design methods. Increased temperature capability will be based on improved composite materials such as metal matrix composites, intermetallics, ceramics, and carbon/carbon, as well as advanced heat transfer techniques. Advanced design methods will make use of advances in internal computational fluid mechanics, reacting flow computation, computational structural mechanics, and computational chemistry. The combination of advanced enabling technologies, new propulsion concepts, and advanced control approaches will provide major improvements in civil aircraft.

  20. Computed Flow Through An Artificial Heart Valve

    NASA Technical Reports Server (NTRS)

    Rogers, Stewart E.; Kwak, Dochan; Kiris, Cetin; Chang, I-Dee

    1994-01-01

    Report discusses computations of blood flow through prosthetic tilting-disk valve. Computational procedure developed in simulation used to design better artificial hearts and valves by reducing or eliminating following adverse flow characteristics: large pressure losses, which prevent hearts from working efficiently; separated and secondary flows, which cause clotting; and high turbulent shear stresses, which damage red blood cells. Report reiterates and expands upon part of NASA technical memorandum "Computed Flow Through an Artificial Heart and Valve" (ARC-12983). Also based partly on research described in "Numerical Simulation of Flow Through an Artificial Heart" (ARC-12478).

  1. Public-Private Partnership: Joint recommendations to improve downloads of large Earth observation data

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.

    2016-12-01

    With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data are processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and to handle high-velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited by data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, has jointly developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. These recommendations provide guidelines against which all data providers can evaluate existing data systems and fix any issues uncovered, enabling efficient search, access, and use of large volumes of data. Additionally, the guidelines ensure that all cloud providers use a common methodology for bulk-downloading data from data providers, preventing the data providers from having to build custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation data. Adoption of these recommendations will also benefit data users interested in moving large volumes of data from data systems to any other location, including the cloud providers themselves, cloud users such as scientists, and other users working in high performance computing environments who need to move large volumes of data.

  2. Aerosol Delivery with Two Nebulizers Through High-Flow Nasal Cannula: A Randomized Cross-Over Single-Photon Emission Computed Tomography-Computed Tomography Study.

    PubMed

    Dugernier, Jonathan; Hesse, Michel; Jumetz, Thibaud; Bialais, Emilie; Roeseler, Jean; Depoortere, Virginie; Michotte, Jean-Bernard; Wittebole, Xavier; Ehrmann, Stephan; Laterre, Pierre-François; Jamar, François; Reychler, Gregory

    2017-10-01

    High-flow nasal cannula use is expanding in ICUs. The aim of this study was to compare aerosol efficiency of two nebulizers used through a high-flow nasal cannula: the most commonly used jet nebulizer (JN) and a more efficient vibrating-mesh nebulizer (VN). Aerosol delivery of diethylenetriaminepentaacetic acid labeled with technetium-99m (4 mCi/4 mL) to the lungs using a VN (Aerogen Solo®; Aerogen Ltd., Galway, Ireland) and a constant-output JN (Opti-Mist Plus Nebulizer®; ConvaTec, Bridgewater, NJ) through a high-flow nasal cannula (Optiflow®; Fisher & Paykel, New Zealand) was compared in six healthy subjects. Flow rate was set at 30 L/min through the heated, humidified circuit. Pulmonary and extrapulmonary deposition was measured by single-photon emission computed tomography combined with a low-dose computed tomographic scan and by planar scintigraphy. Lung deposition was only 3.6% (2.1-4.4%) and 1% (0.7-2%) of the nominal dose with the VN and the JN, respectively (p < 0.05). The JN showed higher retained doses than the VN. However, both nebulizers were associated with substantial deposition in the single-limb circuit, the humidification chamber, and the nasal cannula [58.2% (51.6-61.6%) of the nominal dose with the VN versus 19.2% (15.8-22.9%) with the JN, p < 0.05] and in the upper respiratory tract [17.6% (13.4-27.9%) of the nominal dose with the VN and 8.6% (6.0-11.0%) with the JN, p < 0.05], especially in the nasal cavity. Under the specific conditions of this study, pulmonary drug delivery through the high-flow nasal cannula is about 1%-4% of the initial amount of drug placed in the nebulizer, despite the higher efficiency of the VN compared with the JN.

  3. Time-dependent jet flow and noise computations

    NASA Technical Reports Server (NTRS)

    Berman, C. H.; Ramos, J. I.; Karniadakis, G. E.; Orszag, S. A.

    1990-01-01

    Methods for computing jet turbulence noise based on the time-dependent solution of Lighthill's (1952) differential equation are demonstrated. A key element in this approach is a flow code for solving the time-dependent Navier-Stokes equations at relatively high Reynolds numbers. Jet flow results at Re = 10,000 are presented here. This code combines a computationally efficient spectral element technique and a new self-consistent turbulence subgrid model to supply values for Lighthill's turbulence noise source tensor.

  4. Fast approximation for joint optimization of segmentation, shape, and location priors, and its application in gallbladder segmentation.

    PubMed

    Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu

    2017-05-01

    This paper addresses joint optimization for segmentation and shape priors, including translation, to overcome inter-subject variability in the location of an organ. Because a simple extension of the previous exact optimization method is too computationally complex, we propose a fast approximation for optimization. The effectiveness of the proposed approximation is validated in the context of gallbladder segmentation from a non-contrast computed tomography (CT) volume. After spatial standardization and estimation of the posterior probability of the target organ, simultaneous optimization of the segmentation, shape, and location priors is performed using a branch-and-bound method. Fast approximation is achieved by combining sampling in the eigenshape space to reduce the number of shape priors and an efficient computational technique for evaluating the lower bound. Performance was evaluated using threefold cross-validation of 27 CT volumes. Optimization in terms of translation of the shape prior significantly improved segmentation performance. The proposed method achieved a result of 0.623 on the Jaccard index in gallbladder segmentation, which is comparable to that of state-of-the-art methods. The computational efficiency of the algorithm is confirmed to be good enough to allow execution on a personal computer. Joint optimization of the segmentation, shape, and location priors was proposed, and it proved to be effective in gallbladder segmentation with high computational efficiency.

  5. Scaling predictive modeling in drug development with cloud computing.

    PubMed

    Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola

    2015-01-26

    Growing data sets and the increasing time needed for analysis are hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compared the results with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investment makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.

  6. Some Observations on the Current Status of Performing Finite Element Analyses

    NASA Technical Reports Server (NTRS)

    Raju, Ivatury S.; Knight, Norman F., Jr.; Shivakumar, Kunigal N.

    2015-01-01

    Aerospace structures are complex high-performance structures. Advances in reliable and efficient computing and modeling tools are enabling analysts to consider complex configurations, build complex finite element models, and perform analyses rapidly. Many of today's early-career engineers are very proficient in the use of modern computers, computing engines, complex software systems, and visualization tools. These young engineers are becoming increasingly efficient in building complex 3D models of complicated aerospace components. However, current trends demonstrate blind acceptance of finite element analysis results. This paper is aimed at raising awareness of this situation. Examples of commonly encountered situations are presented. To counter the current trends, some guidelines and suggestions for analysts, senior engineers, and educators are offered.

  7. Aeroelastic Model Structure Computation for Envelope Expansion

    NASA Technical Reports Server (NTRS)

    Kukreja, Sunil L.

    2007-01-01

    Structure detection is a procedure for selecting a subset of candidate terms, from a full model description, that best describes the observed output. This is a necessary procedure to compute an efficient system description which may afford greater insight into the functionality of the system or a simpler controller design. Structure computation as a tool for black-box modelling may be of critical importance in the development of robust, parsimonious models for the flight-test community. Moreover, this approach may lead to efficient strategies for rapid envelope expansion which may save significant development time and costs. In this study, a least absolute shrinkage and selection operator (LASSO) technique is investigated for computing efficient model descriptions of nonlinear aeroelastic systems. The LASSO minimises the residual sum of squares by the addition of an l1 penalty term on the parameter vector of the traditional l2 minimisation problem. Its use for structure detection is a natural extension of this constrained minimisation approach to pseudolinear regression problems which produces some model parameters that are exactly zero and, therefore, yields a parsimonious system description. Applicability of this technique for model structure computation for the F/A-18 Active Aeroelastic Wing using flight test data is shown for several flight conditions (Mach numbers) by identifying a parsimonious system description with a high percent fit for cross-validated data.
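
    The selection mechanism can be sketched in a few lines with scikit-learn's Lasso on synthetic data (not the F/A-18 flight-test records): the l1 penalty drives the coefficients of superfluous candidate terms exactly to zero, and the surviving terms define the parsimonious model structure.

```python
# LASSO as a structure detector on synthetic candidate terms (illustrative).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))               # six candidate model terms
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)          # l1-penalized least squares
selected = [i for i, c in enumerate(model.coef_) if abs(c) > 1e-8]
print("coefficients:", np.round(model.coef_, 3))
print("selected terms:", selected)          # expected: [0, 2]
```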

  8. An adaptive discontinuous Galerkin solver for aerodynamic flows

    NASA Astrophysics Data System (ADS)

    Burgess, Nicholas K.

    This work considers the accuracy, efficiency, and robustness of an unstructured high-order accurate discontinuous Galerkin (DG) solver for computational fluid dynamics (CFD). Recently, there has been a drive to reduce the discretization error of CFD simulations using high-order methods on unstructured grids. However, high-order methods are often criticized for lacking robustness and having high computational cost. The goal of this work is to investigate methods that enhance the robustness of high-order discontinuous Galerkin (DG) methods on unstructured meshes, while maintaining low computational cost and high accuracy of the numerical solutions. This work investigates robustness enhancement of high-order methods by examining effective non-linear solvers, shock capturing methods, turbulence model discretizations and adaptive refinement techniques. The goal is to develop an all-encompassing solver that can simulate a large range of physical phenomena, where all aspects of the solver work together to achieve a robust, efficient and accurate solution strategy. The components and framework for a robust high-order accurate solver that is capable of solving viscous, Reynolds Averaged Navier-Stokes (RANS) and shocked flows is presented. In particular, this work discusses robust discretizations of the turbulence model equation used to close the RANS equations, as well as stable shock capturing strategies that are applicable across a wide range of discretization orders and applicable to very strong shock waves. Furthermore, refinement techniques are considered as both efficiency and robustness enhancement strategies. Additionally, efficient non-linear solvers based on multigrid and Krylov subspace methods are presented. The accuracy, efficiency, and robustness of the solver are demonstrated using a variety of challenging aerodynamic test problems, which include turbulent high-lift and viscous hypersonic flows. Adaptive mesh refinement was found to play a critical role in obtaining a robust and efficient high-order accurate flow solver. A goal-oriented error estimation technique has been developed to estimate the discretization error of simulation outputs. For high-order discretizations, it is shown that functional output error super-convergence can be obtained, provided the discretization satisfies a property known as dual consistency. The dual consistency of the DG methods developed in this work is shown via mathematical analysis and numerical experimentation. Goal-oriented error estimation is also used to drive an hp-adaptive mesh refinement strategy, where a combination of mesh or h-refinement, and order or p-enrichment, is employed based on the smoothness of the solution. The results demonstrate that the combination of goal-oriented error estimation and hp-adaptation yields superior accuracy, as well as enhanced robustness and efficiency, for a variety of aerodynamic flows including flows with strong shock waves. This work demonstrates that DG discretizations can be the basis of an accurate, efficient, and robust CFD solver. Furthermore, enhancing the robustness of DG methods does not adversely impact the accuracy or efficiency of the solver for challenging and complex flow problems. In particular, when considering the computation of shocked flows, this work demonstrates that the available shock capturing techniques are sufficiently accurate and robust, particularly when used in conjunction with adaptive mesh refinement.
This work also demonstrates that robust solutions of the Reynolds Averaged Navier-Stokes (RANS) and turbulence model equations can be obtained for complex and challenging aerodynamic flows. In this context, the most robust strategy was determined to be a low-order turbulence model discretization coupled to a high-order discretization of the RANS equations. Although RANS solutions using high-order accurate discretizations of the turbulence model were obtained, the behavior of current-day RANS turbulence models discretized to high order was found to be problematic, leading to solver robustness issues. This suggests that future work is warranted in the area of turbulence model formulation for use with high-order discretizations. Alternatively, the use of Large-Eddy Simulation (LES) subgrid scale models with high-order DG methods offers the potential to leverage the high accuracy of these methods for very high fidelity turbulent simulations. This thesis has developed the algorithmic improvements that will lay the foundation for the development of a three-dimensional high-order flow solution strategy that can be used as the basis for future LES simulations.

  9. Money for Research, Not for Energy Bills: Finding Energy and Cost Savings in High Performance Computer Facility Designs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Drewmark Communications; Sartor, Dale; Wilson, Mark

    2010-07-01

    High-performance computing facilities in the United States consume an enormous amount of electricity, cutting into research budgets and challenging public- and private-sector efforts to reduce energy consumption and meet environmental goals. However, these facilities can greatly reduce their energy demand through energy-efficient design of the facility itself. Using a case study of a facility under design, this article discusses strategies and technologies that can be used to help achieve energy reductions.

  10. Approximate Green's function methods for HZE transport in multilayered materials

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Badavi, Francis F.; Shinn, Judy L.; Costen, Robert C.

    1993-01-01

    A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport in multilayered materials. The code is established to operate on the Langley nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code was found to be highly efficient and compared well with the perturbation approximation.

  11. A new approach to integrate GPU-based Monte Carlo simulation into inverse treatment plan optimization for proton therapy.

    PubMed

    Li, Yongbao; Tian, Zhen; Song, Ting; Wu, Zhaoxia; Liu, Yaqiang; Jiang, Steve; Jia, Xun

    2017-01-07

    Monte Carlo (MC)-based spot dose calculation is highly desired for inverse treatment planning in proton therapy because of its accuracy. Recent studies on biological optimization have also indicated the use of MC methods to compute relevant quantities of interest, e.g. linear energy transfer. Although GPU-based MC engines have been developed to address inverse optimization problems, their efficiency still needs to be improved. Also, the use of a large number of GPUs in MC calculation is not favorable for clinical applications. The previously proposed adaptive particle sampling (APS) method can improve the efficiency of MC-based inverse optimization by using the computationally expensive MC simulation more effectively. This method is more efficient than the conventional approach that performs spot dose calculation and optimization in two sequential steps. In this paper, we propose a computational library to perform MC-based spot dose calculation on GPU with the APS scheme. The implemented APS method performs a non-uniform sampling of the particles from pencil beam spots during the optimization process, favoring those from the high intensity spots. The library also conducts two computationally intensive matrix-vector operations frequently used when solving an optimization problem. This library design allows a streamlined integration of the MC-based spot dose calculation into an existing proton therapy inverse planning process. We tested the developed library in a typical inverse optimization system with four patient cases. The library achieved the targeted functions by supporting inverse planning in various proton therapy schemes, e.g. single field uniform dose, 3D intensity modulated proton therapy, and distal edge tracking. The efficiency was 41.6  ±  15.3% higher than the use of a GPU-based MC package in a conventional calculation scheme. The total computation time ranged between 2 and 50 min on a single GPU card depending on the problem size.
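
    The core of the adaptive sampling idea, allocating Monte Carlo histories non-uniformly across pencil-beam spots in proportion to their current intensities, can be sketched in a few lines. The intensities and history count below are invented, and the actual library performs this inside a GPU dose engine.

```python
# Intensity-weighted allocation of MC histories to spots (illustrative sketch).
import numpy as np

rng = np.random.default_rng(42)
intensities = np.array([5.0, 1.0, 0.2, 3.8])   # current spot weights
p = intensities / intensities.sum()            # sampling probabilities

n_histories = 100_000
spot = rng.choice(len(intensities), size=n_histories, p=p)
print(np.bincount(spot, minlength=len(intensities)))  # histories per spot
```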

  12. A new approach to integrate GPU-based Monte Carlo simulation into inverse treatment plan optimization for proton therapy

    NASA Astrophysics Data System (ADS)

    Li, Yongbao; Tian, Zhen; Song, Ting; Wu, Zhaoxia; Liu, Yaqiang; Jiang, Steve; Jia, Xun

    2017-01-01

    Monte Carlo (MC)-based spot dose calculation is highly desired for inverse treatment planning in proton therapy because of its accuracy. Recent studies on biological optimization have also indicated the use of MC methods to compute relevant quantities of interest, e.g. linear energy transfer. Although GPU-based MC engines have been developed to address inverse optimization problems, their efficiency still needs to be improved. Also, the use of a large number of GPUs in MC calculation is not favorable for clinical applications. The previously proposed adaptive particle sampling (APS) method can improve the efficiency of MC-based inverse optimization by using the computationally expensive MC simulation more effectively. This method is more efficient than the conventional approach that performs spot dose calculation and optimization in two sequential steps. In this paper, we propose a computational library to perform MC-based spot dose calculation on GPU with the APS scheme. The implemented APS method performs a non-uniform sampling of the particles from pencil beam spots during the optimization process, favoring those from the high intensity spots. The library also conducts two computationally intensive matrix-vector operations frequently used when solving an optimization problem. This library design allows a streamlined integration of the MC-based spot dose calculation into an existing proton therapy inverse planning process. We tested the developed library in a typical inverse optimization system with four patient cases. The library achieved the targeted functions by supporting inverse planning in various proton therapy schemes, e.g. single field uniform dose, 3D intensity modulated proton therapy, and distal edge tracking. The efficiency was 41.6  ±  15.3% higher than the use of a GPU-based MC package in a conventional calculation scheme. The total computation time ranged between 2 and 50 min on a single GPU card depending on the problem size.

  13. A New Approach to Integrate GPU-based Monte Carlo Simulation into Inverse Treatment Plan Optimization for Proton Therapy

    PubMed Central

    Li, Yongbao; Tian, Zhen; Song, Ting; Wu, Zhaoxia; Liu, Yaqiang; Jiang, Steve; Jia, Xun

    2016-01-01

    Monte Carlo (MC)-based spot dose calculation is highly desired for inverse treatment planning in proton therapy because of its accuracy. Recent studies on biological optimization have also indicated the use of MC methods to compute relevant quantities of interest, e.g. linear energy transfer. Although GPU-based MC engines have been developed to address inverse optimization problems, their efficiency still needs to be improved. Also, the use of a large number of GPUs in MC calculation is not favorable for clinical applications. The previously proposed adaptive particle sampling (APS) method can improve the efficiency of MC-based inverse optimization by using the computationally expensive MC simulation more effectively. This method is more efficient than the conventional approach that performs spot dose calculation and optimization in two sequential steps. In this paper, we propose a computational library to perform MC-based spot dose calculation on GPU with the APS scheme. The implemented APS method performs a non-uniform sampling of the particles from pencil beam spots during the optimization process, favoring those from the high intensity spots. The library also conducts two computationally intensive matrix-vector operations frequently used when solving an optimization problem. This library design allows a streamlined integration of the MC-based spot dose calculation into an existing proton therapy inverse planning process. We tested the developed library in a typical inverse optimization system with four patient cases. The library achieved the targeted functions by supporting inverse planning in various proton therapy schemes, e.g. single field uniform dose, 3D intensity modulated proton therapy, and distal edge tracking. The efficiency was 41.6±15.3% higher than the use of a GPU-based MC package in a conventional calculation scheme. The total computation time ranged between 2 and 50 min on a single GPU card depending on the problem size. PMID:27991456

  14. Computational design of high efficiency release targets for use at ISOL facilities

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Alton, G. D.; Middleton, J. W.

    1999-06-01

    This report describes efforts made at the Oak Ridge National Laboratory to design high-efficiency-release targets that simultaneously incorporate the short diffusion lengths, high permeabilities, controllable temperatures, and heat removal properties required for the generation of useful radioactive ion beam (RIB) intensities for nuclear physics and astrophysics research using the isotope separation on-line (ISOL) technique. Short diffusion lengths are achieved either by using thin fibrous target materials or by coating thin layers of selected target material onto low-density carbon fibers such as reticulated vitreous carbon fiber (RVCF) or carbon-bonded-carbon-fiber (CBCF) to form highly permeable composite target matrices. Computational studies which simulate the generation and removal of primary beam deposited heat from target materials have been conducted to optimize the design of target/heat-sink systems for generating RIBs. The results derived from diffusion release-rate simulation studies for selected targets and thermal analyses of temperature distributions within a prototype target/heat-sink system subjected to primary ion beam irradiation will be presented in this report.
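
    Diffusion release-rate simulations of this kind commonly start from classical series solutions of the diffusion equation, such as the fractional release from a sphere of radius r. The snippet below evaluates that textbook series with illustrative parameter values, not the ORNL target data.

```python
# Fractional diffusive release from a sphere (classical series; illustrative).
import math

def release_fraction(D, r, t, terms=200):
    """f(t) = 1 - (6/pi^2) * sum_n n^-2 * exp(-n^2 pi^2 D t / r^2)."""
    s = sum(math.exp(-(n * math.pi) ** 2 * D * t / r**2) / n**2
            for n in range(1, terms + 1))
    return 1.0 - 6.0 / math.pi**2 * s

D = 1e-9   # diffusion coefficient, cm^2/s (illustrative)
r = 5e-4   # grain/fiber radius, cm -- i.e., a short diffusion length
for t in (1.0, 10.0, 100.0):
    print(f"t = {t:6.1f} s  released fraction = {release_fraction(D, r, t):.4f}")
```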

  15. High-efficiency-release targets for use at ISOL facilities: computational design

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Alton, G. D.

    1999-12-01

    This report describes efforts made at the Oak Ridge National Laboratory to design high-efficiency-release targets that simultaneously incorporate the short diffusion lengths, high permeabilities, controllable temperatures, and heat-removal properties required for the generation of useful radioactive ion beam (RIB) intensities for nuclear physics and astrophysics research using the isotope separation on-line (ISOL) technique. Short diffusion lengths are achieved either by using thin fibrous target materials or by coating thin layers of selected target material onto low-density carbon fibers such as reticulated-vitreous-carbon fiber (RVCF) or carbon-bonded-carbon fiber (CBCF) to form highly permeable composite target matrices. Computational studies that simulate the generation and removal of primary beam deposited heat from target materials have been conducted to optimize the design of target/heat-sink systems for generating RIBs. The results derived from diffusion release-rate simulation studies for selected targets and thermal analyses of temperature distributions within a prototype target/heat-sink system subjected to primary ion beam irradiation are presented in this report.

  16. Quantum Monte Carlo for large chemical systems: implementing efficient strategies for petascale platforms and beyond.

    PubMed

    Scemama, Anthony; Caffarel, Michel; Oseret, Emmanuel; Jalby, William

    2013-04-30

    Various strategies for efficiently implementing quantum Monte Carlo (QMC) simulations of large chemical systems are presented. These include: (i) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices; this novel scheme is based on the highly localized character of atomic Gaussian basis functions (not the molecular orbitals, as usually done); (ii) the possibility of keeping the memory footprint minimal; (iii) the important enhancement of single-core performance when efficient optimization tools are used; and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and are illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056, and 1731 electrons). Using 10-80 k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that a large part of this machine's peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible. Copyright © 2013 Wiley Periodicals, Inc.

  17. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    NASA Astrophysics Data System (ADS)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The prediction of such systems' behavior is carried out by means of computational models whose basis is partial differential equations (PDEs) [1]. Due to the enormous size of the discretized versions of such PDEs, it is necessary to apply highly parallelized supercomputers. For these, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques stems from the kind of discretizations used in them. Recently, I. Herrera and co-workers, using 'non-overlapping discretizations', have produced the DVS-Software which overcomes this limitation [2]. The DVS-Software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90% or so [3]). It is therefore very suitable for effectively exploiting the most advanced parallel supercomputers available at present. In a parallel talk at this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MODFLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM). REFERENCES [1] Herrera, Ismael and George F. Pinder, "Mathematical Modelling in Science and Engineering: An Axiomatic Approach", John Wiley, 243p., 2012. [2] Herrera, I., de la Cruz, L.M. and Rosas-Medina, A., "Non-Overlapping Discretization Methods for Partial Differential Equations", NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num.21852. (Open source) [3] Herrera, I., & Contreras, Iván, "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity", Geofísica Internacional, 2015 (In press)

  18. The Effect of Computer Automation on Institutional Review Board (IRB) Office Efficiency

    ERIC Educational Resources Information Center

    Oder, Karl; Pittman, Stephanie

    2015-01-01

    Companies purchase computer systems to make their processes more efficient through automation. Some academic medical centers (AMC) have purchased computer systems for their institutional review boards (IRB) to increase efficiency and compliance with regulations. IRB computer systems are expensive to purchase, deploy, and maintain. An AMC should…

  19. Efficient Wideband Numerical Simulations for Nanostructures Employing a Drude-Critical Points (DCP) Dispersive Model.

    PubMed

    Ren, Qiang; Nagar, Jogender; Kang, Lei; Bian, Yusheng; Werner, Ping; Werner, Douglas H

    2017-05-18

    A highly efficient numerical approach for simulating the wideband optical response of nano-architectures comprised of Drude-Critical Points (DCP) media (e.g., gold and silver) is proposed and validated through comparison with commercial computational software. The kernel of this algorithm is the subdomain-level discontinuous Galerkin time domain (DGTD) method, which can be viewed as a hybrid of the spectral-element time-domain (SETD) and finite-element time-domain (FETD) methods. An hp-refinement technique is applied to decrease the Degrees-of-Freedom (DoFs) and computational requirements. The collocated E-J scheme facilitates solving the auxiliary equations by converting the inversions of matrices to simpler vector manipulations. A new hybrid time-stepping approach, which couples the Runge-Kutta and Newmark methods, is proposed to solve the temporal auxiliary differential equations (ADEs) with a high degree of efficiency. The advantages of this new approach, in terms of computational resource overhead and accuracy, are validated through comparison with well-known commercial software for three diverse cases, which cover both near-field and far-field properties with plane wave and lumped port sources. The presented work provides the missing link between DCP dispersive models and FETD- and/or SETD-based algorithms. It is a competitive candidate for numerically studying the wideband plasmonic properties of DCP media.
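
    For reference, the Drude-critical points permittivity that such solvers must treat in the time domain (via auxiliary differential equations) has a simple closed form in the frequency domain. The function below evaluates the standard DCP model; the parameter values are illustrative placeholders, not a fitted gold or silver data set from the paper.

```python
# Frequency-domain Drude-critical points (DCP) permittivity (standard form).
import numpy as np

def eps_dcp(omega, eps_inf, wd, gamma, cps):
    """eps(w) = eps_inf - wd^2/(w^2 + i*gamma*w) + critical-point terms;
    cps is a list of (A, phi, Omega, Gamma) tuples."""
    eps = eps_inf - wd**2 / (omega**2 + 1j * gamma * omega)
    for A, phi, Omega, Gamma in cps:
        eps += A * Omega * (np.exp(1j * phi) / (Omega - omega - 1j * Gamma)
                            + np.exp(-1j * phi) / (Omega + omega + 1j * Gamma))
    return eps

omega = np.linspace(2e15, 5e15, 4)     # angular frequencies, rad/s
print(eps_dcp(omega, eps_inf=1.0, wd=1.3e16, gamma=1.1e14,
              cps=[(0.3, -0.8, 4.0e15, 9.0e14)]))
```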

  20. Increasing the computational efficiency of digital cross correlation by a vectorization method

    NASA Astrophysics Data System (ADS)

    Chang, Ching-Yuan; Ma, Chien-Ching

    2017-08-01

    This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in speedups of 6.387 and 36.044 times compared with the looped expressions. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method, as well as experiments to measure the quantitative transient displacement response under dynamic impact loading. The experiment involved the use of a high-speed camera as well as a fiber-optic system to measure the transient displacement of a cantilever beam under impact from a steel ball. Experimental measurement data obtained from the two methods are in excellent agreement in both the time and frequency domains, with a discrepancy of only 0.68%. Numerical and experimental results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
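
    The looped-versus-vectorized contrast that the paper quantifies in MATLAB can be transposed to NumPy for illustration: the same circular cross-correlation computed lag-by-lag in a loop and in a single FFT-based expression. The signals below are synthetic.

```python
# Looped vs. vectorized (FFT-based) circular cross-correlation in NumPy.
import numpy as np

rng = np.random.default_rng(1)
n = 4096
x = rng.normal(size=n)
y = np.roll(x, 100) + 0.1 * rng.normal(size=n)   # y lags x by 100 samples

# Loop form: one dot product per lag (slow).
lags = range(-200, 201)
loop_corr = np.array([np.dot(x, np.roll(y, -k)) for k in lags])

# Vectorized form: the whole correlation in one FFT expression (fast).
fft_corr = np.real(np.fft.ifft(np.fft.fft(x).conj() * np.fft.fft(y)))

print(int(np.argmax(fft_corr)))                  # -> 100, the true lag
print(np.allclose(loop_corr, [fft_corr[k % n] for k in lags]))  # -> True
```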

  1. Efficient computation of electrograms and ECGs in human whole heart simulations using a reaction-eikonal model.

    PubMed

    Neic, Aurel; Campos, Fernando O; Prassl, Anton J; Niederer, Steven A; Bishop, Martin J; Vigmond, Edward J; Plank, Gernot

    2017-10-01

    Anatomically accurate and biophysically detailed bidomain models of the human heart have proven a powerful tool for gaining quantitative insight into the links between electrical sources in the myocardium and the concomitant current flow in the surrounding medium, as they represent their relationship mechanistically based on first principles. Such models are increasingly considered as a clinical research tool with the perspective of being used, ultimately, as a complementary diagnostic modality. An important prerequisite in many clinical modeling applications is the ability of models to faithfully replicate potential maps and electrograms recorded from a given patient. However, while the personalization of electrophysiology models based on the gold-standard bidomain formulation is in principle feasible, the associated computational expenses are significant, rendering their use incompatible with clinical time frames. In this study we report on the development of a novel computationally efficient reaction-eikonal (R-E) model for modeling extracellular potential maps and electrograms. Using a biventricular human electrophysiology model, which incorporates a topologically realistic His-Purkinje system (HPS), we demonstrate by comparing against a high-resolution reaction-diffusion (R-D) bidomain model that the R-E model predicts extracellular potential fields, electrograms, and ECGs at the body surface with high fidelity and offers vast computational savings of more than three orders of magnitude. Due to their efficiency, R-E models are ideally suited for forward simulations in clinical modeling studies which attempt to personalize electrophysiological model features.

  2. Efficient computation of electrograms and ECGs in human whole heart simulations using a reaction-eikonal model

    NASA Astrophysics Data System (ADS)

    Neic, Aurel; Campos, Fernando O.; Prassl, Anton J.; Niederer, Steven A.; Bishop, Martin J.; Vigmond, Edward J.; Plank, Gernot

    2017-10-01

    Anatomically accurate and biophysically detailed bidomain models of the human heart have proven a powerful tool for gaining quantitative insight into the links between electrical sources in the myocardium and the concomitant current flow in the surrounding medium, as they represent their relationship mechanistically based on first principles. Such models are increasingly considered as a clinical research tool with the perspective of being used, ultimately, as a complementary diagnostic modality. An important prerequisite in many clinical modeling applications is the ability of models to faithfully replicate potential maps and electrograms recorded from a given patient. However, while the personalization of electrophysiology models based on the gold-standard bidomain formulation is in principle feasible, the associated computational expenses are significant, rendering their use incompatible with clinical time frames. In this study we report on the development of a novel computationally efficient reaction-eikonal (R-E) model for modeling extracellular potential maps and electrograms. Using a biventricular human electrophysiology model, which incorporates a topologically realistic His-Purkinje system (HPS), we demonstrate by comparing against a high-resolution reaction-diffusion (R-D) bidomain model that the R-E model predicts extracellular potential fields, electrograms, and ECGs at the body surface with high fidelity and offers vast computational savings of more than three orders of magnitude. Due to their efficiency, R-E models are ideally suited for forward simulations in clinical modeling studies which attempt to personalize electrophysiological model features.

  3. Analysis of multigrid methods on massively parallel computers: Architectural implications

    NASA Technical Reports Server (NTRS)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain-parallel version of the standard V-cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block-structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message-passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single-stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation-derived parameters. With the medium-grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests that an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or that the high initiation cost of a communication be significantly reduced through an alternative optimization technique. Furthermore, with variable-length message capability, our analysis suggests that the low-diameter multistage networks provide little or no advantage over a simple single-stage communication network.
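
    The conclusion about long messages follows from the usual startup-plus-bandwidth communication model, in which an m-word message costs a + b·m seconds: with a high fixed startup cost a, many short messages are far more expensive than one aggregated long one. A back-of-envelope sketch with illustrative parameters:

```python
# Startup-plus-bandwidth message cost model (illustrative parameters).
a, b = 50e-6, 10e-9    # startup latency (s) and per-word transfer time (s)

def cost(n_msgs, words_each):
    return n_msgs * (a + b * words_each)

# 1000 one-word messages versus one aggregated 1000-word message:
print(cost(1000, 1))   # ~0.05 s  -- dominated by the fixed startup cost
print(cost(1, 1000))   # ~6e-5 s  -- the long message amortizes the startup
```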

  4. Computer Model Of Fragmentation Of Atomic Nuclei

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Townsend, Lawrence W.; Tripathi, Ram K.; Norbury, John W.; Khan, Ferdous; Badavi, Francis F.

    1995-01-01

    The High Charge and Energy Semiempirical Nuclear Fragmentation Model (HZEFRG1) computer program was developed to be a computationally efficient, user-friendly, physics-based program for generating databases on the fragmentation of atomic nuclei. The databases generated are used in calculations pertaining to such radiation-transport applications as shielding against radiation in outer space, radiation dosimetry in outer space, cancer therapy in laboratories with beams of heavy ions, and simulation studies for designing detectors for experiments in nuclear physics. The program provides cross sections for the production of individual elements and isotopes in breakups of high-energy heavy ions by the combined nuclear and Coulomb fields of interacting nuclei. Written in ANSI FORTRAN 77.

  5. Properties of high quality GaP single crystals grown by computer controlled liquid encapsulated Czochralski technique

    NASA Astrophysics Data System (ADS)

    Kokubun, Y.; Washizuka, S.; Ushizawa, J.; Watanabe, M.; Fukuda, T.

    1982-11-01

    The properties of GaP single crystals grown by an automatically diameter-controlled liquid encapsulated Czochralski technique using a computer have been studied. A dislocation density of less than 5×10⁴ cm⁻² has been observed for crystals grown in a temperature gradient lower than 70 °C/cm near the solid-liquid interface. The crystals have about 10% higher electron mobility than commercially available coracle-controlled crystals and have compensation ratios of 0.2-0.5. Yellow light-emitting diodes using computer-controlled (100) substrates have shown an extremely high external quantum efficiency of 0.3%.

  6. [Axial computer tomography of the neurocranium (author's transl)].

    PubMed

    Stöppler, L

    1977-05-27

    Computer tomography (CT), a new radiographic examination technique, is highly efficient, for it has high informative content with little stress for the patient. In contrast to conventional X-ray technology, CT succeeds, by direct presentation of the structure of the soft parts, in obtaining information which comes close to that of macroscopic neuropathology. The capacity and limitations of the method at the present stage of development are reported. Computer tomography cannot displace conventional neuroradiological methods of investigation, although it rightly serves as a screening method and helps towards their selective use. Indications, technical integration and handling of CT are prerequisites for the full benefit of this excellent new technique.

  7. Efficiency of a flapping propulsion system based on two side-by-side pitching foils

    NASA Astrophysics Data System (ADS)

    Huera-Huarte, Francisco

    2017-11-01

    We explore the propulsive performance of two foils flapping side-by-side in a wide variety of configurations, for different foil separations, pitching amplitudes and frequencies, and phase differences. Direct force and torque measurements are shown for each configuration, following a thorough parametric study that led to the identification of highly efficient modes of propulsion. The specially designed experimental rig allowed the computation of efficiencies both globally and at each shaft in the system. Planar and volumetric Particle Image Velocimetry (PIV) allowed a detailed description of the wake generated by the system for each set of kinematics investigated. The investigation is part of an ambitious project with the aim of producing a highly efficient and highly manoeuvrable flapping propulsion system for underwater vehicles. Funding from the Spanish Ministry MINECO through Grant DPI2015-71645-P is gratefully acknowledged.

  8. A hydrological emulator for global applications - HE v1.0.0

    NASA Astrophysics Data System (ADS)

    Liu, Yaling; Hejazi, Mohamad; Li, Hongyi; Zhang, Xuesong; Leng, Guoyong

    2018-03-01

    While global hydrological models (GHMs) are very useful in exploring water resources and interactions between the Earth and human systems, their use often requires numerous model inputs, complex model calibration, and high computation costs. To overcome these challenges, we construct an efficient open-source and ready-to-use hydrological emulator (HE) that can mimic complex GHMs at a range of spatial scales (e.g., basin, region, globe). More specifically, we construct both a lumped and a distributed scheme of the HE based on the monthly abcd model to explore the tradeoff between computational cost and model fidelity. Model predictability and computational efficiency are evaluated in simulating global runoff from 1971 to 2010 with both the lumped and distributed schemes. The results are compared against the runoff product from the widely used Variable Infiltration Capacity (VIC) model. Our evaluation indicates that the lumped and distributed schemes present comparable results regarding annual total quantity, spatial pattern, and temporal variation of the major water fluxes (e.g., total runoff, evapotranspiration) across the 235 global basins (e.g., the correlation coefficient r between the annual total runoff from either of these two schemes and that of the VIC is > 0.96), except for several cold (e.g., Arctic, interior Tibet), dry (e.g., North Africa) and mountainous (e.g., Argentina) regions. Compared against the monthly total runoff product from the VIC (aggregated from daily runoff), the global mean Kling-Gupta efficiencies are 0.75 and 0.79 for the lumped and distributed schemes, respectively, with the distributed scheme better capturing spatial heterogeneity. Notably, the lumped scheme is 2 orders of magnitude more computationally efficient than the distributed one and 7 orders of magnitude more efficient than the VIC model. A case study of uncertainty analysis for the world's 16 basins with the highest annual streamflow is conducted using 100,000 model simulations, and it demonstrates the lumped scheme's extraordinary advantage in computational efficiency. Our results suggest that the revised lumped abcd model can serve as an efficient and reasonable HE for complex GHMs and is suitable for broad practical use, and that the distributed scheme is an efficient alternative if spatial heterogeneity is of more interest.
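
    For context, the abcd model underlying the emulator is a four-parameter monthly water balance. A minimal sketch of one time step, plus the Kling-Gupta efficiency used in the evaluation (parameter values and forcing below are illustrative, not the paper's calibrated ones):

        # One month of the abcd water-balance model (after Thomas, 1981).
        import numpy as np

        def abcd_step(P, PET, S_prev, G_prev, a, b, c, d):
            W = P + S_prev                              # available water
            wb = (W + b) / (2.0 * a)                    # "ET opportunity" Y(W)
            Y = wb - np.sqrt(wb * wb - W * b / a)
            S = Y * np.exp(-PET / b)                    # end-of-month soil storage
            E = Y - S                                   # actual evapotranspiration
            G = (G_prev + c * (W - Y)) / (1.0 + d)      # groundwater storage
            Q = (1.0 - c) * (W - Y) + d * G             # total runoff
            return Q, E, S, G

        def kge(sim, obs):
            r = np.corrcoef(sim, obs)[0, 1]
            alpha = np.std(sim) / np.std(obs)
            beta = np.mean(sim) / np.mean(obs)
            return 1.0 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

        rng = np.random.default_rng(0)                  # one synthetic year
        P, PET = rng.gamma(2.0, 40.0, 12), np.full(12, 60.0)   # mm/month
        S, G, Q = 100.0, 50.0, []
        for t in range(12):
            q, e, S, G = abcd_step(P[t], PET[t], S, G,
                                   a=0.98, b=250.0, c=0.6, d=0.1)
            Q.append(q)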

  9. Comparing Server Energy Use and Efficiency Using Small Sample Sizes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coles, Henry C.; Qin, Yong; Price, Phillip N.

    This report documents a demonstration that compared the energy consumption and efficiency of a limited sample size of server-type IT equipment from different manufacturers by measuring power at the server power supply power cords. The results are specific to the equipment and methods used. However, it is hoped that those responsible for IT equipment selection can use the methods described to choose models that optimize energy use efficiency. The demonstration was conducted in a data center at Lawrence Berkeley National Laboratory in Berkeley, California. It was performed with five servers of similar mechanical and electronic specifications: three from Intel and one each from Dell and Supermicro. Server IT equipment is constructed using commodity components, server manufacturer-designed assemblies, and control systems. Server compute efficiency is constrained by the commodity component specifications and integration requirements. The design freedom, outside of the commodity component constraints, provides room for the manufacturer to offer a product with competitive efficiency that meets market needs at a compelling price. A goal of the demonstration was to compare and quantify the server efficiency for three different brands. Efficiency is defined as the average compute rate (computations per unit of time) divided by the average energy consumption rate. The research team used an industry-standard benchmark software package to provide a repeatable software load to obtain the compute rate and provide a variety of power consumption levels. Energy use when the servers were in an idle state (not providing computing work) was also measured. At high server compute loads, all brands, using the same key components (processors and memory), had similar results; therefore, from these results, it could not be concluded that one brand is more efficient than the others. The test results show that the power consumption variability caused by the key components as a group is similar to that of all other components as a group. However, some differences were observed. The Supermicro server used 27 percent more power at idle compared to the other brands. The Intel server had a power supply control feature called cold redundancy, and the data suggest that cold redundancy can provide energy savings at low power levels. Test and evaluation methods that might be used by others having limited resources for IT equipment evaluation are explained in the report.

  10. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    PubMed

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of sequenced data has exceeded Moore's law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data at high speed and fixed cost, but lack the computational resources to store, process and analyze them. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine the power of advanced computing techniques with innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of g-spaced sequenced k-mers and the other holding k-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers, using benchmark datasets from the GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. Availability: https://github.com/SaraEl-Metwally/LightAssembler. Contact: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online.
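
    The core data structure is easy to sketch. A minimal, illustrative pair of Bloom filters for k-mer membership follows (not LightAssembler's implementation; the sizes, hash construction and the promotion test are made up):

        # Two Bloom filters: a g-spaced sample of k-mers and a 'trusted' set.
        import hashlib

        class BloomFilter:
            def __init__(self, m_bits=1 << 20, n_hashes=4):
                self.m, self.k = m_bits, n_hashes
                self.bits = bytearray(m_bits // 8)

            def _positions(self, item):
                for i in range(self.k):   # k independent salted hashes
                    h = hashlib.blake2b(item.encode(), digest_size=8,
                                        salt=bytes([i])).digest()
                    yield int.from_bytes(h, "little") % self.m

            def add(self, item):
                for p in self._positions(item):
                    self.bits[p // 8] |= 1 << (p % 8)

            def __contains__(self, item):
                return all(self.bits[p // 8] & (1 << (p % 8))
                           for p in self._positions(item))

        k, gap = 31, 5
        sampled, trusted = BloomFilter(), BloomFilter()
        read = "ACGTACGTTAGCACGT" * 8
        for i in range(0, len(read) - k + 1, gap):   # uniform g-spaced sample
            sampled.add(read[i:i + k])
        # a k-mer seen in the sample would then pass a statistical test
        # before promotion to 'trusted' (the test is omitted in this sketch)
        if read[0:k] in sampled:
            trusted.add(read[0:k])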

  11. Computational Study of the CC3 Impeller and Vaneless Diffuser Experiment

    NASA Technical Reports Server (NTRS)

    Kulkarni, Sameer; Beach, Timothy A.; Skoch, Gary J.

    2013-01-01

    Centrifugal compressors are compatible with the low exit corrected flows found in the high pressure compressor of turboshaft engines and may play an increasing role in turbofan engines as engine overall pressure ratios increase. Centrifugal compressor stages are difficult to model accurately with RANS CFD solvers. A computational study of the CC3 centrifugal impeller in its vaneless diffuser configuration was undertaken as part of an effort to understand potential causes of RANS CFD mis-prediction in these types of geometries. Three steady, periodic cases of the impeller and diffuser were modeled using the TURBO Parallel Version 4 code: 1) a k-epsilon turbulence model computation on a 6.8 million point grid using wall functions, 2) a k-epsilon turbulence model computation on a 14 million point grid integrating to the wall, and 3) a k-omega turbulence model computation on the 14 million point grid integrating to the wall. It was found that all three cases compared favorably to data from inlet to impeller trailing edge, but the k-epsilon and k-omega computations had disparate results beyond the trailing edge and into the vaneless diffuser. A large region of reversed flow was observed in the k-epsilon computations which extended from 70% to 100% span at the exit rating plane, whereas the k-omega computation had reversed flow from 95% to 100% span. Compared to experimental data at near-peak-efficiency, the reversed flow region in the k-epsilon case resulted in an under-prediction in adiabatic efficiency of 8.3 points, whereas the k-omega case was 1.2 points lower in efficiency.

  13. Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

    DOE PAGES

    Grayver, Alexander V.; Kolev, Tzanio V.

    2015-11-01

    Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, large conductivity contrasts, and large numbers of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and fewer DoFs than the lowest order adaptive FEM.

  15. Production experience with the ATLAS Event Service

    NASA Astrophysics Data System (ADS)

    Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration

    2017-10-01

    The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot-market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real-time delivery of fine-grained workloads to running payload applications, which process dispatched events or event ranges and immediately stream the outputs to highly scalable object stores. Thanks to its agile and flexible architecture, the AES is currently used by grid sites for assigning low-priority workloads to otherwise idle computing resources, for harvesting HPC resources in an efficient back-fill mode, and for massively scaling out to the 50-100k concurrent-core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC) and the Google Compute Engine, as well as a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we report the status of and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES applications beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.

  16. Monitoring SLAC High Performance UNIX Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lettsome, Annette K.; /Bethune-Cookman Coll. /SLAC

    2005-12-15

    Knowledge of the effectiveness and efficiency of computers is important when working with high-performance systems. The monitoring of such systems is advantageous in order to foresee possible misfortunes or system failures. Ganglia is a software system designed for high-performance computing systems to retrieve specific monitoring information. An alternative storage facility for Ganglia's collected data is needed, since its default storage system, the round-robin database (RRD), struggles with data integrity. The creation of a script-driven MySQL database solves this dilemma. This paper describes the process followed in the creation and implementation of the MySQL database for use by Ganglia. Comparisons between data storage by both databases are made using gnuplot and Ganglia's real-time graphical user interface.
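
    The replacement amounts to a conventional append-only metrics table. A minimal sketch of the idea (SQLite stands in here for the MySQL server of the report; the schema and metric names are made up):

        # Store node metrics in a SQL table instead of Ganglia's RRD.
        import sqlite3, time

        conn = sqlite3.connect("ganglia_metrics.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS metrics (
                          host   TEXT    NOT NULL,
                          metric TEXT    NOT NULL,
                          value  REAL    NOT NULL,
                          ts     INTEGER NOT NULL)""")

        def record(host, metric, value):
            conn.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)",
                         (host, metric, value, int(time.time())))
            conn.commit()

        record("node001", "load_one", 3.42)
        record("node001", "mem_free_kb", 1048576.0)
        # unlike an RRD, nothing is aggregated away; full history is kept
        rows = conn.execute(
            "SELECT * FROM metrics WHERE host = 'node001'").fetchall()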

  17. A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Ferguson, Helaman R. P.

    1988-01-01

    Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
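
    The matrix Newton iteration mentioned above is compact enough to state directly. A minimal sketch (illustrative; the paper combines it with Strassen multiplication and Cray-2 specifics not reproduced here):

        # Newton iteration X_{k+1} = X_k (2I - A X_k), which converges
        # quadratically to A^{-1} from X_0 = A^T / (||A||_1 ||A||_inf).
        import numpy as np

        def newton_inverse(A, tol=1e-12, max_iter=100):
            n = A.shape[0]
            X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
            I = np.eye(n)
            for _ in range(max_iter):
                X = X @ (2 * I - A @ X)     # two matrix multiplies per step,
                if np.linalg.norm(A @ X - I) < tol:   # both parallelizable
                    break
            return X

        rng = np.random.default_rng(1)
        A = rng.standard_normal((200, 200)) + 200 * np.eye(200)  # well conditioned
        X = newton_inverse(A)
        assert np.allclose(A @ X, np.eye(200), atol=1e-8)

    The appeal for massively parallel machines is that every step is built from matrix multiplications, which parallelize far more readily than the triangular solves of conventional factorization-based inversion.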

  18. Using the High-Level Based Program Interface to Facilitate the Large Scale Scientific Computing

    PubMed Central

    Shang, Yizi; Shang, Ling; Gao, Chuanchang; Lu, Guiming; Ye, Yuntao; Jia, Dongdong

    2014-01-01

    This paper extends research on facilitating large-scale scientific computing on grid and desktop-grid platforms. The issues addressed include the programming method, the overhead of middleware based on a high-level program interface, and anticipatory data migration. The block-based Gauss-Jordan algorithm, as a real example of large-scale scientific computing, is used to evaluate these issues. The results show that the high-level program interface makes complex scientific applications on a large-scale platform easier to build, though a little overhead is unavoidable. Also, the anticipatory data-migration mechanism can improve the efficiency of platforms that process big-data-based scientific applications. PMID: 24574931
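
    The kernel used for the evaluation is standard linear algebra. A minimal sketch of a block elimination step via the Schur complement, a serial stand-in for the paper's distributed block Gauss-Jordan (block sizes are illustrative):

        # Invert a 2x2 block-partitioned matrix via the Schur complement.
        import numpy as np

        def block_inverse(A, k):
            """Invert A using a 2x2 block partition; leading block size k."""
            A11, A12 = A[:k, :k], A[:k, k:]
            A21, A22 = A[k:, :k], A[k:, k:]
            B11 = np.linalg.inv(A11)              # pivot-block inverse
            S = A22 - A21 @ B11 @ A12             # Schur complement
            Sinv = np.linalg.inv(S)
            T = B11 @ A12 @ Sinv
            top = np.hstack([B11 + T @ A21 @ B11, -T])
            bot = np.hstack([-Sinv @ A21 @ B11, Sinv])
            return np.vstack([top, bot])

        rng = np.random.default_rng(2)
        A = rng.standard_normal((400, 400)) + 400 * np.eye(400)
        Ainv = block_inverse(A, 200)
        assert np.allclose(A @ Ainv, np.eye(400), atol=1e-8)

    In a grid setting, each block product above becomes an independent task, which is what makes the algorithm a natural benchmark for high-level task-based middleware.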

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krueger, Jens; Micikevicius, Paulius; Williams, Samuel

    Reverse Time Migration (RTM) is one of the main approaches in the seismic processing industry for imaging the subsurface structure of the Earth. While RTM provides qualitative advantages over its predecessors, it has a high computational cost warranting implementation on HPC architectures. We focus on three progressively more complex kernels extracted from RTM: for isotropic (ISO), vertical transverse isotropic (VTI) and tilted transverse isotropic (TTI) media. In this work, we examine performance optimization of forward wave modeling, which comprises the computational kernels used in RTM, on emerging multi- and manycore processors, and introduce a novel common subexpression elimination optimization for TTI kernels. We compare attained performance and energy efficiency in both single-node and distributed-memory environments in order to satisfy industry's demands for fidelity, performance, and energy efficiency. Moreover, we discuss the interplay between architecture (chip and system) and optimization, highlighting the importance of NUMA-aware approaches to MPI communication. Ultimately, our results show we can improve CPU energy efficiency by more than 10× on Magny-Cours nodes, while acceleration via multiple GPUs can surpass the energy-efficient Intel Sandy Bridge by as much as 3.6×.

  20. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems

    PubMed Central

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel

    2017-01-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very computationally demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task-assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID: 29081725
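
    The computation-reuse idea generalizes well beyond imaging: when an expensive early pipeline stage depends on only a subset of the swept parameters, caching its output collapses the cost of the sweep. A minimal sketch (the stages and parameters here are stand-ins, not the paper's workflow):

        # Memoized two-stage pipeline swept over parameter combinations.
        from functools import lru_cache
        from itertools import product

        @lru_cache(maxsize=None)
        def stage1_normalize(image_id, blur_sigma):
            print(f"stage 1 on {image_id} (sigma={blur_sigma})")  # expensive
            return f"norm({image_id},{blur_sigma})"

        def stage2_segment(norm_output, threshold):
            return f"seg({norm_output},{threshold})"              # cheap

        sigmas = (1.0, 2.0)
        thresholds = (0.3, 0.5, 0.7)
        results = {}
        for s, t in product(sigmas, thresholds):
            # stage 1 actually runs only twice (once per sigma), not six times
            results[(s, t)] = stage2_segment(stage1_normalize("img42", s), t)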

  1. Rapid solution of large-scale systems of equations

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1994-01-01

    The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization, and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared-memory computers and distributed-memory computers. This presentation covers general-purpose, highly efficient algorithms for the generation/assembly of element matrices, the solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis, and optimization. All algorithms are coded in FORTRAN for shared-memory computers and many are adapted to distributed-memory computers. The capability and numerical performance of these algorithms are addressed.

  2. Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.

    PubMed

    Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard

    2015-02-01

    Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms remains high. The interaction sorting (IS) algorithm, designed in 2007, therefore attracted clear interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional performance gain of 9% to 45% in comparison to the original IS method.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of atoms visited in force computation. Finally, an adaptive MMM analysis of a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.

  4. Towards Dynamic Remote Data Auditing in Computational Clouds

    PubMed Central

    Sookhak, Mehdi; Akhunzada, Adnan; Gani, Abdullah; Khurram Khan, Muhammad; Anuar, Nor Badrul

    2014-01-01

    Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have great potential. Enterprises and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession by the data owners. To address this issue, researchers have focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are applicable only to static archive data and cannot audit dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage systems and also present a new data structure capable of efficiently supporting dynamic data operations such as append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable to large-scale data with minimum computation cost. The comparative analysis with state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server. PMID: 25121114
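
    The primitive such schemes build on is an algebraic signature: a short field element sig(b) = sum_i b_i * alpha^i that is linear in the block contents. A minimal sketch of that property (a prime field stands in here for the Galois field GF(2^w) typically used; constants are made up):

        # Algebraic signature over a prime field; linearity enables auditing
        # block updates without retrieving the data itself.
        P = (1 << 61) - 1          # Mersenne prime used as the field modulus
        ALPHA = 1_000_003          # a fixed field element (assumed suitable)

        def signature(block: bytes) -> int:
            s, a = 0, 1
            for byte in block:
                s = (s + byte * a) % P
                a = (a * ALPHA) % P
            return s

        b1 = bytes([10, 20, 30, 40])
        b2 = bytes([1, 2, 3, 4])
        bsum = bytes(x + y for x, y in zip(b1, b2))
        # homomorphism: signature of the sum equals the sum of signatures
        assert (signature(b1) + signature(b2)) % P == signature(bsum)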

  5. Computation of Nonlinear Hydrodynamic Loads on Floating Wind Turbines Using Fluid-Impulse Theory: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kok Yan Chan, G.; Sclavounos, P. D.; Jonkman, J.

    2015-04-02

    A hydrodynamics computer module was developed for the evaluation of the linear and nonlinear loads on floating wind turbines using a new fluid-impulse formulation for coupling with the FAST program. The recently developed formulation allows the computation of linear and nonlinear loads on floating bodies in the time domain and avoids the computationally intensive evaluation of temporal and nonlinear free-surface problems; efficient methods are derived for its computation. The body's instantaneous wetted surface is approximated by a panel mesh, and the discretization of the free surface is circumvented by using the Green function. The evaluation of the nonlinear loads is based on explicit expressions derived by the fluid-impulse theory, which can be computed efficiently. Computations are presented of the linear and nonlinear loads on the MIT/NREL tension-leg platform. Comparisons were carried out with frequency-domain linear and second-order methods. Emphasis was placed on modeling accuracy of the magnitude of nonlinear low- and high-frequency wave loads in a sea state. Although fluid-impulse theory is applied to floating wind turbines in this paper, the theory is applicable to other offshore platforms as well.

  7. Computationally Efficient Adaptive Beamformer for Ultrasound Imaging Based on QR Decomposition.

    PubMed

    Park, Jongin; Wi, Seok-Min; Lee, Jin S

    2016-02-01

    Adaptive beamforming methods for ultrasound imaging have been studied to improve image resolution and contrast. The most common approach is the minimum variance (MV) beamformer, which minimizes the power of the beamformed output while keeping the response from the direction of interest constant. The method achieves higher resolution and better contrast than the delay-and-sum (DAS) beamformer, but it suffers from high computational cost. This cost is mainly due to the computation of the spatial covariance matrix and its inverse, which requires O(L³) operations, where L denotes the subarray size. In this study, we propose a computationally efficient MV beamformer based on QR decomposition. The idea behind our approach is to transform the spatial covariance matrix into a scalar matrix σI, so that we obtain the apodization weights and the beamformed output without computing the matrix inverse. The QR decomposition can be executed at low cost, and therefore the computational complexity is reduced to O(L²). In addition, our approach is mathematically equivalent to the conventional MV beamformer, and thus shows equivalent performance. The simulation and experimental results support the validity of our approach.
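
    One way to see how QR factorization sidesteps the explicit inverse: with a snapshot matrix X (N×L), the sample covariance is R = XᴴX/N = (RfᴴRf)/N for the QR factor Rf of X, so R⁻¹a reduces to two triangular solves. A minimal sketch (synthetic narrowband array data, not an ultrasound subarray, and not necessarily the paper's exact factorization):

        # MV weights w = R^{-1} a / (a^H R^{-1} a) without forming R^{-1}.
        import numpy as np
        from scipy.linalg import solve_triangular

        def mv_weights_qr(X, a):
            N = X.shape[0]
            _, Rf = np.linalg.qr(X, mode="reduced")        # X = Q Rf
            z = solve_triangular(Rf.conj().T, a, lower=True)   # Rf^H z = a
            y = solve_triangular(Rf, z, lower=False)           # Rf y = z
            y *= N                                         # y = R^{-1} a
            return y / (a.conj() @ y)

        L, N = 16, 200
        a = np.ones(L, dtype=complex)                      # broadside steering
        rng = np.random.default_rng(3)
        phase = np.exp(1j * np.pi * np.arange(L) * 0.5)    # off-axis interferer
        X = (rng.standard_normal((N, L)) + 0.1j * rng.standard_normal((N, L))
             + 2.0 * rng.standard_normal((N, 1)) * phase)
        w = mv_weights_qr(X, a)
        assert np.isclose(w.conj() @ a, 1.0)               # distortionless response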

  8. Photon-trapping microstructures enable high-speed high-efficiency silicon photodiodes

    NASA Astrophysics Data System (ADS)

    Gao, Yang; Cansizoglu, Hilal; Polat, Kazim G.; Ghandiparsi, Soroush; Kaya, Ahmet; Mamtaz, Hasina H.; Mayet, Ahmed S.; Wang, Yinan; Zhang, Xinzhi; Yamada, Toshishige; Devine, Ekaterina Ponizovskaya; Elrefaie, Aly F.; Wang, Shih-Yuan; Islam, M. Saif

    2017-04-01

    High-speed, high-efficiency photodetectors play an important role in optical communication links that are increasingly being used in data centres to handle higher volumes of data traffic and higher bandwidths, as big data and cloud computing continue to grow exponentially. Monolithic integration of optical components with signal-processing electronics on a single silicon chip is of paramount importance in the drive to reduce cost and improve performance. We report the first demonstration of micro- and nanoscale holes enabling light trapping in a silicon photodiode, which exhibits an ultrafast impulse response (full-width at half-maximum) of 30 ps and a high efficiency of more than 50%, for use in data-centre optical communications. The photodiode uses micro- and nanostructured holes to enhance, by an order of magnitude, the absorption efficiency of a thin intrinsic layer of less than 2 µm thickness and is designed for a data rate of 20 gigabits per second or higher at a wavelength of 850 nm. Further optimization can improve the efficiency to more than 70%.

  9. Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach

    NASA Technical Reports Server (NTRS)

    Mak, Victor W. K.

    1986-01-01

    Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.

  10. Scalable domain decomposition solvers for stochastic PDEs in high performance computing

    DOE PAGES

    Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

    2017-09-21

    Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems, or linearized systems for nonlinear problems, with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain-decomposition-based iterative solvers can handle such systems. Although these algorithms exhibit excellent scalability, significant algorithmic and implementation challenges exist in extending them to solve extreme-scale stochastic systems on emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both the spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain-level local systems through multi-level iterative solvers. We also use parallel sparse matrix-vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having a spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.

  12. Mantle convection on modern supercomputers

    NASA Astrophysics Data System (ADS)

    Weismüller, Jens; Gmeiner, Björn; Mohr, Marcus; Waluga, Christian; Wohlmuth, Barbara; Rüde, Ulrich; Bunge, Hans-Peter

    2015-04-01

    Mantle convection is the cause of plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic of mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next-generation large-scale architectures demand an interdisciplinary co-design. Here we report on recent advances of the TERRA-NEO project, which is part of the high-visibility SPPEXA program and a joint effort of four research groups in computer science, mathematics and geophysical application under the leadership of FAU Erlangen. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next-generation mantle convection models. We present software that can resolve the Earth's mantle with up to 10¹² grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection, assessing the impact of small-scale processes on global mantle flow.

  13. Addressing Curse of Dimensionality in Sensitivity Analysis: How Can We Handle High-Dimensional Problems?

    NASA Astrophysics Data System (ADS)

    Safaei, S.; Haghnegahdar, A.; Razavi, S.

    2016-12-01

    Complex environmental models are now the primary tool to inform decision makers about the current or future management of environmental resources under climate and environmental change. These complex models often contain a large number of parameters that need to be determined by a computationally intensive calibration procedure. Sensitivity analysis (SA) is a very useful tool that not only allows for understanding model behavior, but also helps in reducing the number of calibration parameters by identifying unimportant ones. The issue is that most global sensitivity techniques are themselves highly computationally demanding when generating robust and stable sensitivity metrics over the entire model response surface. Recently, a novel global sensitivity analysis method, Variogram Analysis of Response Surfaces (VARS), was introduced that can efficiently provide a comprehensive assessment of global sensitivity using the variogram concept. In this work, we aim to evaluate the effectiveness of this highly efficient GSA method in saving computational burden when applied to systems with an extra-large number of input factors (~100). We use a test function and a hydrological modeling case study to demonstrate the capability of the VARS method in reducing problem dimensionality by identifying important vs. unimportant input factors.
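
    The variogram idea behind VARS is simple to state: for each input x_i, the directional variogram gamma_i(h) = 0.5 * E[(f(x + h e_i) - f(x))^2] summarizes sensitivity across perturbation scales h. A minimal sketch of that quantity on a toy test function (illustrative only; this is not the STAR-VARS sampling strategy of the method itself):

        # Directional variogram estimates as a crude sensitivity measure.
        import numpy as np

        def f(x):                               # Ishigami test function
            return (np.sin(x[:, 0]) + 7.0 * np.sin(x[:, 1])**2
                    + 0.1 * x[:, 2]**4 * np.sin(x[:, 0]))

        rng = np.random.default_rng(4)
        n, dim, h = 2000, 3, 0.1
        base = rng.uniform(-np.pi, np.pi, (n, dim))
        for i in range(dim):
            shifted = base.copy()
            shifted[:, i] += h                  # step along dimension i only
            gamma = 0.5 * np.mean((f(shifted) - f(base))**2)
            print(f"gamma_{i}(h={h}) = {gamma:.4f}")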

  14. An Analysis of Performance Enhancement Techniques for Overset Grid Applications

    NASA Technical Reports Server (NTRS)

    Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.

  15. Tempest - Efficient Computation of Atmospheric Flows Using High-Order Local Discretization Methods

    NASA Astrophysics Data System (ADS)

    Ullrich, P. A.; Guerra, J. E.

    2014-12-01

    The Tempest Framework composes several compact numerical methods to easily facilitate intercomparison of atmospheric flow calculations on the sphere and in rectangular domains. This framework includes the implementations of Spectral Elements, Discontinuous Galerkin, Flux Reconstruction, and Hybrid Finite Element methods with the goal of achieving optimal accuracy in the solution of atmospheric problems. Several advantages of this approach are discussed such as: improved pressure gradient calculation, numerical stability by vertical/horizontal splitting, arbitrary order of accuracy, etc. The local numerical discretization allows for high performance parallel computation and efficient inclusion of parameterizations. These techniques are used in conjunction with a non-conformal, locally refined, cubed-sphere grid for global simulations and standard Cartesian grids for simulations at the mesoscale. A complete implementation of the methods described is demonstrated in a non-hydrostatic setting.

  16. DNS of Low-Pressure Turbine Cascade Flows with Elevated Inflow Turbulence Using a Discontinuous-Galerkin Spectral-Element Method

    NASA Technical Reports Server (NTRS)

    Garai, Anirban; Diosady, Laslo T.; Murman, Scott M.; Madavan, Nateri K.

    2016-01-01

    Recent progress towards developing a new computational capability for accurate and efficient high-fidelity direct numerical simulation (DNS) and large-eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy, and is implemented in a computationally efficient manner on a modern high-performance computer architecture. An inflow turbulence generation procedure based on a linear forcing approach has been incorporated in this framework, and DNS were conducted to study the effect of inflow turbulence on the suction-side separation bubble in low-pressure turbine (LPT) cascades. The T106 series of airfoil cascades in both lightly (T106A) and highly loaded (T106C) configurations at exit isentropic Reynolds numbers of 60,000 and 80,000, respectively, are considered. The numerical simulations are performed using 8th-order accurate spatial and 4th-order accurate temporal discretization. The changes in separation bubble topology due to elevated inflow turbulence are captured by the present method, and the physical mechanisms leading to the changes are explained. The present results are in good agreement with prior numerical simulations, but some expected discrepancies with the experimental data for the T106C case are noted and discussed.

  17. An Accurate and Computationally Efficient Model for Membrane-Type Circular-Symmetric Micro-Hotplates

    PubMed Central

    Khan, Usman; Falconi, Christian

    2014-01-01

    Ideally, the design of high-performance micro-hotplates would require a large number of simulations because of the existence of many important design parameters as well as the possibly crucial effects of both spread and drift. However, the computational cost of FEM simulations, which are the only available tool for accurately predicting the temperature in micro-hotplates, is very high. As a result, micro-hotplate designers generally have no effective simulation tools for optimization. In order to circumvent these issues, here we propose a model for practical circular-symmetric micro-hotplates which takes advantage of modified Bessel functions, a computationally efficient matrix approach for handling the relevant boundary conditions, Taylor linearization for modeling the Joule heating and radiation losses, and an external-region-segmentation strategy in order to accurately take into account radiation losses in the entire micro-hotplate. The proposed model is almost as accurate as FEM simulations and two to three orders of magnitude more computationally efficient (e.g., 45 s versus more than 8 h). The residual errors, which are mainly associated with the undesired heating in the electrical contacts, are small (e.g., a few degrees Celsius for an 800 °C operating temperature) and, for important analyses, almost constant. Therefore, we also introduce a computationally easy single-FEM-compensation strategy in order to reduce the residual errors to about 1 °C. As illustrative examples of the power of our approach, we report the systematic investigation of a spread in the membrane thermal conductivity and of combined variations of both ambient and bulk temperatures. Our model enables a much faster characterization of micro-hotplates and, thus, a much more effective optimization prior to fabrication. PMID: 24763214
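
    The role of the modified Bessel functions is easy to illustrate: for a thin circular-symmetric membrane with uniform heat loss, the steady excess temperature theta(r) = T - T_amb satisfies theta'' + theta'/r - m² theta = 0, whose solutions are I0 and K0. A minimal sketch using scipy.special (all dimensions and material values below are made up; the paper's model adds Joule heating, radiation and region segmentation on top of this):

        # Radial membrane temperature from modified Bessel functions.
        import numpy as np
        from scipy.special import i0, k0

        k_th, t_m, h_loss = 30.0, 1e-6, 2000.0   # W/mK, thickness m, W/m^2K
        m = np.sqrt(h_loss / (k_th * t_m))       # inverse decay length (1/m)
        r_in, r_out = 100e-6, 500e-6             # heater edge and membrane rim
        dT_in = 500.0                            # heater is 500 K above ambient

        # theta(r) = A*I0(m r) + B*K0(m r); enforce theta(r_in) = dT_in
        # and theta(r_out) = 0 at the heat-sunk rim
        M = np.array([[i0(m * r_in),  k0(m * r_in)],
                      [i0(m * r_out), k0(m * r_out)]])
        A, B = np.linalg.solve(M, np.array([dT_in, 0.0]))

        r = np.linspace(r_in, r_out, 5)
        theta = A * i0(m * r) + B * k0(m * r)    # excess temperature profile

    Because the general solution is analytic, each boundary condition contributes one row to a small linear system, which is why the matrix approach is so much cheaper than a full FEM mesh.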

  18. Livermore Big Artificial Neural Network Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Essen, Brian Van; Jacobs, Sam; Kim, Hyojin

    2016-07-01

    LBANN is a toolkit designed to train artificial neural networks efficiently on high-performance computing architectures. It is optimized to take advantage of key high-performance computing features to accelerate neural network training: specifically, low-latency, high-bandwidth interconnects, node-local NVRAM, node-local GPU accelerators, and high-bandwidth parallel file systems. It is built on top of the open-source Elemental distributed-memory dense and sparse-direct linear algebra and optimization library, which is released under the BSD license. The algorithms contained within LBANN are drawn from the academic literature and implemented to work within a distributed-memory framework.

  19. A comparison of select image-compression algorithms for an electronic still camera

    NASA Technical Reports Server (NTRS)

    Nerheim, Rosalee

    1989-01-01

    This effort is a study of image-compression algorithms for an electronic still camera. An electronic still camera can record and transmit high-quality images without the use of film, because images are stored digitally in computer memory. However, high-resolution images contain an enormous amount of information and will strain the camera's data-storage system. Image compression will allow more images to be stored in the camera's memory. For the electronic still camera, a compression algorithm that produces a reconstructed image of high fidelity is most important. Efficiency of the algorithm is the second priority. High fidelity and efficiency are more important than a high compression ratio. Several algorithms were chosen for this study and judged on fidelity, efficiency, and compression ratio. The transform method appears to be the best choice. At present, the method compresses images at a ratio of 5.3:1 and produces high-fidelity reconstructed images.
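
    The essence of the transform method is a frequency-domain transform, retention of the largest coefficients, and an inverse transform. A minimal sketch with a 2-D DCT (the synthetic image and the coefficient-count stand-in for the quoted 5.3:1 ratio are illustrative, not the study's algorithm):

        # DCT-based transform compression of a synthetic image.
        import numpy as np
        from scipy.fft import dctn, idctn

        rng = np.random.default_rng(5)
        x = np.linspace(0, 1, 128)
        img = (np.outer(np.sin(4 * np.pi * x), np.cos(3 * np.pi * x))
               + 0.05 * rng.standard_normal((128, 128)))

        C = dctn(img, norm="ortho")
        keep = int(C.size / 5.3)                     # keep ~1/5.3 of coefficients
        thresh = np.partition(np.abs(C).ravel(), -keep)[-keep]
        C[np.abs(C) < thresh] = 0.0                  # discard small coefficients
        recon = idctn(C, norm="ortho")

        mse = np.mean((img - recon)**2)
        psnr = 10 * np.log10(np.ptp(img)**2 / mse)   # fidelity of reconstruction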

  20. Belle II grid computing: An overview of the distributed data management system.

    NASA Astrophysics Data System (ADS)

    Bansal, Vikas; Schram, Malachi; Belle II Collaboration

    2017-01-01

    The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50 ab⁻¹ of e⁺e⁻ collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe, and will take advantage of upgrades to the high-speed global network. We present the architecture of data flow and data handling as a part of the Belle II computing infrastructure.

  1. Three-Dimensional Navier-Stokes Method with Two-Equation Turbulence Models for Efficient Numerical Simulation of Hypersonic Flows

    NASA Technical Reports Server (NTRS)

    Bardina, J. E.

    1994-01-01

    A new computationally efficient 3-D compressible Reynolds-averaged implicit Navier-Stokes method with advanced two-equation turbulence models for high-speed flows is presented. All convective terms are modeled using an entropy-satisfying higher-order Total Variation Diminishing (TVD) scheme based on implicit upwind flux-difference-split approximations and an arithmetic averaging procedure for the primitive variables. This method combines the best features of the data management and computational efficiency of space-marching procedures with the generality and stability of time-dependent Navier-Stokes procedures to solve flows with mixed supersonic and subsonic zones, including streamwise separated flows. Its robust stability derives from a combination of conservative implicit upwind flux-difference splitting with Roe's property U to provide the accurate shock-capturing capability that non-conservative schemes do not guarantee, an alternating symmetric Gauss-Seidel 'method of planes' relaxation procedure coupled with a three-dimensional two-factor diagonally dominant approximate factorization scheme, TVD flux limiters of higher-order flux differences satisfying realizability, and well-posed characteristic-based implicit boundary-point approximations consistent with the local characteristic domain of dependence. The efficiency of the method is greatly increased with Newton-Raphson acceleration, which allows convergence in essentially one forward sweep for supersonic flows. The method is verified by comparison with experiment and other Navier-Stokes methods. Here, results for adiabatic and cooled flat-plate flows, compression-corner flow, and 3-D hypersonic shock-wave/turbulent boundary layer interaction flows are presented. The robust 3-D method achieves a computational efficiency better by at least one order of magnitude than the CNS Navier-Stokes code. It provides cost-effective aerodynamic predictions in agreement with experiment, and the capability of predicting complex flow structures in complex geometries with good accuracy.
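
    To make the TVD flux-limiter idea concrete, here is the textbook construction in its simplest setting: MUSCL reconstruction with the minmod limiter for 1-D linear advection with periodic boundaries, advanced with SSP Runge-Kutta time stepping. This is a minimal explicit sketch, not the paper's implicit 3-D scheme:

        # Minmod-limited MUSCL scheme for u_t + u_x = 0 (advection speed 1).
        import numpy as np

        def minmod(a, b):
            cond = a * b > 0.0
            return np.where(cond,
                            np.sign(a) * np.minimum(np.abs(a), np.abs(b)),
                            0.0)

        def residual(u):
            # limited slope per cell; upwind flux for positive advection speed
            slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
            flux = u + 0.5 * slope           # left state at each i+1/2 face
            return flux - np.roll(flux, 1)   # F_{i+1/2} - F_{i-1/2}

        def ssp_rk2_step(u, cfl):
            u1 = u - cfl * residual(u)
            return 0.5 * (u + u1 - cfl * residual(u1))

        x = np.linspace(0.0, 1.0, 200, endpoint=False)
        u = np.where((x > 0.3) & (x < 0.5), 1.0, 0.0)   # square pulse
        for _ in range(200):
            u = ssp_rk2_step(u, cfl=0.4)   # advects without new over/undershoots

    The limiter reverts to first-order accuracy at discontinuities (where neighboring slopes disagree in sign) and second order in smooth regions, which is the realizability/monotonicity trade-off the abstract refers to.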

  2. PIMS: Memristor-Based Processing-in-Memory-and-Storage.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cook, Jeanine

    Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy efficiency for high-performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (instead of memory), the system could address both in-memory computation and applications that access larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direction, the most important being that a PIM that implements a standard von Neumann-type architecture results in significant energy efficiency improvement, but only about a O(10) performance improvement. In addition, the emergence of new memory technologies moved us to proposing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM technology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.

  3. Reduced basis ANOVA methods for partial differential equations with high-dimensional random inputs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, Qifeng, E-mail: liaoqf@shanghaitech.edu.cn; Lin, Guang, E-mail: guanglin@purdue.edu

    2016-07-15

    In this paper we present a reduced basis ANOVA approach for partial differential equations (PDEs) with random inputs. The ANOVA method combined with stochastic collocation methods provides model reduction in high-dimensional parameter space by decomposing high-dimensional inputs into unions of low-dimensional inputs. In this work, to further reduce the computational cost, we investigate spatial low-rank structures in the ANOVA-collocation method, and develop efficient spatial model reduction techniques using hierarchically generated reduced bases. We present a general mathematical framework of the methodology, validate its accuracy and demonstrate its efficiency with numerical experiments.
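
    For reference, the ANOVA expansion underlying this approach writes a high-dimensional random input as a hierarchy of low-dimensional terms; a generic truncated form (illustrative notation, not necessarily the authors') is:

```latex
f(x_1,\dots,x_N) \;\approx\; f_0
  \;+\; \sum_{i=1}^{N} f_i(x_i)
  \;+\; \sum_{1 \le i < j \le N} f_{ij}(x_i, x_j)
```

    Truncating after the first- or second-order terms replaces one N-dimensional collocation problem with many one- and two-dimensional ones; the hierarchically generated reduced bases described above then compress the spatial solves that these low-dimensional problems share.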

  4. On some methods for improving time of reachability sets computation for the dynamic system control problem

    NASA Astrophysics Data System (ADS)

    Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir

    2016-12-01

    The paper presents two different approaches to reducing the computation time of reachability sets. The first approach uses different data structures for storing the reachability sets in computer memory during single-threaded calculation. The second approach is based on parallel algorithms applied to the data structures of the first approach. Within this framework, a parallel algorithm for approximate reachability set calculation on a computer with SMP architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate the high efficiency of parallel computing and show how the computing time depends on the data structure used.
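
    A minimal sketch of the parallel idea: represent the reachable set as a set of grid cells (the data structure) and expand all cells of the current set concurrently. The dynamics, control sampling, and grid below are hypothetical placeholders, not the authors' algorithm.

```python
import numpy as np
from multiprocessing import Pool

# one Euler step of hypothetical dynamics x' = f(x, u) over a sampled control set
CONTROLS = np.linspace(-1.0, 1.0, 5)
DT, H = 0.1, 0.05          # time step and grid cell size used to deduplicate states

def f(x, u):
    return np.array([x[1], u])            # double integrator, for illustration

def successors(x):
    # all states reachable from x in one step, snapped to integer grid cells
    return {tuple(np.round((x + DT * f(x, u)) / H).astype(int)) for u in CONTROLS}

def reach_step(cells, workers=4):
    # expand every cell of the current set in parallel and take the union
    pts = [H * np.array(c, dtype=float) for c in cells]
    with Pool(workers) as pool:
        out = pool.map(successors, pts)
    return set().union(*out)

if __name__ == "__main__":
    reach = {(0, 0)}
    for _ in range(20):
        reach = reach_step(reach)
    print(len(reach), "grid cells in the approximate reachable set")
```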

  5. High-performance noncontact thermal diode via asymmetric nanostructures

    NASA Astrophysics Data System (ADS)

    Shen, Jiadong; Liu, Xianglei; He, Huan; Wu, Weitao; Liu, Baoan

    2018-05-01

    Electric diodes, though laying the foundation of the modern electronics and information processing industries, suffer from ineffectiveness and even failure at high temperatures. Thermal diodes are promising alternatives that relieve the above limitations, but they usually possess low rectification ratios, and how to obtain a high-performance thermal rectification effect is still an open question. This paper proposes an efficient contactless thermal diode based on the near-field thermal radiation of asymmetric doped-silicon nanostructures. The rectification ratio computed via exact scattering theories is demonstrated to be as high as 10 at a nanoscale gap distance and period, outperforming the counterpart flat-plate diode by more than one order of magnitude. This extraordinary performance mainly lies in the higher forward and lower reverse radiative heat flux within the low-frequency band compared with the counterpart flat-plate diode, which is caused by a lower loss and a smaller cut-off wavevector of the nanostructures for the forward and reverse schemes, respectively. This work opens new routes to realizing high-performance thermal diodes, and may have wide applications in efficient thermal computing, thermal information processing, and thermal management.

  6. Analytical Cost Metrics: Days of Future Past

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prajapati, Nirmal; Rajopadhye, Sanjay; Djidjev, Hristo Nikolov

    As we move towards the exascale era, new architectures must be capable of running massive computational problems efficiently. Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, and image/signal processing to computational science and bioinformatics. With Moore’s law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore the major challenge that we face in computing systems research is: “how to solve massive-scale computational problems in the most time/power/energy efficient manner?”

  7. Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

    2014-09-01

    Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.

  8. Time-Accurate Local Time Stepping and High-Order Time CESE Methods for Multi-Dimensional Flows Using Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Chang, Chau-Lyan; Venkatachari, Balaji Shankar; Cheng, Gary

    2013-01-01

    With the wide availability of affordable multiple-core parallel supercomputers, next-generation numerical simulations of flow physics are being focused on unsteady computations for problems involving multiple time scales and multiple physics. These simulations require higher solution accuracy than most algorithms and computational fluid dynamics codes currently available. This paper focuses on the development effort for high-fidelity multi-dimensional, unstructured-mesh flow solvers using the space-time conservation element, solution element (CESE) framework. Two approaches have been investigated in this research in order to provide high-accuracy, cross-cutting numerical simulations for a variety of flow regimes: 1) time-accurate local time stepping and 2) a high-order CESE method. The first approach utilizes consistent numerical formulations in the space-time flux integration to preserve temporal conservation across cells with different marching time steps. This approach relieves the stringent time-step constraint associated with the smallest time step in the computational domain while preserving temporal accuracy for all cells. For flows involving multiple scales, both numerical accuracy and efficiency can be significantly enhanced. The second approach extends the current CESE solver to higher-order accuracy. Unlike other existing explicit high-order methods for unstructured meshes, the CESE framework maintains a CFL condition of one for arbitrarily high-order formulations while retaining the same compact stencil as its second-order counterpart. For large-scale unsteady computations, this feature substantially enhances numerical efficiency. Numerical formulations and validations using benchmark problems are discussed in this paper along with realistic examples.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pastore, Giovanni; Rabiti, Cristian; Pizzocri, Davide

    PolyPole is a numerical algorithm for the calculation of intra-granular fission gas release. In particular, the algorithm solves the gas diffusion problem in a fuel grain under time-varying conditions. The program has been extensively tested. PolyPole combines high accuracy with high computational efficiency and is ideally suited for application in fuel performance codes.
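
    A minimal sketch of the underlying problem PolyPole addresses (not the PolyPole algorithm itself): explicit finite differences for gas diffusion in a spherical grain with time-varying diffusivity and source, using the standard u = r·c substitution. Grain radius, diffusivity, and source values are illustrative placeholders.

```python
import numpy as np

def grain_diffusion(D_of_t, S_of_t, R=5e-6, nr=100, t_end=1e3):
    """Explicit FD sketch of dc/dt = D(t)*(1/r^2) d/dr(r^2 dc/dr) + S(t) in a sphere.

    The substitution u = r*c turns the spherical operator into a plain 1-D heat
    equation; boundary conditions are u(0) = 0 and c(R) = 0 (perfect-sink grain
    boundary, a common idealization).
    """
    r = np.linspace(0.0, R, nr)
    dr = r[1] - r[0]
    u = np.zeros(nr)
    t = 0.0
    while t < t_end:
        D, S = D_of_t(t), S_of_t(t)
        dt = 0.4 * dr * dr / D                 # explicit stability limit
        u[1:-1] += dt * (D * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dr**2
                         + r[1:-1] * S)
        t += dt
    c = np.empty(nr)
    c[1:] = u[1:] / r[1:]
    c[0] = c[1]                                 # regularity at the grain center
    return r, c

# e.g. constant diffusivity (1e-14 m^2/s) and a unit source:
r, c = grain_diffusion(lambda t: 1e-14, lambda t: 1.0)
```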

  10. Computational Issues in Damping Identification for Large Scale Problems

    NASA Technical Reports Server (NTRS)

    Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.

    1997-01-01

    Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least-squares method. Numerical simulations have been performed on multiple-degree-of-freedom models to test the effectiveness of the algorithms and the usefulness of parallel computation for these problems. High Performance Fortran is used to parallelize the algorithms. Tests were performed using the IBM-SP2 at NASA Ames Research Center. The least-squares method tested incurs high communication costs, which reduces the benefit of high-performance computing. This method's memory requirement grows at a very rapid rate, meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can be seen for problems of 100+ degrees of freedom.
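
    A least-squares damping identification can be sketched as follows, assuming the mass and stiffness matrices are known and response time histories are measured; this generic formulation is illustrative and not necessarily the specific method tested in the paper. Note that the data for a full n-by-n damping matrix grow rapidly with model size, consistent with the memory behavior reported above.

```python
import numpy as np

def identify_damping(M, K, X, Xd, Xdd, F):
    """Least-squares estimate of C in M x'' + C x' + K x = f.

    X, Xd, Xdd, F are (n_samples, n_dof) arrays of measured displacement,
    velocity, acceleration, and force time histories.
    """
    # residual that must be explained by damping: C x' = f - M x'' - K x
    R = F - Xdd @ M.T - X @ K.T
    Ct, *_ = np.linalg.lstsq(Xd, R, rcond=None)   # solves Xd @ C.T ~= R
    return Ct.T

# 2-DOF synthetic check: recover a known C from consistent data
rng = np.random.default_rng(0)
M = np.eye(2); K = np.array([[4.0, -1.0], [-1.0, 3.0]])
C_true = np.array([[0.3, -0.1], [-0.1, 0.2]])
X = rng.normal(size=(500, 2)); Xd = rng.normal(size=(500, 2))
Xdd = rng.normal(size=(500, 2))
F = Xdd @ M.T + Xd @ C_true.T + X @ K.T
print(np.allclose(identify_damping(M, K, X, Xd, Xdd, F), C_true))  # True
```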

  11. Fault Injection Campaign for a Fault Tolerant Duplex Framework

    NASA Technical Reports Server (NTRS)

    Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.

    2007-01-01

    Fault tolerance is an efficient approach for avoiding or reducing the damage caused by system failures. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is software developed by the UCLA group [1, 2] that takes a fault-tolerant approach, allowing two replicas of the same process to run on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process, running on a different node, constantly monitors the results computed by the two replicas and restarts them if an inconsistency in their computations is detected. This approach is very cost efficient and can be adopted to control processes on spacecraft, where the fault rate produced by cosmic rays is not very high.

  12. TRIM-3D: a three-dimensional model for accurate simulation of shallow water flow

    USGS Publications Warehouse

    Casulli, Vincenzo; Bertolazzi, Enrico; Cheng, Ralph T.

    1993-01-01

    A semi-implicit finite difference formulation for the numerical solution of three-dimensional tidal circulation is discussed. The governing equations are the three-dimensional Reynolds equations, in which the pressure is assumed to be hydrostatic. A minimal degree of implicitness has been introduced in the finite difference formula so that the resulting algorithm permits the use of large time steps at a minimal computational cost. The formulation includes the simulation of flooding and drying of tidal flats, and is fully vectorizable for efficient implementation on modern vector computers. The high computational efficiency of this method has made it possible to provide the fine details of circulation structure in complex regions that previous studies were unable to obtain. For proper interpretation of the model results, suitable interactive graphics are also an essential tool.

  13. Computational Analysis of Nanoparticles-Molten Salt Thermal Energy Storage for Concentrated Solar Power Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Vinod

    2017-05-05

    High-fidelity computational models of thermocline-based thermal energy storage (TES) were developed. The research goal was to advance the understanding of a single-tank nanofluidized molten-salt thermocline TES system under various concentrations and sizes of the particle suspension. Our objective was to utilize sensible heat with the least irreversibility by exploiting nanoscale physics. This was achieved by performing computational analysis of several storage designs, analyzing storage efficiency, and estimating cost effectiveness for the TES systems under a concentrating solar power (CSP) scheme using molten salt as the storage medium. Since TES is one of the most costly but important components of a CSP plant, an efficient TES system has the potential to make the electricity generated from solar technologies cost competitive with conventional sources of electricity.

  14. A 250-Mbit/s ring local computer network using 1.3-micron single-mode optical fibers

    NASA Technical Reports Server (NTRS)

    Eng, S. T.; Tell, R.; Andersson, T.; Eng, B.

    1985-01-01

    A 250-Mbit/s three-station fiber-optic ring local computer network was built and successfully demonstrated. A conventional token protocol was employed for bus arbitration to maximize bus efficiency under high loading conditions, and a non-return-to-zero (NRZ) data encoding format was selected for simplicity and maximum utilization of the ECL-circuit bandwidth.

  15. Portable parallel stochastic optimization for the design of aeropropulsion components

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Rhodes, G. S.

    1994-01-01

    This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The research recognizes that such design optimization problems are computationally expensive and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initiate the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as of portable, parallel programming environments. The second effort was to implement the MSO methodology for an example problem using the portable parallel programming system Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate that the MSO methodology can be applied effectively to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications to which MSO can be applied, including NASA's High-Speed Civil Transport and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.

  16. Towards a flexible middleware for context-aware pervasive and wearable systems.

    PubMed

    Muro, Marco; Amoretti, Michele; Zanichelli, Francesco; Conte, Gianni

    2012-11-01

    Ambient intelligence and wearable computing call for innovative hardware and software technologies, including highly capable, flexible, and efficient middleware that allows the reuse of existing pervasive applications when developing new ones. In the considered application domain, middleware should also support self-management, interoperability among different platforms, efficient communications, and context awareness. In the ongoing "everything is networked" scenario, scalability is a very important issue, for which the peer-to-peer (P2P) paradigm emerges as an appealing solution for connecting software components in an overlay network, allowing for efficient and balanced data distribution mechanisms. In this paper, we illustrate how all these concepts can be brought together in a theoretical tool, called the networked autonomic machine (NAM), implemented in a NAM-based middleware and evaluated against practical problems of pervasive computing.

  17. High efficiency coherent optical memory with warm rubidium vapour

    PubMed Central

    Hosseini, M.; Sparkes, B.M.; Campbell, G.; Lam, P.K.; Buchler, B.C.

    2011-01-01

    By harnessing aspects of quantum mechanics, communication and information processing could be radically transformed. Promising forms of quantum information technology include optical quantum cryptographic systems and computing using photons for quantum logic operations. As with current information processing systems, some form of memory will be required. Quantum repeaters, which are required for long distance quantum key distribution, require quantum optical memory as do deterministic logic gates for optical quantum computing. Here, we present results from a coherent optical memory based on warm rubidium vapour and show 87% efficient recall of light pulses, the highest efficiency measured to date for any coherent optical memory suitable for quantum information applications. We also show storage and recall of up to 20 pulses from our system. These results show that simple warm atomic vapour systems have clear potential as a platform for quantum memory. PMID:21285952

  18. High efficiency coherent optical memory with warm rubidium vapour.

    PubMed

    Hosseini, M; Sparkes, B M; Campbell, G; Lam, P K; Buchler, B C

    2011-02-01

    By harnessing aspects of quantum mechanics, communication and information processing could be radically transformed. Promising forms of quantum information technology include optical quantum cryptographic systems and computing using photons for quantum logic operations. As with current information processing systems, some form of memory will be required. Quantum repeaters, which are required for long distance quantum key distribution, require quantum optical memory as do deterministic logic gates for optical quantum computing. Here, we present results from a coherent optical memory based on warm rubidium vapour and show 87% efficient recall of light pulses, the highest efficiency measured to date for any coherent optical memory suitable for quantum information applications. We also show storage and recall of up to 20 pulses from our system. These results show that simple warm atomic vapour systems have clear potential as a platform for quantum memory.

  19. On a fast calculation of structure factors at a subatomic resolution.

    PubMed

    Afonine, P V; Urzhumtsev, A

    2004-01-01

    In the last decade, the progress of protein crystallography has allowed several protein structures to be solved at a resolution higher than 0.9 Å. Such studies provide researchers with important new information reflecting very fine structural details. The signal from these details is very weak with respect to that corresponding to the whole structure. Its analysis requires high-quality data, which previously were available only for crystals of small molecules, and a high accuracy of calculations. The calculation of structure factors using direct formulae, traditional in 'small-molecule' crystallography, allows relatively simple accuracy control. For macromolecular crystals, diffraction data sets at subatomic resolution contain hundreds of thousands of reflections, and the number of parameters used to describe the corresponding models may reach the same order. Therefore, the direct way of calculating structure factors becomes very expensive in computing time when applied to large molecules. These demands of high accuracy and computational efficiency require a re-examination of computer tools and algorithms. The calculation of model structure factors through an intermediate generation of an electron density [Sayre (1951). Acta Cryst. 4, 362-367; Ten Eyck (1977). Acta Cryst. A33, 486-492] may be much more computationally efficient, but contains some parameters (grid step, 'effective' atom radii, etc.) whose influence on the accuracy of the calculation is not straightforward. At the same time, choosing parameters within safety margins that largely ensure sufficient accuracy may result in a significant loss of CPU time, making it close to the time for the direct-formulae calculations. The impact of the different parameters on the computational efficiency of structure-factor calculation is studied. It is shown that an appropriate choice of these parameters allows the structure factors to be obtained with high accuracy and in a significantly shorter time than that required when using the direct formulae. Practical algorithms for the optimal choice of the parameters are suggested.
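
    The two routes compared above can be sketched in a few lines: direct summation of F(h) = Σ_j f_j exp(2πi h·x_j) versus sampling a density on a grid and applying an FFT. The sketch uses point atoms and nearest-grid-point sampling (no form factors or smearing), so the grid step directly controls the accuracy, which is exactly the trade-off studied in the paper.

```python
import numpy as np

def sf_direct(xyz, f_atom, hkl):
    """Direct summation: F(h) = sum_j f_j exp(2*pi*i h.x_j); O(N_refl * N_atom)."""
    phase = 2j * np.pi * (hkl @ xyz.T)          # (n_refl, n_atoms)
    return (f_atom * np.exp(phase)).sum(axis=1)

def sf_fft(xyz, f_atom, grid=64):
    """Approximate F(h) by accumulating point atoms on a grid and using an FFT.

    Nearest-grid-point sampling; a finer grid (smaller step) reduces the
    snapping error at the cost of memory and FFT time.
    """
    rho = np.zeros((grid, grid, grid), dtype=complex)
    idx = np.round(xyz * grid).astype(int) % grid
    for (i, j, k), f in zip(idx, f_atom):
        rho[i, j, k] += f
    # ifftn uses exp(+2*pi*i h.n/N) with a 1/N^3 factor, so rescale by N^3
    return np.fft.ifftn(rho) * grid**3

# toy 3-atom structure in fractional coordinates
xyz = np.array([[0.1, 0.2, 0.3], [0.4, 0.1, 0.7], [0.25, 0.5, 0.75]])
f_atom = np.array([6.0, 8.0, 7.0])
print(sf_fft(xyz, f_atom, grid=64)[1, 2, 3])          # grid approximation
print(sf_direct(xyz, f_atom, np.array([[1, 2, 3]])))  # reference value
```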

  20. Photonic Design: From Fundamental Solar Cell Physics to Computational Inverse Design

    NASA Astrophysics Data System (ADS)

    Miller, Owen Dennis

    Photonic innovation is becoming ever more important in the modern world. Optical systems are dominating shorter and shorter communication distances, LEDs are rapidly emerging for a variety of applications, and solar cells show potential to become a mainstream technology in the energy space. The need for novel, energy-efficient photonic and optoelectronic devices will only increase. This work unites fundamental physics and a novel computational inverse-design approach towards such innovation. The first half of the dissertation is devoted to the physics of high-efficiency solar cells. As solar cells approach fundamental efficiency limits, their internal physics transforms. Photonic considerations, instead of electronic ones, are the key to reaching the highest voltages and efficiencies. Proper photon management led to Alta Devices' recent dramatic increase of the solar cell efficiency record to 28.3%. Moreover, approaching the Shockley-Queisser limit for any solar cell technology will require light extraction to become a part of all future designs. The second half of the dissertation introduces inverse design as a new computational paradigm in photonics. An assortment of techniques (FDTD, FEM, etc.) has enabled quick and accurate simulation of the "forward problem" of finding fields for a given geometry. However, scientists and engineers are typically more interested in the inverse problem: for a desired functionality, what geometry is needed? Answering this question breaks from the emphasis on the forward problem and forges a new path in computational photonics. The framework of shape calculus enables one to quickly find superior, non-intuitive designs. Novel designs for optical cloaking and sub-wavelength solar cell applications are presented.

  1. Heterogeneous Distributed Computing for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Sunderam, Vaidy S.

    1998-01-01

    The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to investigate issues in, and develop solutions to, the efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM [1] system and, subsequent to detailed conversion efforts and performance benchmarking, devised novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.

  2. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

    DOE PAGES

    Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...

    2017-09-29

    Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.

  3. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian

    Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.

  4. A strategy for improved computational efficiency of the method of anchored distributions

    NASA Astrophysics Data System (ADS)

    Over, Matthew William; Yang, Yarong; Chen, Xingyuan; Rubin, Yoram

    2013-06-01

    This paper proposes a strategy for improving the computational efficiency of model inversion using the method of anchored distributions (MAD) by "bundling" similar model parametrizations in the likelihood function. Inferring the likelihood function typically requires a large number of forward model (FM) simulations for each possible model parametrization; as a result, the process is quite expensive. To ease this prohibitive cost, we present an approximation for the likelihood function called bundling that relaxes the requirement for high quantities of FM simulations. This approximation redefines the conditional statement of the likelihood function as the probability that a set of similar model parametrizations (a "bundle") replicates the field measurements, which we show is neither a model reduction nor a sampling approach to improving the computational efficiency of model inversion. To evaluate the effectiveness of these modifications, we compare the quality of predictions and the computational cost of bundling relative to a baseline MAD inversion of 3-D flow and transport model parameters. Additionally, to aid understanding of the implementation, we provide a tutorial for bundling in the form of a sample data set and a script for the R statistical computing language. For our synthetic experiment, bundling achieved a 35% reduction in overall computational cost and had a limited negative impact on the predicted probability distributions of the model parameters. Strategies for minimizing error in the bundling approximation, for enforcing similarity among the sets of model parametrizations, and for identifying convergence of the likelihood function are also presented.
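
    A toy illustration of the bundling idea (not the authors' MAD implementation or their R tutorial): samples whose parameters fall in the same coarse cell share a single forward-model run, trading a controlled likelihood approximation for far fewer FM calls. The forward model, bundle width, and noise level are hypothetical.

```python
import numpy as np

def bundled_likelihood(samples, forward_model, z_obs, sigma, width):
    """Approximate per-sample Gaussian likelihoods while sharing FM runs.

    Samples whose parameters round to the same coarse cell of size `width`
    form a bundle; the FM runs once per bundle instead of once per sample.
    """
    cache = {}
    like = np.empty(len(samples))
    for n, s in enumerate(samples):
        key = tuple(np.round(s / width).astype(int))
        if key not in cache:
            rep = width * np.array(key, dtype=float)   # bundle representative
            cache[key] = forward_model(rep)            # one expensive FM call
        resid = z_obs - cache[key]
        like[n] = np.exp(-0.5 * np.sum(resid**2) / sigma**2)
    return like, len(cache)

# toy FM: "field measurements" are a smooth function of two parameters
fm = lambda p: np.array([np.sin(p[0]) + p[1], p[0] * p[1]])
samples = np.random.default_rng(1).normal(size=(1000, 2))
like, n_runs = bundled_likelihood(samples, fm, fm(np.array([0.3, 0.7])), 0.1, 0.25)
print(f"{n_runs} FM runs covered {len(samples)} samples")
```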

  5. Data-driven Applications for the Sun-Earth System

    NASA Astrophysics Data System (ADS)

    Kondrashov, D. A.

    2016-12-01

    Advances in observational and data mining techniques make it possible to extract, from the large volume of Sun-Earth observational data, information that can be assimilated into first-principles physical models. However, the equations governing Sun-Earth phenomena are typically nonlinear, complex, and high-dimensional. The high computational demand of solving the full governing equations over a large range of scales precludes the use of a variety of useful assimilative tools that rely on applied mathematical and statistical techniques for quantifying uncertainty and predictability. Effective use of such tools requires the development of computationally efficient methods to facilitate the fusion of data with models. This presentation will provide an overview of various existing as well as newly developed data-driven techniques adopted from the atmospheric and oceanic sciences that have proved useful for space physics applications, such as a computationally efficient implementation of the Kalman filter in radiation belt modeling, solar wind gap-filling by Singular Spectrum Analysis, and a low-rank procedure for assimilation of low-altitude ionospheric magnetic perturbations into the Lyon-Fedder-Mobarry (LFM) global magnetospheric model. Reduced-order non-Markovian inverse modeling and novel data-adaptive decompositions of Sun-Earth datasets will also be demonstrated.
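
    For reference, the basic predict/update cycle of the linear Kalman filter mentioned above is sketched below; operational radiation-belt implementations use reduced-rank or ensemble variants, so this is only the textbook form with illustrative matrices.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : prior state estimate and covariance;   z : new measurement
    F, H : state-transition and observation matrices
    Q, R : process- and measurement-noise covariances
    """
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    S = H @ P @ H.T + R                           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# constant-velocity tracking example with position-only measurements
F = np.array([[1.0, 1.0], [0.0, 1.0]]); H = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
for z in [1.1, 2.0, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), F, H, 0.01 * np.eye(2), np.array([[0.25]]))
print(x)   # estimated position and velocity
```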

  6. An efficient and accurate framework for calculating lattice thermal conductivity of solids: AFLOW—AAPL Automatic Anharmonic Phonon Library

    NASA Astrophysics Data System (ADS)

    Plata, Jose J.; Nath, Pinku; Usanmaz, Demet; Carrete, Jesús; Toher, Cormac; de Jong, Maarten; Asta, Mark; Fornari, Marco; Nardelli, Marco Buongiorno; Curtarolo, Stefano

    2017-10-01

    One of the most accurate approaches for calculating the lattice thermal conductivity, κ, is solving the Boltzmann transport equation starting from third-order anharmonic force constants. In addition to the underlying approximations of ab-initio parameterization, two main challenges are associated with this path: high computational costs and a lack of automation in the frameworks using this methodology, which affect the discovery rate of novel materials with ad-hoc properties. Here, the Automatic Anharmonic Phonon Library (AAPL) is presented. It efficiently computes interatomic force constants by making effective use of crystal symmetry analysis, it solves the Boltzmann transport equation to obtain κ, and it allows fully integrated operation with minimum user intervention, a rational addition to the current high-throughput accelerated materials development framework AFLOW. An "experiment vs. theory" study of the approach is shown, comparing accuracy and speed with respect to other available packages, including for materials characterized by strong electron localization and correlation. Combining AAPL with the pseudo-hybrid functional ACBN0 makes it possible to improve accuracy without increasing computational requirements.

  7. Modeling of the WSTF frictional heating apparatus in high pressure systems

    NASA Technical Reports Server (NTRS)

    Skowlund, Christopher T.

    1992-01-01

    In order to develop a computer program able to model the frictional heating of metals in high-pressure oxygen or nitrogen, a number of additions have been made to the frictional heating model originally developed for tests in low-pressure helium. These additions include: (1) a physical property package for the gases to account for departures from the ideal-gas state; (2) two methods for spatial discretization (finite differences with quadratic interpolation, or orthogonal collocation on finite elements) which substantially reduce the computer time required to solve the transient heat balance; (3) more efficient programs for the integration of the ordinary differential equations resulting from the discretization of the partial differential equations; and (4) two methods for determining the best-fit parameters via minimization of the mean square error (either a direct-search multivariable simplex method or a modified Levenberg-Marquardt algorithm). The resulting computer program has been shown to be accurate, efficient, and robust for determining the heat flux or friction coefficient vs. time at the interface of the stationary and rotating samples.

  8. Two-Dimensional Electronic Spectroscopy of Benzene, Phenol, and Their Dimer: An Efficient First-Principles Simulation Protocol.

    PubMed

    Nenov, Artur; Mukamel, Shaul; Garavelli, Marco; Rivalta, Ivan

    2015-08-11

    First-principles simulations of two-dimensional electronic spectroscopy in the ultraviolet region (2DUV) require computationally demanding multiconfigurational approaches that can resolve doubly excited and charge-transfer states, the spectroscopic fingerprints of coupled UV-active chromophores. Here, we propose an efficient approach to reduce the computational cost of accurate simulations of 2DUV spectra of benzene, phenol, and their dimer (i.e., the minimal models for studying electronic coupling of UV chromophores in proteins). We first establish the multiconfigurational recipe with the highest accuracy by comparison with experimental data, providing reference gas-phase transition energies and dipole moments that can be used to construct exciton Hamiltonians involving high-lying excited states. We show that by reducing the active spaces and the number of configuration state functions within restricted active space schemes, the computational cost can be significantly decreased without loss of accuracy in predicting 2DUV spectra. The proposed recipe has been successfully tested on a realistic model protein system in water. Accounting for line broadening due to thermal and solvent-induced fluctuations allows for direct comparison with experiments.

  9. Fragment informatics and computational fragment-based drug design: an overview and update.

    PubMed

    Sheng, Chunquan; Zhang, Wannian

    2013-05-01

    Fragment-based drug design (FBDD) is a promising approach for the discovery and optimization of lead compounds. Despite its successes, FBDD also faces some internal limitations and challenges. FBDD requires a high-quality target protein and good solubility of fragments. Biophysical techniques for fragment screening necessitate expensive detection equipment, and the strategies for evolving fragment hits to leads remain to be improved. Regardless, FBDD is necessary for investigating larger chemical space and can be applied to challenging biological targets. In this scenario, cheminformatics and computational chemistry can be used as alternative approaches that significantly improve the efficiency and success rate of lead discovery and optimization. Cheminformatics and computational tools assist FBDD in a very flexible manner. Computational FBDD can be used independently or in parallel with experimental FBDD for efficiently generating and optimizing leads. Computational FBDD can also be integrated into each step of experimental FBDD and play a synergistic role by maximizing its performance. This review provides a critical analysis of the complementarity between computational and experimental FBDD and highlights recent advances in new algorithms and successful examples of their applications. In particular, fragment-based cheminformatics tools, high-throughput fragment docking, and fragment-based de novo drug design provide the focus of this review. We also discuss the advantages and limitations of different methods and the trends in new developments that should inspire future research. © 2012 Wiley Periodicals, Inc.

  10. Efficient Computation of Closed-loop Frequency Response for Large Order Flexible Systems

    NASA Technical Reports Server (NTRS)

    Maghami, Peiman G.; Giesy, Daniel P.

    1997-01-01

    An efficient and robust computational scheme is given for the calculation of the frequency response function of a large-order, flexible system implemented with a linear, time-invariant control system. Advantage is taken of the highly structured sparsity of the system matrix of the plant, based on a model of the structure using normal mode coordinates. The computational time per frequency point of the new scheme is a linear function of system size, a significant improvement over traditional full-matrix techniques whose computational times per frequency point range from quadratic to cubic functions of system size. This permits the practical frequency-domain analysis of systems of much larger order than traditional full-matrix techniques allow. Formulations are given for both open- and closed-loop systems. Numerical examples are presented showing the advantages of the present formulation over traditional approaches, both in speed and in accuracy. Using a model with 703 structural modes, a speed-up of almost two orders of magnitude was observed, while accuracy improved by up to 5 decimal places.
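
    The key structural point, that a normal-mode model makes the frequency response a sum of rank-one modal contributions with cost linear in the number of modes, can be sketched directly. This generic open-loop form is illustrative only; it omits the paper's controller coupling and specific formulation.

```python
import numpy as np

def modal_frf(omega, wn, zeta, B, C):
    """Frequency response of a flexible structure in normal-mode coordinates.

    Each mode k contributes C[:, k] B[k, :] / (wn_k^2 - w^2 + 2i zeta_k wn_k w),
    so the cost per frequency point is linear in the number of modes; no
    full-matrix solve is needed.

    wn, zeta : (n_modes,) natural frequencies and damping ratios
    B : (n_modes, n_in) modal input matrix;  C : (n_out, n_modes) modal output
    Returns H with shape (len(omega), n_out, n_in).
    """
    den = wn**2 - omega[:, None]**2 + 2j * zeta * wn * omega[:, None]
    # weight each rank-one dyad C[:,k] B[k,:] by 1/den and sum over modes
    return np.einsum('wk,ok,ki->woi', 1.0 / den, C, B)

# 703-mode example, in the spirit of the paper's test case
rng = np.random.default_rng(2)
n = 703
wn = np.sort(rng.uniform(1.0, 500.0, n)); zeta = 0.005 * np.ones(n)
B = rng.normal(size=(n, 2)); C = rng.normal(size=(3, n))
H = modal_frf(np.linspace(0.1, 100.0, 400), wn, zeta, B, C)
print(H.shape)   # (400, 3, 2)
```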

  11. Modular thermal analyzer routine, volume 1

    NASA Technical Reports Server (NTRS)

    Oren, J. A.; Phillips, M. A.; Williams, D. R.

    1972-01-01

    The Modular Thermal Analyzer Routine (MOTAR) is a general thermal analysis routine with strong capabilities for performing thermal analysis of systems containing flowing fluids, fluid system controls (valves, heat exchangers, etc.), life support systems, and thermal radiation situations. Its modular organization permits the analysis of a very wide range of thermal problems, from simple problems containing a few conduction nodes to those requiring complicated flow and radiation analysis, with each problem type analyzed with peak computational efficiency and maximum ease of use. The organization and programming methods applied to MOTAR achieve a high degree of computer utilization efficiency in terms of execution time and the storage space required for a given problem. The computer time required to run a given problem on MOTAR is approximately 40 to 50 percent of that required by the existing, widely used routines. MOTAR's storage requirement is approximately 25 percent more than that of the most commonly used routines for the simplest problems, but the data storage techniques for the more complicated options should save a considerable amount of space.

  12. Estimation of Radiative Efficiency of Chemicals with Potentially Significant Global Warming Potential.

    PubMed

    Betowski, Don; Bevington, Charles; Allison, Thomas C

    2016-01-19

    Halogenated chemical substances are used in a broad array of applications, and new chemical substances are continually being developed and introduced into commerce. While recent research has considerably increased our understanding of the global warming potentials (GWPs) of multiple individual chemical substances, this research inevitably lags behind the development of new chemical substances. There are currently over 200 substances known to have high GWP. Evaluation of schemes to estimate radiative efficiency (RE) based on computational chemistry is useful where no measured IR spectrum is available. This study assesses the reliability of RE values calculated using computational chemistry techniques for 235 chemical substances against the best available values. Computed vibrational frequency data are used to estimate RE values using several Pinnock-type models, and reasonable agreement with reported values is found. Significant improvement is obtained through scaling of both vibrational frequencies and intensities. The effect of varying the computational method and basis set used to calculate the frequency data is discussed. It is found that the vibrational intensities have a strong dependence on basis set and are largely responsible for differences in computed RE values.

  13. Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine.

    PubMed

    Hu, Miao; Graves, Catherine E; Li, Can; Li, Yunning; Ge, Ning; Montgomery, Eric; Davila, Noraica; Jiang, Hao; Williams, R Stanley; Yang, J Joshua; Xia, Qiangfei; Strachan, John Paul

    2018-03-01

    Using memristor crossbar arrays to accelerate computations is a promising approach to efficiently implement algorithms in deep neural networks. Early demonstrations, however, were limited to simulations or small-scale problems, primarily due to materials and device challenges that limit the size of the memristor crossbar arrays that can be reliably programmed to stable analog values, which is the focus of the current work. High-precision analog tuning and control of memristor cells across a 128 × 64 array is demonstrated, and the resulting vector-matrix multiplication (VMM) computing precision is evaluated. Single-layer neural network inference is performed in these arrays, and the performance is compared to a digital approach. The memristor computing system used here reaches a VMM accuracy equivalent to 6 bits, and an 89.9% recognition accuracy is achieved for the 10k MNIST handwritten-digit test set. Forecasts show that, with integrated (on-chip) and scaled memristors, a computational efficiency greater than 100 trillion operations per second per watt is possible. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
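
    A software emulation of the dot-product idea makes the analog VMM and its finite precision concrete. The conductance range and the differential-pair weight mapping below are hypothetical, not the device model used in the paper; quantizing to 6 bits mimics the reported precision limit.

```python
import numpy as np

def crossbar_vmm(W, v, g_min=1e-6, g_max=1e-4, bits=6):
    """Emulate an analog dot-product engine: y = W v computed as currents I = G V.

    Signed weights map onto a differential pair of conductance arrays (G+, G-),
    each quantized to 2**bits levels to mimic finite analog programming precision.
    """
    s = np.abs(W).max()
    span = g_max - g_min
    Gp = g_min + span * np.clip(W, 0, None) / s    # positive weight part
    Gm = g_min + span * np.clip(-W, 0, None) / s   # negative weight part

    def q(G):
        # quantize conductance to 2**bits levels across [g_min, g_max]
        lv = 2**bits - 1
        return np.round((G - g_min) / span * lv) / lv * span + g_min

    I = (q(Gp) - q(Gm)) @ v                        # Ohm's law + Kirchhoff summation
    return I * s / span                            # rescale back to weight units

W = np.random.default_rng(3).normal(size=(64, 128))
v = np.random.default_rng(4).normal(size=128)
print(np.abs(crossbar_vmm(W, v) - W @ v).max())    # small quantization error
```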

  14. Practical experimental certification of computational quantum gates using a twirling procedure.

    PubMed

    Moussa, Osama; da Silva, Marcus P; Ryan, Colm A; Laflamme, Raymond

    2012-08-17

    Because of the technical difficulty of building large quantum computers, it is important to be able to estimate how faithful a given implementation is to an ideal quantum computer. The common approach of completely characterizing the computation process via quantum process tomography requires an exponential amount of resources, and thus is not practical even for relatively small devices. We solve this problem by demonstrating that twirling experiments previously used to characterize the average fidelity of quantum memories efficiently can be easily adapted to estimate the average fidelity of the experimental implementation of important quantum computation processes, such as unitaries in the Clifford group, in a practical and efficient manner with applicability in current quantum devices. Using this procedure, we demonstrate state-of-the-art coherent control of an ensemble of magnetic moments of nuclear spins in a single crystal solid by implementing the encoding operation for a 3-qubit code with only a 1% degradation in average fidelity discounting preparation and measurement errors. We also highlight one of the advances that was instrumental in achieving such high fidelity control.

  15. Predicting Flows of Rarefied Gases

    NASA Technical Reports Server (NTRS)

    LeBeau, Gerald J.; Wilmoth, Richard G.

    2005-01-01

    DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.

  16. Efficient Use of Distributed Systems for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques

    2000-01-01

    Distributed computing has been regarded as the future of high-performance computing. Nationwide high-speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments, and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide-area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity; a minimal sketch of this approach appears below. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency of up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results are given in Figure 1 for four irregular meshes with element counts ranging from 11,451 (the Barth4 mesh) to 30,269 (the Barth5 mesh). Future work with PART entails using the tool with an integrated application requiring distributed systems. In particular, this application, illustrated in the original document, entails an integration of finite element and fluid dynamics simulations to address the cooling of turbine blades in a gas turbine engine design. It is not uncommon to encounter high-temperature, film-cooled turbine airfoils with millions of degrees of freedom; this results from the complexity of the various components of the airfoils, which requires fine-grain meshing for accuracy. Additional information is contained in the original.
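
    A minimal simulated-annealing partitioner in the spirit described above, with a cost that balances speed-normalized processor loads against cut (communication) edges. The cost weights and cooling schedule are illustrative placeholders, not PART's actual objective; the full-cost recomputation per move is kept naive for clarity.

```python
import numpy as np

def sa_partition(adj, wdeg, speeds, iters=5000, T0=1.0, seed=0):
    """Simulated-annealing mesh partitioning for heterogeneous processors.

    adj    : list of neighbor lists (element connectivity graph)
    wdeg   : per-element compute weight
    speeds : relative processor speeds; faster processors absorb more load
    """
    rng = np.random.default_rng(seed)
    speeds = np.asarray(speeds, dtype=float)
    n, p = len(adj), len(speeds)
    part = rng.integers(0, p, n)

    def cost(part):
        loads = np.bincount(part, weights=wdeg, minlength=p) / speeds
        cut = sum(part[i] != part[j] for i in range(n) for j in adj[i]) / 2
        return loads.max() + 0.01 * cut    # imbalance + communication penalty

    c = cost(part)
    for k in range(iters):
        T = T0 * (1.0 - k / iters) + 1e-9          # linear cooling
        i, newp = rng.integers(n), rng.integers(p)
        old = part[i]; part[i] = newp
        c_new = cost(part)
        if c_new < c or rng.random() < np.exp((c - c_new) / T):
            c = c_new                               # accept the move
        else:
            part[i] = old                           # reject and restore
    return part

# ring of 12 elements, equal weights, processors with relative speeds 1:1:2
adj = [[(i - 1) % 12, (i + 1) % 12] for i in range(12)]
print(sa_partition(adj, np.ones(12), [1.0, 1.0, 2.0]))
```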

  17. A fast mass spring model solver for high-resolution elastic objects

    NASA Astrophysics Data System (ADS)

    Zheng, Mianlun; Yuan, Zhiyong; Zhu, Weixu; Zhang, Guian

    2017-03-01

    Real-time simulation of elastic objects is of great importance for computer graphics and virtual reality applications. The fast mass-spring model solver can achieve visually realistic simulation in an efficient way. Unfortunately, this method suffers from resolution limitations and a lack of mechanical realism for surface geometry models, which greatly restricts its application. To tackle these problems, in this paper we propose a fast mass-spring model solver for high-resolution elastic objects. First, we project the complex surface geometry model into a set of uniform grid cells serving as cages, through the cages mean value coordinate method, to reflect its internal structure and mechanical properties. Then, we replace the original Cholesky decomposition method in the fast mass-spring model solver with a conjugate gradient method, which makes the fast solver more efficient for detailed surface geometry models. Finally, we propose a graphics processing unit accelerated parallel algorithm for the conjugate gradient method. Experimental results show that our method can realize efficient deformation simulation of 3D elastic objects with visual realism and physical fidelity, and has great potential for applications in computer animation.
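
    The solver substitution at the heart of the paper, replacing a direct factorization with conjugate gradients, can be sketched with a matrix-free CG applied to a linearized implicit mass-spring step. The time-integration details here are generic, not the paper's exact formulation; warm-starting from the previous velocity is one reason CG suits high-resolution models.

```python
import numpy as np

def conjugate_gradient(A_mul, b, x0=None, tol=1e-8, max_iter=200):
    """Matrix-free CG for symmetric positive-definite systems; A_mul(x) = A @ x."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A_mul(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A_mul(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def implicit_step(M_diag, K, v, f, dt):
    """Linearized implicit mass-spring update: solve (M + dt^2 K) v_new = M v + dt f.

    M_diag is the (lumped) mass vector; K is a (sparse) linearized stiffness
    matrix. Warm-start CG from the previous velocity v.
    """
    A_mul = lambda y: M_diag * y + dt * dt * (K @ y)
    b = M_diag * v + dt * f
    return conjugate_gradient(A_mul, b, x0=v)
```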

  18. Design and Computational Fluid Dynamics Investigation of a Personal, High Flow Inhalable Sampler

    PubMed Central

    Anthony, T. Renée; Landázuri, Andrea C.; Van Dyke, Mike; Volckens, John

    2016-01-01

    The objective of this research was to develop an inlet to meet the inhalable sampling criterion at 10 l min−1 flow using the standard, 37-mm cassette. We designed a porous head for this cassette and evaluated its performance using computational fluid dynamics (CFD) modeling. Particle aspiration efficiency was simulated in a wind tunnel environment at 0.4 m s−1 freestream velocity for a facing-the-wind orientation, with sampler oriented at both 0° (horizontal) and 30° down angles. The porous high-flow sampler oriented 30° downward showed reasonable agreement with published mannequin wind tunnel studies and humanoid CFD investigations for solid particle aspiration into the mouth, whereas the horizontal orientation resulted in oversampling. Liquid particles were under-aspirated in all cases, however, with 41–84% lower aspiration efficiencies relative to solid particles. A sampler with a single central 15-mm pore at 10 l min−1 was also investigated and was found to match the porous sampler’s aspiration efficiency for solid particles; the single-pore sampler is expected to be more suitable for liquid particle use. PMID:20418278

  19. A Computational Framework for Efficient Low Temperature Plasma Simulations

    NASA Astrophysics Data System (ADS)

    Verma, Abhishek Kumar; Venkattraman, Ayyaswamy

    2016-10-01

    Over the past years, scientific computing has emerged as an essential tool for the investigation and prediction of low temperature plasma (LTP) applications, which include electronics, nanomaterial synthesis, metamaterials, etc. To explore LTP behavior with greater fidelity, we present a computational toolbox developed to perform LTP simulations. This framework allows us to enhance our understanding of multiscale plasma phenomena using high performance computing tools, based mainly on the OpenFOAM FVM distribution. Although aimed at microplasma simulations, the modular framework is able to perform multiscale, multiphysics simulations of physical systems comprising LTPs. Salient introductory features include the capability to perform parallel, 3D simulations of LTP applications on unstructured meshes. Performance of the solver is tested with numerical results assessing the accuracy and efficiency of benchmark problems in microdischarge devices. Numerical simulation of a microplasma reactor at atmospheric pressure with hemispherical dielectric-coated electrodes will be discussed, providing an overview of the applicability and future scope of this framework.

  20. Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits

    PubMed Central

    Ujfalussy, Balázs B; Makara, Judit K; Branco, Tiago; Lengyel, Máté

    2015-01-01

    Cortical neurons integrate thousands of synaptic inputs in their dendrites in highly nonlinear ways. It is unknown how these dendritic nonlinearities in individual cells contribute to computations at the level of neural circuits. Here, we show that dendritic nonlinearities are critical for the efficient integration of synaptic inputs in circuits performing analog computations with spiking neurons. We developed a theory that formalizes how a neuron's dendritic nonlinearity that is optimal for integrating synaptic inputs depends on the statistics of its presynaptic activity patterns. Based on their in vivo presynaptic population statistics (firing rates, membrane potential fluctuations, and correlations due to ensemble dynamics), our theory accurately predicted the responses of two different types of cortical pyramidal cells to patterned stimulation by two-photon glutamate uncaging. These results reveal a new computational principle underlying dendritic integration in cortical neurons by suggesting a functional link between cellular and systems-level properties of cortical circuits. DOI: http://dx.doi.org/10.7554/eLife.10056.001 PMID:26705334

  1. VOFTools - A software package of calculation tools for volume of fluid methods using general convex grids

    NASA Astrophysics Data System (ADS)

    López, J.; Hernández, J.; Gómez, P.; Faura, F.

    2018-02-01

    The VOFTools library includes efficient analytical and geometrical routines for (1) area/volume computation, (2) truncation operations that typically arise in VOF (volume of fluid) methods, (3) area/volume conservation enforcement (VCE) in PLIC (piecewise linear interface calculation) reconstruction, and (4) computation of the distance from a given point to the reconstructed interface. The computation of a polyhedron volume uses an efficient formula based on a quadrilateral decomposition and a 2D projection of each polyhedron face. The analytical VCE method is based on coupling an interpolation procedure to bracket the solution with an improved final calculation step based on the above volume computation formula. Although the library was originally created to help develop highly accurate advection and reconstruction schemes in the context of VOF methods, it may have more general applications. To assess the performance of the supplied routines, different tests, which are provided in FORTRAN and C, were implemented for several 2D and 3D geometries.
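
    The library's quadrilateral-decomposition formula is not reproduced here, but the standard divergence-theorem volume computation it optimizes can be sketched for a convex polyhedron with fan-triangulated faces:

```python
import numpy as np

def polyhedron_volume(verts, faces):
    """Volume of a convex polyhedron from its faces (divergence theorem).

    verts : (n, 3) vertex coordinates
    faces : vertex-index loops, ordered counter-clockwise seen from outside
    V = (1/6) * sum over surface triangles of v0 . (v1 x v2)
    """
    vol = 0.0
    for face in faces:
        v0 = verts[face[0]]
        for a, b in zip(face[1:-1], face[2:]):   # fan-triangulate each face
            vol += np.dot(v0, np.cross(verts[a], verts[b]))
    return vol / 6.0

# unit cube: 8 vertices, 6 quadrilateral faces -> volume 1.0
V = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
              [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float)
F = [[0, 3, 2, 1], [4, 5, 6, 7], [0, 1, 5, 4],
     [1, 2, 6, 5], [2, 3, 7, 6], [3, 0, 4, 7]]
print(polyhedron_volume(V, F))   # 1.0
```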

  2. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we reorganize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive with optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
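
    For reference, the Smith-Waterman recurrence that the paper parallelizes at cluster, thread, and vector level is shown below in its plain scalar form (linear gap penalties, illustrative scoring parameters); production implementations vectorize along anti-diagonals or across database sequences.

```python
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score via the Smith-Waterman recurrence.

    H[i][j] = max(0,
                  H[i-1][j-1] + s(a_i, b_j),   # match / mismatch
                  H[i-1][j]   + gap,           # deletion
                  H[i][j-1]   + gap)           # insertion
    Returns the best local alignment score; O(len(a) * len(b)) time.
    """
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0, H[i-1, j-1] + s, H[i-1, j] + gap, H[i, j-1] + gap)
    return int(H.max())

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))   # best local score
```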

  3. Higher-order ice-sheet modelling accelerated by multigrid on graphics cards

    NASA Astrophysics Data System (ADS)

    Brædstrup, Christian; Egholm, David

    2013-04-01

    Higher-order ice flow modelling is a very computationally intensive process owing primarily to the nonlinear influence of the horizontal stress coupling. When applied for simulating long-term glacial landscape evolution, the ice-sheet models must consider very long time series, while both high temporal and spatial resolution is needed to resolve small effects. The use of higher-order and full-Stokes models has therefore seen very limited use in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this end we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides the inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
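
    As background for the solver described above, the sketch below shows one red-black Gauss-Seidel sweep for a linear 2D Poisson model problem in NumPy (illustrative only; the paper's iSOSIA solver is nonlinear and runs inside a FAS multigrid cycle on the GPU):

      import numpy as np

      def red_black_sweep(u, f, h):
          """One red-black Gauss-Seidel sweep for -laplace(u) = f
          (5-point stencil, Dirichlet boundaries held fixed).

          Red points (i+j even) couple only to black points and vice
          versa, so each half-sweep is one data-parallel update - the
          structure a GPU implementation exploits.
          """
          n, m = u.shape
          ii, jj = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
          interior = (ii > 0) & (ii < n - 1) & (jj > 0) & (jj < m - 1)
          for parity in (0, 1):
              mask = interior & ((ii + jj) % 2 == parity)
              u[mask] = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                                + np.roll(u, 1, 1) + np.roll(u, -1, 1)
                                + h * h * f)[mask]
          return u

      u = np.zeros((33, 33))
      f = np.ones((33, 33))
      for _ in range(100):
          red_black_sweep(u, f, h=1.0 / 32)
      print(u[16, 16])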

  4. Finite-frequency structural sensitivities of short-period compressional body waves

    NASA Astrophysics Data System (ADS)

    Fuji, Nobuaki; Chevrot, Sébastien; Zhao, Li; Geller, Robert J.; Kawai, Kenji

    2012-07-01

    We present an extension of the method recently introduced by Zhao & Chevrot for calculating Fréchet kernels from a precomputed database of strain Green's tensors by normal mode summation. The extension involves two aspects: (1) we compute the strain Green's tensors using the Direct Solution Method, which allows us to go up to frequencies as high as 1 Hz; and (2) we develop a spatial interpolation scheme so that the Green's tensors can be computed with a relatively coarse grid, thus improving the efficiency in the computation of the sensitivity kernels. The only requirement is that the Green's tensors be computed with a fine enough spatial sampling rate to avoid spatial aliasing. The Green's tensors can then be interpolated to any location inside the Earth, avoiding the need to store and retrieve strain Green's tensors for a fine sampling grid. The interpolation scheme not only significantly reduces the CPU time required to calculate the Green's tensor database and the disk space to store it, but also enhances the efficiency in computing the kernels by reducing the number of I/O operations needed to retrieve the Green's tensors. Our new implementation allows us to calculate sensitivity kernels for high-frequency teleseismic body waves with very modest computational resources such as a laptop. We illustrate the potential of our approach for seismic tomography by computing traveltime and amplitude sensitivity kernels for high frequency P, PKP and Pdiff phases. A comparison of our PKP kernels with those computed by asymptotic ray theory clearly shows the limits of the latter. With ray theory, it is not possible to model waves diffracted by internal discontinuities such as the core-mantle boundary, and it is also difficult to compute amplitudes for paths close to the B-caustic of the PKP phase. We also compute waveform partial derivatives for different parts of the seismic wavefield, a key ingredient for high resolution imaging by waveform inversion. Our computations of partial derivatives in the time window where PcP precursors are commonly observed show that the distribution of sensitivity is complex and counter-intuitive, with a large contribution from the mid-mantle region. This clearly emphasizes the need to use accurate and complete partial derivatives in waveform inversion.

  5. Aeroelastic Model Structure Computation for Envelope Expansion

    NASA Technical Reports Server (NTRS)

    Kukreja, Sunil L.

    2007-01-01

    Structure detection is a procedure for selecting a subset of candidate terms, from a full model description, that best describes the observed output. This is a necessary procedure to compute an efficient system description which may afford greater insight into the functionality of the system or a simpler controller design. Structure computation as a tool for black-box modeling may be of critical importance in the development of robust, parsimonious models for the flight-test community. Moreover, this approach may lead to efficient strategies for rapid envelope expansion that may save significant development time and costs. In this study, a least absolute shrinkage and selection operator (LASSO) technique is investigated for computing efficient model descriptions of non-linear aeroelastic systems. The LASSO minimises the residual sum of squares with the addition of an l1 penalty term on the parameter vector of the traditional l2 minimisation problem. Its use for structure detection is a natural extension of this constrained minimisation approach to pseudo-linear regression problems which produces some model parameters that are exactly zero and, therefore, yields a parsimonious system description. Applicability of this technique for model structure computation for the F/A-18 (McDonnell Douglas, now The Boeing Company, Chicago, Illinois) Active Aeroelastic Wing project using flight test data is shown for several flight conditions (Mach numbers) by identifying a parsimonious system description with a high percent fit for cross-validated data.
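
    As a toy illustration of LASSO-based structure detection (synthetic data, not the flight-test analysis itself), the scikit-learn sketch below recovers a sparse subset of candidate regressors; terms whose coefficients shrink exactly to zero are dropped from the model structure:

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 10))        # 10 candidate model terms
      beta_true = np.array([1.5, 0., 0., -2., 0., 0., 0.5, 0., 0., 0.])
      y = X @ beta_true + 0.05 * rng.normal(size=200)

      # The l1 penalty drives irrelevant coefficients exactly to zero,
      # so the surviving terms define the detected model structure.
      model = Lasso(alpha=0.1).fit(X, y)
      print("selected terms:", np.flatnonzero(model.coef_))  # e.g. [0 3 6]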

  6. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vineyard, Craig Michael; Verzi, Stephen Joseph

    As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  7. Data Processing Aspects of MEDLARS

    PubMed Central

    Austin, Charles J.

    1964-01-01

    The speed and volume requirements of MEDLARS necessitate the use of high-speed data processing equipment, including paper-tape typewriters, a digital computer, and a special device for producing photo-composed output. Input to the system is of three types: variable source data, including citations from the literature and search requests; changes to such master files as the medical subject headings list and the journal record file; and operating instructions such as computer programs and procedures for machine operators. MEDLARS builds two major stores of data on magnetic tape. The Processed Citation File includes bibliographic citations in expanded form for high-quality printing at periodic intervals. The Compressed Citation File is a coded, time-sequential citation store which is used for high-speed searching against demand request input. Major design considerations include converting variable-length, alphanumeric data to mechanical form quickly and accurately; serial searching by the computer within a reasonable period of time; high-speed printing that must be of graphic quality; and efficient maintenance of various complex computer files. PMID:14119287

  8. DATA PROCESSING ASPECTS OF MEDLARS.

    PubMed

    AUSTIN, C J

    1964-01-01

    The speed and volume requirements of MEDLARS necessitate the use of high-speed data processing equipment, including paper-tape typewriters, a digital computer, and a special device for producing photo-composed output. Input to the system is of three types: variable source data, including citations from the literature and search requests; changes to such master files as the medical subject headings list and the journal record file; and operating instructions such as computer programs and procedures for machine operators. MEDLARS builds two major stores of data on magnetic tape. The Processed Citation File includes bibliographic citations in expanded form for high-quality printing at periodic intervals. The Compressed Citation File is a coded, time-sequential citation store which is used for high-speed searching against demand request input. Major design considerations include converting variable-length, alphanumeric data to mechanical form quickly and accurately; serial searching by the computer within a reasonable period of time; high-speed printing that must be of graphic quality; and efficient maintenance of various complex computer files.

  9. BESIII Physical Analysis on Hadoop Platform

    NASA Astrophysics Data System (ADS)

    Huo, Jing; Zang, Dongsong; Lei, Xiaofeng; Li, Qiang; Sun, Gongxing

    2014-06-01

    In the past 20 years, computing clusters have been widely used for High Energy Physics data processing. Jobs running on a traditional cluster with a Data-to-Computing structure have to read large volumes of data via the network to the computing nodes for analysis, making I/O latency a bottleneck of the whole system. The new distributed computing technology based on the MapReduce programming model has many advantages, such as high concurrency, high scalability and high fault tolerance, and it can benefit us in dealing with Big Data. This paper brings the idea of using the MapReduce model to BESIII physical analysis and presents a new data analysis system structure based on the Hadoop platform, which not only greatly improves the efficiency of data analysis but also reduces the cost of system building. Moreover, this paper establishes an event pre-selection system based on the event-level metadata (TAGs) database to optimize the data analysis procedure.
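
    The data-locality idea behind such a Hadoop design - ship the analysis to the data as map tasks, then combine partial results in a reduce step - can be caricatured in a few lines of Python (event selection as the map, count merging as the reduce; a schematic, not the BESIII system):

      from collections import Counter
      from functools import reduce

      def map_select(event_block):
          """Map task: run the pre-selection locally on one data block."""
          return Counter(e["channel"] for e in event_block
                         if e["energy"] > 1.0)

      def reduce_merge(a, b):
          """Reduce task: merge partial per-channel counts."""
          return a + b

      blocks = [
          [{"channel": "ee", "energy": 1.2},
           {"channel": "mumu", "energy": 0.4}],
          [{"channel": "ee", "energy": 2.1},
           {"channel": "ee", "energy": 0.9}],
      ]
      partials = map(map_select, blocks)      # runs where the data lives
      print(reduce(reduce_merge, partials, Counter()))  # Counter({'ee': 2})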

  10. GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.

    PubMed

    Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart

    2011-06-01

    The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework, producing substantially boosted computing efficiency (a two-order-of-magnitude speedup) at a PC-level cost. Specific details of implementing the FDTD method on a GPU architecture have been presented and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity, and 20% reduction in loop coupling compared with conventional array structures of the same size) for small animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied for both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
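
    For readers unfamiliar with the update structure that maps so well onto GPUs, a minimal 1D vacuum FDTD (Yee scheme) sketch in normalized units follows; the paper's 3D solver with tissue models is far more elaborate, but shares the same stencil pattern in which every cell is updated independently per half-step:

      import numpy as np

      def fdtd_1d(n_cells=200, n_steps=300, src=100):
          """1D FDTD (Yee scheme) in vacuum, Courant number 0.5."""
          ez = np.zeros(n_cells)
          hy = np.zeros(n_cells)
          for t in range(n_steps):
              # Update H from the spatial difference of E ...
              hy[:-1] += 0.5 * (ez[1:] - ez[:-1])
              # ... then E from the spatial difference of H.
              ez[1:] += 0.5 * (hy[1:] - hy[:-1])
              ez[src] += np.exp(-((t - 30) / 10.0) ** 2)  # soft source
          return ez

      print(np.abs(fdtd_1d()).max())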

  11. Classification of breast tissue in mammograms using efficient coding.

    PubMed

    Costa, Daniel D; Campos, Lúcio F; Barros, Allan K

    2011-06-24

    Female breast cancer is the major cause of death by cancer in western countries. Efforts in Computer Vision have been made in order to improve the diagnostic accuracy by radiologists. Some methods of lesion diagnosis in mammogram images were developed based on the technique of principal component analysis, which has been used in efficient coding of signals, and 2D Gabor wavelets, used for computer vision applications and modeling biological vision. In this work, we present a methodology that uses efficient coding along with linear discriminant analysis to distinguish between mass and non-mass on 5090 regions of interest from mammograms. The results show that the best rates of success reached with Gabor wavelets and principal component analysis were 85.28% and 87.28%, respectively. In comparison, the model of efficient coding presented here reached up to 90.07%. Altogether, the results presented demonstrate that independent component analysis successfully performed efficient coding in order to discriminate mass from non-mass tissues. In addition, we have observed that LDA with ICA bases showed high predictive performance for some datasets and thus provides significant support for a more detailed clinical investigation.

  12. Technique Developed for Optimizing Traveling-Wave Tubes

    NASA Technical Reports Server (NTRS)

    Wilson, Jeffrey D.

    1999-01-01

    A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWTs are critical components in deep-space probes, geosynchronous communication satellites, and high-power radar systems. Power efficiency is of paramount importance for TWTs employed in deep-space probes and communications satellites. Consequently, increasing the power efficiency of TWTs has been the primary goal of the TWT group at the NASA Lewis Research Center over the last 25 years. An in-house effort produced a technique (ref. 1) to design TWTs for optimized power efficiency. This technique is based on simulated annealing, which has an advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 2). A simulated annealing algorithm was created and integrated into the NASA TWT computer model (ref. 3). The new technique almost doubled the computed conversion power efficiency of a TWT, from 7.1 to 13.5 percent (ref. 1).
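
    The simulated annealing technique is cited above only by reference; a generic, minimal sketch of the algorithm (not NASA's TWT optimizer) accepts uphill moves with a temperature-dependent probability so the search can escape local optima:

      import math
      import random

      def simulated_annealing(cost, x0, step, t0=1.0, cooling=0.995,
                              iters=20000):
          """Generic simulated annealing minimizer: worse moves are
          accepted with probability exp(-delta/T); T decays geometrically."""
          x, fx = x0, cost(x0)
          best, fbest = x, fx
          t = t0
          for _ in range(iters):
              xn = step(x)
              fn = cost(xn)
              if fn < fx or random.random() < math.exp(-(fn - fx) / t):
                  x, fx = xn, fn
                  if fx < fbest:
                      best, fbest = x, fx
              t *= cooling
          return best, fbest

      # Toy objective with many local minima.
      cost = lambda x: x * x + 10 * math.sin(3 * x)
      step = lambda x: x + random.uniform(-0.5, 0.5)
      print(simulated_annealing(cost, x0=5.0, step=step))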

  13. Realistic and efficient 2D crack simulation

    NASA Astrophysics Data System (ADS)

    Yadegar, Jacob; Liu, Xiaoqing; Singh, Abhishek

    2010-04-01

    Although numerical algorithms for 2D crack simulation have been studied in Modeling and Simulation (M&S) and computer graphics for decades, realism and computational efficiency are still major challenges. In this paper, we introduce a high-fidelity, scalable, adaptive, and efficient runtime 2D crack/fracture simulation system by applying the mathematically elegant Peano-Cesaro triangular meshing/remeshing technique to model the generation of shards/fragments. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level-of-detail. The generated binary decomposition tree also provides an efficient neighbor retrieval mechanism used for mesh element splitting and merging with minimal memory requirements, essential for realistic 2D fragment formation. Upon load impact/contact/penetration, a number of factors including impact angle, impact energy, and material properties are all taken into account to produce the criteria of crack initialization, propagation, and termination leading to realistic fractal-like rubble/fragment formation. The aforementioned parameters are used as variables of probabilistic models of crack/shard formation, making the proposed solution highly adaptive by allowing machine learning mechanisms to learn the optimal values for the variables/parameters based on prior benchmark data generated by off-line physics-based simulation solutions that produce accurate fractures/shards, though at a highly non-real-time pace. Crack/fracture simulation has been conducted on various load impacts with different initial locations at various impulse scales. The simulation results demonstrate that the proposed system has the capability to realistically and efficiently simulate 2D crack phenomena (such as window shattering and shard generation) with diverse potential in military and civil M&S applications such as training and mission planning.

  14. Efficient visibility encoding for dynamic illumination in direct volume rendering.

    PubMed

    Kronander, Joel; Jönsson, Daniel; Löw, Joakim; Ljung, Patric; Ynnerman, Anders; Unger, Jonas

    2012-03-01

    We present an algorithm that enables real-time dynamic shading in direct volume rendering using general lighting, including directional lights, point lights, and environment maps. Real-time performance is achieved by encoding local and global volumetric visibility using spherical harmonic (SH) basis functions stored in an efficient multiresolution grid over the extent of the volume. Our method enables high-frequency shadows in the spatial domain, but is limited to a low-frequency approximation of visibility and illumination in the angular domain. In a first pass, level of detail (LOD) selection in the grid is based on the current transfer function setting. This enables rapid online computation and SH projection of the local spherical distribution of visibility information. Using a piecewise integration of the SH coefficients over the local regions, the global visibility within the volume is then computed. By representing the light sources using their SH projections, the integral over lighting, visibility, and isotropic phase functions can be efficiently computed during rendering. The utility of our method is demonstrated in several examples showing the generality and interactive performance of the approach.

  15. A computationally efficient modelling of laminar separation bubbles

    NASA Technical Reports Server (NTRS)

    Dini, Paolo; Maughmer, Mark D.

    1990-01-01

    In predicting the aerodynamic characteristics of airfoils operating at low Reynolds numbers, it is often important to account for the effects of laminar (transitional) separation bubbles. Previous approaches to the modelling of this viscous phenomenon range from fast but sometimes unreliable empirical correlations for the length of the bubble and the associated increase in momentum thickness, to more accurate but significantly slower displacement-thickness iteration methods employing inverse boundary-layer formulations in the separated regions. Since the penalty in computational time associated with the more general methods is unacceptable for airfoil design applications, use of an accurate yet computationally efficient model is highly desirable. To this end, a semi-empirical bubble model was developed and incorporated into the Eppler and Somers airfoil design and analysis program. The generality and the efficiency were achieved by successfully approximating the local viscous/inviscid interaction, the transition location, and the turbulent reattachment process within the framework of an integral boundary-layer method. Comparisons of the predicted aerodynamic characteristics with experimental measurements for several airfoils show excellent and consistent agreement for Reynolds numbers from 2,000,000 down to 100,000.

  16. Diverse power iteration embeddings: Theory and practice

    DOE PAGES

    Huang, Hao; Yoo, Shinjae; Yu, Dantong; ...

    2015-11-09

    Manifold learning, especially spectral embedding, is known as one of the most effective learning approaches on high dimensional data, but for real-world applications it raises a serious computational burden in constructing spectral embeddings for large datasets. To overcome this computational complexity, we propose a novel efficient embedding construction, Diverse Power Iteration Embedding (DPIE). DPIE shows almost the same effectiveness as spectral embeddings and yet is three orders of magnitude faster than spectral embeddings computed from eigendecomposition. Our DPIE is unique in that (1) it finds linearly independent embeddings and thus shows diverse aspects of the dataset; (2) the proposed regularized DPIE is effective if many embeddings are needed; (3) we show how to efficiently orthogonalize DPIE if needed; and (4) the Diverse Power Iteration Value (DPIV) provides the importance of each DPIE like an eigenvalue. As a result, these various aspects of DPIE and DPIV ensure that our algorithm is easy to apply to various applications, and we also show the effectiveness and efficiency of DPIE on clustering, anomaly detection, and feature selection as our case studies.
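
    DPIE itself is more elaborate, but its core building block is plain power iteration on a normalized affinity matrix; a minimal sketch (synthetic data, not the authors' code) is:

      import numpy as np

      def power_iteration_embedding(W, iters=50, seed=0):
          """Truncated power iteration on a row-normalized affinity matrix.

          Repeated multiplication mixes the leading eigenvectors into the
          iterate; stopping early yields a useful one-dimensional embedding
          at a fraction of the cost of a full eigendecomposition.
          """
          rng = np.random.default_rng(seed)
          P = W / W.sum(axis=1, keepdims=True)   # row-stochastic matrix
          v = rng.normal(size=W.shape[0])
          for _ in range(iters):
              v = P @ v
              v /= np.abs(v).max()               # rescale to avoid decay
          return v

      W = np.abs(np.random.default_rng(1).normal(size=(100, 100)))
      W = (W + W.T) / 2                          # symmetric affinities
      print(power_iteration_embedding(W)[:5])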

  17. Feature extraction using first and second derivative extrema (FSDE) for real-time and hardware-efficient spike sorting.

    PubMed

    Paraskevopoulou, Sivylla E; Barsakcioglu, Deren Y; Saberi, Mohammed R; Eftekhar, Amir; Constandinou, Timothy G

    2013-04-30

    Next generation neural interfaces aspire to achieve real-time multi-channel systems by integrating spike sorting on chip to overcome limitations in communication channel capacity. The feasibility of this approach relies on developing highly efficient algorithms for feature extraction and clustering with the potential of low-power hardware implementation. We propose a feature extraction method, requiring no calibration, based on first and second derivative features of the spike waveform. The accuracy and computational complexity of the proposed method are quantified and compared against commonly used feature extraction methods, through simulation across four datasets (with different single units) at multiple noise levels (ranging from 5 to 20% of the signal amplitude). The average classification error is shown to be below 7% with a computational complexity of 2N-3, where N is the number of sample points of each spike. Overall, this method presents a good trade-off between accuracy and computational complexity and is thus particularly well-suited for hardware-efficient implementation. Copyright © 2013 Elsevier B.V. All rights reserved.
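
    The FSDE features are simple enough to sketch directly: for each spike waveform, take discrete first and second differences and keep their extrema, which costs roughly 2N-3 subtractions per spike (N-1 first differences plus N-2 second differences). A NumPy version (illustrative, not the authors' hardware implementation) is:

      import numpy as np

      def fsde_features(spike):
          """First and Second Derivative Extrema (FSDE) of one waveform."""
          d1 = np.diff(spike)        # first derivative, N-1 values
          d2 = np.diff(spike, n=2)   # second derivative, N-2 values
          return np.array([d1.max(), d1.min(), d2.max(), d2.min()])

      t = np.linspace(0, 1, 64)
      spike = (np.exp(-((t - 0.3) / 0.05) ** 2)
               - 0.4 * np.exp(-((t - 0.5) / 0.1) ** 2))
      print(fsde_features(spike))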

  18. Parallelization of a Fully-Distributed Hydrologic Model using Sub-basin Partitioning

    NASA Astrophysics Data System (ADS)

    Vivoni, E. R.; Mniszewski, S.; Fasel, P.; Springer, E.; Ivanov, V. Y.; Bras, R. L.

    2005-12-01

    A primary obstacle towards advances in watershed simulations has been the limited computational capacity available to most models. The growing trend of model complexity, data availability and physical representation has not been matched by adequate developments in computational efficiency. This situation has created a serious bottleneck which limits existing distributed hydrologic models to small domains and short simulations. In this study, we present novel developments in the parallelization of a fully-distributed hydrologic model. Our work is based on the TIN-based Real-time Integrated Basin Simulator (tRIBS), which provides continuous hydrologic simulation using a multiple resolution representation of complex terrain based on a triangulated irregular network (TIN). While the use of TINs reduces computational demand, the sequential version of the model is currently limited over large basins (>10,000 km2) and long simulation periods (>1 year). To address this, a parallel MPI-based version of the tRIBS model has been implemented and tested using high performance computing resources at Los Alamos National Laboratory. Our approach utilizes domain decomposition based on sub-basin partitioning of the watershed. A stream reach graph based on the channel network structure is used to guide the sub-basin partitioning. Individual sub-basins or sub-graphs of sub-basins are assigned to separate processors to carry out internal hydrologic computations (e.g. rainfall-runoff transformation). Routed streamflow from each sub-basin forms the major hydrologic data exchange along the stream reach graph. Individual sub-basins also share subsurface hydrologic fluxes across adjacent boundaries. We demonstrate how the sub-basin partitioning provides computational feasibility and efficiency for a set of test watersheds in northeastern Oklahoma. We compare the performance of the sequential and parallelized versions to highlight the efficiency gained as the number of processors increases. We also discuss how the coupled use of TINs and parallel processing can lead to feasible long-term simulations in regional watersheds while preserving basin properties at high-resolution.

  19. High-Efficiency High-Resolution Global Model Developments at the NASA Goddard Data Assimilation Office

    NASA Technical Reports Server (NTRS)

    Lin, Shian-Jiann; Atlas, Robert (Technical Monitor)

    2002-01-01

    The Data Assimilation Office (DAO) has been developing a new generation of ultra-high resolution General Circulation Model (GCM) that is suitable for 4-D data assimilation, numerical weather predictions, and climate simulations. These three applications have conflicting requirements. For 4-D data assimilation and weather predictions, it is highly desirable to run the model at the highest possible spatial resolution (e.g., 55 km or finer) so as to be able to resolve and predict socially and economically important weather phenomena such as tropical cyclones, hurricanes, and severe winter storms. For climate change applications, the model simulations need to be carried out for decades, if not centuries. To reduce uncertainty in climate change assessments, the next generation model would also need to be run at a fine enough spatial resolution that can at least marginally simulate the effects of intense tropical cyclones. Scientific problems (e.g., parameterization of subgrid scale moist processes) aside, all three areas of application require the model's computational performance to be dramatically improved as compared to the previous generation. In this talk, I will present the current and future developments of the "finite-volume dynamical core" at the Data Assimilation Office. This dynamical core applies modern monotonicity-preserving algorithms and is genuinely conservative by construction, not by an ad hoc fixer. The "discretization" of the conservation laws is purely local, which is clearly advantageous for resolving sharp gradient flow features. In addition, the local nature of the finite-volume discretization also has a significant advantage on distributed memory parallel computers. Together with a unique vertically Lagrangian control volume discretization that essentially reduces the dimension of the computational problem from three to two, the finite-volume dynamical core is very efficient, particularly at high resolutions. I will also present the computational design of the dynamical core using a hybrid distributed-shared memory programming paradigm that is portable to virtually any of today's high-end parallel super-computing clusters.

  20. High-Efficiency High-Resolution Global Model Developments at the NASA Goddard Data Assimilation Office

    NASA Technical Reports Server (NTRS)

    Lin, Shian-Jiann; Atlas, Robert (Technical Monitor)

    2002-01-01

    The Data Assimilation Office (DAO) has been developing a new generation of ultra-high resolution General Circulation Model (GCM) that is suitable for 4-D data assimilation, numerical weather predictions, and climate simulations. These three applications have conflicting requirements. For 4-D data assimilation and weather predictions, it is highly desirable to run the model at the highest possible spatial resolution (e.g., 55 km or finer) so as to be able to resolve and predict socially and economically important weather phenomena such as tropical cyclones, hurricanes, and severe winter storms. For climate change applications, the model simulations need to be carried out for decades, if not centuries. To reduce uncertainty in climate change assessments, the next generation model would also need to be run at a fine enough spatial resolution that can at least marginally simulate the effects of intense tropical cyclones. Scientific problems (e.g., parameterization of subgrid scale moist processes) aside, all three areas of application require the model's computational performance to be dramatically improved as compared to the previous generation. In this talk, I will present the current and future developments of the "finite-volume dynamical core" at the Data Assimilation Office. This dynamical core applies modern monotonicity-preserving algorithms and is genuinely conservative by construction, not by an ad hoc fixer. The "discretization" of the conservation laws is purely local, which is clearly advantageous for resolving sharp gradient flow features. In addition, the local nature of the finite-volume discretization also has a significant advantage on distributed memory parallel computers. Together with a unique vertically Lagrangian control volume discretization that essentially reduces the dimension of the computational problem from three to two, the finite-volume dynamical core is very efficient, particularly at high resolutions. I will also present the computational design of the dynamical core using a hybrid distributed-shared memory programming paradigm that is portable to virtually any of today's high-end parallel super-computing clusters.

  1. A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

    PubMed Central

    Chen, Jui-Le; Yang, Chu-Sing

    2013-01-01

    The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although computer science technologies can be used to reduce the costs of pharmaceutical research, the computation time of structure-based protein-ligand docking prediction remains unsatisfactory. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate docking prediction. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864

  2. Numerical Study of Boundary Layer Interaction with Shocks: Method Improvement and Test Computation

    NASA Technical Reports Server (NTRS)

    Adams, N. A.

    1995-01-01

    The objective is the development of a high-order and high-resolution method for the direct numerical simulation of shock turbulent-boundary-layer interaction. Details concerning the spatial discretization of the convective terms can be found in Adams and Shariff (1995). The computer code based on this method as introduced in Adams (1994) was formulated in Cartesian coordinates and thus has been limited to simple rectangular domains. For more general two-dimensional geometries, such as a compression corner, an extension to generalized coordinates is necessary. To keep the requirements on grid generation low, the extended formulation should allow for non-orthogonal grids. Still, for simplicity and cost efficiency, periodicity can be assumed in one cross-flow direction. For easy vectorization, the compact-ENO coupling algorithm as used in Adams (1994) treated whole planes normal to the derivative direction with the ENO scheme whenever at least one point of this plane satisfied the detection criterion. This is apparently too restrictive for more general geometries and more complex shock patterns. Here we introduce a localized compact-ENO coupling algorithm, which is efficient as long as the overall number of grid points treated by the ENO scheme is small compared to the total number of grid points. Validation and test computations with the final code are performed to assess the efficiency and suitability of the computer code for the problems of interest. We define a set of parameters where a direct numerical simulation of a turbulent boundary layer along a compression corner with reasonably fine resolution is affordable.

  3. Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis

    PubMed Central

    Teodoro, George; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

    2014-01-01

    We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and characterization of objects based on these features (object classification). In this work, we have identified the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable or sometimes better than that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This is a result of the low performance of MICs when it comes to random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware task strategy for scheduling application operations improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems - the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs). PMID:25419088

  4. Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics

    NASA Technical Reports Server (NTRS)

    Goodrich, John W.; Dyson, Rodger W.

    1999-01-01

    The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.

  5. Sampling free energy surfaces as slices by combining umbrella sampling and metadynamics.

    PubMed

    Awasthi, Shalini; Kapil, Venkat; Nair, Nisanth N

    2016-06-15

    Metadynamics (MTD) is a very powerful technique to sample high-dimensional free energy landscapes, and due to its self-guiding property, the method has been successful in studying complex reactions and conformational changes. MTD sampling is based on filling the free energy basins by biasing potentials and thus for cases with flat, broad, and unbound free energy wells, the computational time to sample them becomes very large. To alleviate this problem, we combine the standard Umbrella Sampling (US) technique with MTD to sample orthogonal collective variables (CVs) in a simultaneous way. Within this scheme, we construct the equilibrium distribution of CVs from biased distributions obtained from independent MTD simulations with umbrella potentials. Reweighting is carried out by a procedure that combines US reweighting and Tiwary-Parrinello MTD reweighting within the Weighted Histogram Analysis Method (WHAM). The approach is ideal for a controlled sampling of a CV in a MTD simulation, making it computationally efficient in sampling flat, broad, and unbound free energy surfaces. This technique also allows for a distributed sampling of a high-dimensional free energy surface, further increasing the computational efficiency in sampling. We demonstrate the application of this technique in sampling high-dimensional surface for various chemical reactions using ab initio and QM/MM hybrid molecular dynamics simulations. Further, to carry out MTD bias reweighting for computing forward reaction barriers in ab initio or QM/MM simulations, we propose a computationally affordable approach that does not require recrossing trajectories. © 2016 Wiley Periodicals, Inc.

  6. Seismic imaging using finite-differences and parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ober, C.C.

    1997-12-31

    A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computers can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets, and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve the image quality and reduce computational costs.

  7. Comparison of the Computational Efficiency of the Original Versus Reformulated High-Fidelity Generalized Method of Cells

    NASA Technical Reports Server (NTRS)

    Arnold, Steven M; Bednarcyk, Brett; Aboudi, Jacob

    2004-01-01

    The High-Fidelity Generalized Method of Cells (HFGMC) micromechanics model has recently been reformulated by Bansal and Pindera (in the context of elastic phases with perfect bonding) to maximize its computational efficiency. This reformulated version of HFGMC has now been extended to include both inelastic phases and imperfect fiber-matrix bonding. The present paper presents an overview of the HFGMC theory in both its original and reformulated forms and a comparison of the results of the two implementations. The objective is to establish the correlation between the two HFGMC formulations and document the improved efficiency offered by the reformulation. The results compare the macro and micro scale predictions of the continuous reinforcement (doubly-periodic) and discontinuous reinforcement (triply-periodic) versions of both formulations into the inelastic regime, and, in the case of the discontinuous reinforcement version, with both perfect and weak interfacial bonding. The results demonstrate that identical predictions are obtained using either the original or reformulated implementations of HFGMC aside from small numerical differences in the inelastic regime due to the different implementation schemes used for the inelastic terms present in the two formulations. Finally, a direct comparison of execution times is presented for the original formulation and reformulation code implementations. It is shown that as the discretization employed in representing the composite repeating unit cell becomes increasingly refined (requiring a larger number of sub-volumes), the reformulated implementation becomes significantly (approximately an order of magnitude at best) more computationally efficient in both the continuous reinforcement (doubly-periodic) and discontinuous reinforcement (triply-periodic) cases.

  8. Enhancing the photon-extraction efficiency of site-controlled quantum dots by deterministically fabricated microlenses

    NASA Astrophysics Data System (ADS)

    Kaganskiy, Arsenty; Fischbach, Sarah; Strittmatter, André; Rodt, Sven; Heindel, Tobias; Reitzenstein, Stephan

    2018-04-01

    We report on the realization of scalable single-photon sources (SPSs) based on single site-controlled quantum dots (SCQDs) and deterministically fabricated microlenses. The fabrication process comprises the buried-stressor growth technique complemented with low-temperature in-situ electron-beam lithography for the integration of SCQDs into microlens structures with high yield and high alignment accuracy. The microlens approach leads to a broadband enhancement of the photon-extraction efficiency of up to (21 ± 2)% and a high suppression of multi-photon events with g(2)(τ = 0) < 0.06 without background subtraction. The demonstrated combination of site-controlled growth of QDs and in-situ electron-beam lithography is relevant for arrays of efficient SPSs which can be applied in photonic quantum circuits and advanced quantum computation schemes.

  9. EvoGraph: On-The-Fly Efficient Mining of Evolving Graphs on GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Song, Shuaiwen

    With the prevalence of the World Wide Web and social networks, there has been a growing interest in high performance analytics for constantly-evolving dynamic graphs. Modern GPUs provide massive amounts of parallelism for efficient graph processing, but challenges remain due to their lack of support for the near real-time streaming nature of dynamic graphs. Specifically, due to the current high volume and velocity of graph data combined with the complexity of user queries, traditional processing methods that first store the updates and then repeatedly run static graph analytics on a sequence of versions or snapshots are deemed undesirable and computationally infeasible on GPUs. We present EvoGraph, a highly efficient and scalable GPU-based dynamic graph analytics framework.

  10. The efficient model to define a single light source position by use of high dynamic range image of 3D scene

    NASA Astrophysics Data System (ADS)

    Wang, Xu-yang; Zhdanov, Dmitry D.; Potemin, Igor S.; Wang, Ying; Cheng, Han

    2016-10-01

    One of the challenges of augmented reality is a seamless combination of objects of the real and virtual worlds, for example light sources. We suggest measurement and computation models for reconstructing the position of a light source. The model is based on the luminance of a small diffuse surface directly illuminated by a point-like source placed at a short distance from the observer or camera. The advantage of the computational model is the ability to eliminate the effects of indirect illumination. The paper presents a number of examples to illustrate the efficiency and accuracy of the proposed method.

  11. Small target detection using objectness and saliency

    NASA Astrophysics Data System (ADS)

    Zhang, Naiwen; Xiao, Yang; Fang, Zhiwen; Yang, Jian; Wang, Li; Li, Tao

    2017-10-01

    We are motivated by the need for a generic object detection algorithm that achieves high recall for small targets in complex scenes with acceptable computational efficiency. We propose a novel object detection algorithm, which has high localization quality with acceptable computational cost. Firstly, we obtain the objectness map as in BING[1] and use NMS to get the top N points. Then, the k-means algorithm is used to cluster them into K classes according to their location. We set the center points of the K classes as seed points. For each seed point, an object potential region is extracted. Finally, a fast salient object detection algorithm[2] is applied to the object potential regions to highlight object-like pixels, and a series of efficient post-processing operations are proposed to locate the targets. Our method runs at 5 FPS on 1000×1000 images, and significantly outperforms previous methods on small targets in cluttered backgrounds.

  12. A framework for parallelized efficient global optimization with application to vehicle crashworthiness optimization

    NASA Astrophysics Data System (ADS)

    Hamza, Karim; Shalaby, Mohamed

    2014-09-01

    This article presents a framework for simulation-based design optimization of computationally expensive problems, where economizing the generation of sample designs is highly desirable. One popular approach for such problems is efficient global optimization (EGO), where an initial set of design samples is used to construct a kriging model, which is then used to generate new 'infill' sample designs at regions of the search space where there is high expectancy of improvement. This article attempts to address one of the limitations of EGO, namely that the generation of infill samples can become a difficult optimization problem in its own right, and also to allow the generation of multiple samples at a time in order to take advantage of parallel computing in the evaluation of the new samples. The proposed approach is tested on analytical functions, and then applied to the vehicle crashworthiness design of a full Geo Metro model undergoing frontal crash conditions.
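
    The infill criterion at the heart of EGO is the expected improvement of the kriging prediction; its standard closed form for minimization (illustrative, separate from the authors' parallel-infill extensions) is:

      import numpy as np
      from scipy.stats import norm

      def expected_improvement(mu, sigma, f_best):
          """Closed-form expected improvement for minimization.

          mu, sigma : kriging posterior mean and standard deviation at a
                      candidate point; f_best : best observed objective.
          """
          sigma = np.maximum(sigma, 1e-12)   # guard against zero variance
          z = (f_best - mu) / sigma
          return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

      print(expected_improvement(mu=0.8, sigma=0.3, f_best=1.0))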

  13. Efficient parallel architecture for highly coupled real-time linear system applications

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo

    1988-01-01

    A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.

  14. Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele

    2014-11-11

    It has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).

  15. Camera calibration method of binocular stereo vision based on OpenCV

    NASA Astrophysics Data System (ADS)

    Zhong, Wanzhen; Dong, Xiaona

    2015-10-01

    Camera calibration, an important part of binocular stereo vision research, is the essential foundation of 3D reconstruction of a spatial object. In this paper, a camera calibration method based on OpenCV (the open source computer vision library) is presented to improve the calibration process, obtaining higher precision and efficiency. First, the camera model in OpenCV and an algorithm for camera calibration are presented, especially considering the influence of camera lens radial distortion and decentering distortion. Then, the camera calibration procedure is designed to compute the camera parameters and calculate the calibration errors. A highly accurate profile extraction algorithm and a checkerboard with 48 corners have also been used in this part. Finally, the results of the calibration program are presented, demonstrating the high efficiency and accuracy of the proposed approach. The results can reach the requirements of robot binocular stereo vision.
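
    The calibration pipeline described above follows the standard OpenCV pattern; a condensed Python sketch is shown below (a 48-corner board corresponds here to an assumed 8x6 inner-corner grid, and the file names are placeholders):

      import glob
      import cv2
      import numpy as np

      pattern = (8, 6)                     # inner corners (48 in total)
      objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
      objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

      obj_points, img_points = [], []
      for name in glob.glob("calib_*.png"):   # assumed file names
          gray = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
          found, corners = cv2.findChessboardCorners(gray, pattern)
          if found:
              corners = cv2.cornerSubPix(
                  gray, corners, (11, 11), (-1, -1),
                  (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
                   30, 0.001))
              obj_points.append(objp)
              img_points.append(corners)

      # Returns the RMS reprojection error, the camera matrix, and the
      # distortion coefficients (radial and tangential/decentering terms).
      rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
          obj_points, img_points, gray.shape[::-1], None, None)
      print("RMS reprojection error:", rms)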

  16. Computationally efficient unsupervised coastline detection from single-polarization 1-look SAR images of complex coastal environments

    NASA Astrophysics Data System (ADS)

    Garzelli, Andrea; Zoppetti, Claudia; Pinelli, Gianpaolo

    2017-10-01

    Coastline detection in synthetic aperture radar (SAR) images is crucial in many application fields, from coastal erosion monitoring to navigation, from damage assessment to security planning for port facilities. The backscattering difference between land and sea is not always documented in SAR imagery, due to the severe speckle noise, especially in 1-look data with high spatial resolution, high sea state, or complex coastal environments. This paper presents an unsupervised, computationally efficient solution to extract the coastline from a single single-polarization, 1-look SAR image. Extensive tests on Spotlight COSMO-SkyMed images of complex coastal environments and objective assessment demonstrate the validity of the proposed procedure, which is compared to state-of-the-art methods through visual results and with an objective evaluation of the distance between the detected coastline and the true coastline provided by regional authorities.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacob, Reed B.; Ding, Feng; Ye, Dongmei

    Organophosphates are widely used for peaceful (agriculture) and military purposes (chemical warfare agents). The extraordinary toxicity of organophosphates and the risk of deployment make it critical to develop means for their rapid and efficient deactivation. Organophosphate hydrolase (OPH) already plays an important role in organophosphate remediation, but is insufficient for therapeutic or prophylactic purposes primarily due to low substrate affinity. Current efforts focus on directly modifying the active site to differentiate substrate specificity and increase catalytic activity. Here, we present a novel strategy for enhancing the general catalytic efficiency of OPH through computational redesign of the residues that are allosterically coupled to the active site and validated our design by mutagenesis. Specifically, we identify five such hot-spot residues for allosteric regulation and assay these mutants for hydrolysis activity against paraoxon, a chemical-weapons simulant. A high percentage of the predicted mutants exhibit enhanced activity over wild-type (kcat = 16.63 s-1), such as T199I/T54I (899.5 s-1) and C227V/T199I/T54I (848 s-1), while the Km remains relatively unchanged in our high-throughput cell-free expression system. Further computational studies of protein dynamics reveal four distinct distal regions coupled to the active site that display significant changes in conformation dynamics upon these identified mutations. These results validate a computational design method that is both efficient and easily adapted as a general procedure for enzymatic enhancement.

  18. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  19. Dynamic response analysis of structure under time-variant interval process model

    NASA Astrophysics Data System (ADS)

    Xia, Baizhan; Qin, Yuan; Yu, Dejie; Jiang, Chao

    2016-10-01

    Due to aggressive environmental factors, variations of the dynamic load, degradation of material properties, and wear of machine surfaces, parameters related to a structure are distinctly time-variant. The typical model for time-variant uncertainties is the random process model, which is constructed on the basis of a large number of samples. In this work, we propose a time-variant interval process model which can be effectively used to deal with time-variant uncertainties with limited information. Two methods are then presented for the dynamic response analysis of the structure under the time-variant interval process model. The first one is the direct Monte Carlo method (DMCM), whose computational burden is relatively high. The second one is the Monte Carlo method based on the Chebyshev polynomial expansion (MCM-CPE), whose computational efficiency is high. In MCM-CPE, the dynamic response of the structure is approximated by Chebyshev polynomials, which can be efficiently calculated, and the variation range of the dynamic response is then estimated from the samples yielded by the Monte Carlo method. To address the dependency phenomenon of interval arithmetic, the affine arithmetic is integrated into the Chebyshev polynomial expansion. The computational effectiveness and efficiency of MCM-CPE are verified by two numerical examples, including a spring-mass-damper system and a shell structure.
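
    The key ingredient of MCM-CPE - replacing an expensive response function with a cheap Chebyshev surrogate before Monte Carlo sampling over the interval - can be sketched in a few lines (toy scalar response, not the structural solver of the paper):

      import numpy as np
      from numpy.polynomial import chebyshev as C

      def expensive_response(x):      # stand-in for a structural solve
          return np.sin(3 * x) * np.exp(-0.2 * x)

      lo, hi = 1.0, 2.0               # interval bounds of the uncertain input
      # Fit a Chebyshev surrogate from a handful of expensive evaluations...
      nodes = C.chebpts1(9) * (hi - lo) / 2 + (hi + lo) / 2
      surrogate = C.Chebyshev.fit(nodes, expensive_response(nodes), deg=8,
                                  domain=[lo, hi])
      # ...then Monte Carlo sample the cheap surrogate to bound the response.
      samples = np.random.default_rng(0).uniform(lo, hi, 100000)
      y = surrogate(samples)
      print("estimated response range:", y.min(), y.max())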

  20. Simulating pad-electrodes with high-definition arrays in transcranial electric stimulation

    NASA Astrophysics Data System (ADS)

    Kempe, René; Huang, Yu; Parra, Lucas C.

    2014-04-01

    Objective. Research studies on transcranial electric stimulation, including direct current, often use a computational model to provide guidance on the placing of sponge-electrode pads. However, the expertise and computational resources needed for finite element modeling (FEM) make modeling impractical in a clinical setting. Our objective is to make the exploration of different electrode configurations accessible to practitioners. We provide an efficient tool to estimate current distributions for arbitrary pad configurations while obviating the need for complex simulation software. Approach. To efficiently estimate current distributions for arbitrary pad configurations we propose to simulate pads with an array of high-definition (HD) electrodes and use an efficient linear superposition to then quickly evaluate different electrode configurations. Main results. Numerical results on ten different pad configurations on a normal individual show that electric field intensity simulated with the sampled array deviates from the solutions with pads by only 5% and the locations of peak magnitude fields have a 94% overlap when using a dense array of 336 electrodes. Significance. Computationally intensive FEM modeling of the HD array needs to be performed only once, perhaps on a set of standard heads that can be made available to multiple users. The present results confirm that by using these models one can now quickly and accurately explore and select pad-electrode montages to match a particular clinical need.
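
    The superposition step described above is plain linear algebra: once the field of each HD electrode (against a common return) has been computed by FEM, any pad montage is approximated as a weighted sum of those precomputed solutions. A sketch under assumed array shapes:

      import numpy as np

      def pad_field_from_hd_basis(hd_fields, currents):
          """Approximate a pad montage from precomputed HD solutions.

          hd_fields : (n_electrodes, n_nodes, 3) E-field of each HD
                      electrode for unit injected current (FEM, computed
                      once per head model).
          currents  : (n_electrodes,) current assigned to each electrode,
                      e.g. a pad's total current split over the sites it
                      covers.  The forward problem is linear in the
                      injected currents, so the montage field is just the
                      weighted sum of the basis fields.
          """
          return np.tensordot(currents, hd_fields, axes=1)

      rng = np.random.default_rng(0)
      hd_fields = rng.normal(size=(336, 5000, 3))  # assumed 336-site basis
      currents = np.zeros(336)
      currents[:10] = 2.0 / 10       # 2 mA anode pad covering 10 sites
      currents[100:110] = -2.0 / 10  # matching cathode pad
      print(pad_field_from_hd_basis(hd_fields, currents).shape)  # (5000, 3)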
