Computational Issues in Damping Identification for Large Scale Problems
NASA Technical Reports Server (NTRS)
Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.
1997-01-01
Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least-squares method. Numerical simulations have been performed on multiple-degree-of-freedom models to test the effectiveness of the algorithms and the usefulness of parallel computation for these problems. High Performance Fortran is used to parallelize the algorithms. Tests were performed using the IBM SP2 at NASA Ames Research Center. The least-squares method tested incurs high communication costs, which reduces the benefit of high performance computing. Its memory requirement also grows at a very rapid rate, meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace, and it is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can be seen for problems with 100+ degrees of freedom.
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
NASA Astrophysics Data System (ADS)
Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku
2015-01-01
High performance computing of the Meshless Time Domain Method (MTDM) on a multi-GPU cluster, using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba, is investigated. Generally, the finite difference time domain (FDTD) method is adopted for numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangular meshes, and it is difficult to apply the method to problems posed on complex domains. MTDM, on the other hand, can easily be adapted to such problems because it does not require meshes. In the present study, we implement MTDM on a multi-GPU cluster to speed up the method, and numerically investigate its performance. To reduce the computation time, the communication between the decomposed domains is hidden behind the perfectly matched layer (PML) calculation procedure. The results show that the MTDM computation on 128 GPUs is 173 times faster than a single-CPU calculation.
Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.
Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi
2018-01-01
In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds and GDB-13 with more than 930 million small molecules, poses a notable time-consumption problem. Therefore, we propose a novel heterogeneous high performance computing method, named Hadoop-MCC, integrating Hadoop and GPUs to cope with big chemical structure data efficiently. Hadoop-MCC gains high availability and fault tolerance from Hadoop, which is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on GPUs, and reducers then collect all comparison results produced by the mappers. Owing to the high availability of Hadoop, all LINGO computational jobs on mappers can be completed even if some of the mappers encounter problems. LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device, attaining the scalability, high availability, and fault tolerance granted by Hadoop together with the combined computational power of Hadoop and GPUs.
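The mapper-side kernel above is the LINGO comparison. As a rough illustration of what each GPU thread evaluates, here is a plain-Python sketch of one common LINGO variant (Tanimoto similarity over overlapping 4-character SMILES substrings); it is CPU-only and omits the SMILES canonicalization and ring-digit masking steps real implementations apply.

```python
# Minimal CPU sketch of LINGO similarity between two SMILES strings.
# The Hadoop-MCC pipeline performs this comparison on GPUs; this only
# illustrates the underlying metric (Tanimoto over length-4 substrings).
from collections import Counter

def lingos(smiles: str, q: int = 4) -> Counter:
    """Multiset of overlapping q-character substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))

def lingo_similarity(a: str, b: str, q: int = 4) -> float:
    """Tanimoto similarity between the LINGO multisets of two molecules."""
    la, lb = lingos(a, q), lingos(b, q)
    inter = sum((la & lb).values())   # multiset intersection (min counts)
    union = sum((la | lb).values())   # multiset union (max counts)
    return inter / union if union else 0.0

# Example: aspirin vs. salicylic acid, two structurally similar molecules.
print(lingo_similarity("CC(=O)Oc1ccccc1C(=O)O", "Oc1ccccc1C(=O)O"))
```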
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms and thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics, and simulation methods for diamondoid structures. Since it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel computing systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe the system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided design (NanoCAD) techniques; visualization of structural models and assembly sequences using virtual reality techniques; software required to control mini robotic manipulators for positional control; and scalable numerical algorithms for reliability, verification and testability. There appears to be no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before being manufactured.
Debugging a high performance computing program
Gooding, Thomas M.
2014-08-19
Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.
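A hedged sketch of the grouping idea the abstract describes, with made-up addresses standing in for a real trace gathered from the compute nodes:

```python
# Bucket threads by the address of their current calling instruction and
# display the buckets so outlier (possibly hung or defective) threads
# stand out. The addresses and thread IDs here are illustrative.
from collections import defaultdict

thread_call_addrs = {0: 0x4008F2, 1: 0x4008F2, 2: 0x4008F2, 3: 0x401A10}

groups = defaultdict(list)
for tid, addr in thread_call_addrs.items():
    groups[addr].append(tid)

for addr, tids in sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(f"{hex(addr)}: {len(tids)} thread(s) -> {tids}")  # 0x401a10 is the outlier
```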
Debugging a high performance computing program
Gooding, Thomas M.
2013-08-20
Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aimone, James Bradley; Betty, Rita
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis, thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities.
ERIC Educational Resources Information Center
Rosenquist, Anders; Shavelson, Richard J.; Ruiz-Primo, Maria Araceli
Inconsistencies in scores from computer-simulated and "hands-on" science performance assessments have led to questions about the exchangeability of these two methods in spite of the highly touted potential of computer-simulated performance assessment. This investigation considered possible explanations for students' inconsistent performances: (1)…
ERIC Educational Resources Information Center
Chou, Huey-Wen; Wang, Yu-Fang
1999-01-01
Compares the effects of two training methods on computer attitude and performance in a World Wide Web page design program in a field experiment with high school students in Taiwan. Discusses individual differences, Kolb's Experiential Learning Theory and Learning Style Inventory, Computer Attitude Scale, and results of statistical analyses.…
Micromagnetics on high-performance workstation and mobile computational platforms
NASA Astrophysics Data System (ADS)
Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.
2015-05-01
The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing units (CPUs), Nvidia desktop graphics processing units (GPUs), and the Nvidia Jetson TK1 platform. The FastMag finite-element micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built into low-power computing clusters.
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented in software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving the resulting system of linear algebraic equations (SLAE). Software implementations of the algorithm were made using NVIDIA CUDA technology and computing libraries, allowing us to perform the decomposition of the SLAE and find its solution on the central processing unit (CPU) and on the graphics processing unit (GPU). The calculation time is analyzed as a function of the matrix size and the number of its non-zero elements. We estimated the computing speed on the CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
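For illustration, a minimal CPU-side sketch of the factor/solve split the abstract describes, using SciPy's dense Cholesky routines on a stand-in SPD matrix (the paper's matrices are sparse FEM systems, and its GPU path uses CUDA libraries):

```python
# Assemble a symmetric positive-definite stand-in system and solve it by
# Cholesky decomposition: one expensive factorization, cheap triangular
# solves afterwards. Matrix values are illustrative, not a real FEM model.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

n = 2000
A = np.random.rand(n, n)
A = 0.5 * (A + A.T) + n * np.eye(n)   # diagonally dominant, hence SPD
b = np.random.rand(n)

c, low = cho_factor(A)                # factorization: the expensive step
x = cho_solve((c, low), b)            # cheap forward/backward substitution
print(np.allclose(A @ x, b))
```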
Large-scale structural analysis: The structural analyst, the CSM Testbed and the NAS System
NASA Technical Reports Server (NTRS)
Knight, Norman F., Jr.; Mccleary, Susan L.; Macy, Steven C.; Aminpour, Mohammad A.
1989-01-01
The Computational Structural Mechanics (CSM) activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM testbed methods development environment is presented and some numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.
Computational structural mechanics methods research using an evolving framework
NASA Technical Reports Server (NTRS)
Knight, N. F., Jr.; Lotts, C. G.; Gillian, R. E.
1990-01-01
Advanced structural analysis and computational methods that exploit high-performance computers are being developed in a computational structural mechanics research activity sponsored by the NASA Langley Research Center. These new methods are developed in an evolving framework and applied to representative complex structural analysis problems from the aerospace industry. An overview of the methods development environment is presented, and methods research areas are described. Selected application studies are also summarized.
Surrogate based wind farm layout optimization using manifold mapping
NASA Astrophysics Data System (ADS)
Kaja Kamaludeen, Shaafi M.; van Zuijle, Alexander; Bijl, Hester
2016-09-01
The high computational cost associated with high-fidelity wake models such as RANS or LES is the primary bottleneck to performing direct high-fidelity wind farm layout optimization (WFLO) with accurate CFD-based wake models. Therefore, a surrogate-based multi-fidelity WFLO methodology (SWFLO) is proposed. The surrogate model is built using an SBO method referred to as manifold mapping (MM). As verification, optimization of the spacing between two staggered wind turbines was performed using the proposed surrogate-based methodology, and its performance was compared with that of direct optimization using the high-fidelity model. Significant reduction in computational cost was achieved using MM: a maximum reduction of 65%, while arriving at the same optimum as direct high-fidelity optimization. The similarity between the responses of the models and the number and position of the mapping points strongly influence the computational efficiency of the proposed method. As a proof of concept, realistic WFLO of a small 7-turbine wind farm was performed using the proposed methodology. Two variants of the Jensen wake model with different decay coefficients were used as the fine and coarse models. The proposed SWFLO method arrived at the same optimum as the fine model with very few fine-model simulations.
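To make the coarse/fine idea concrete, here is a hedged sketch of surrogate-based optimization of turbine spacing using two Jensen-type wake models with different decay coefficients, echoing the paper's proof of concept. It uses a simple first-order output correction rather than true manifold mapping, and the land-cost penalty and all constants are illustrative assumptions introduced to give the problem an interior optimum:

```python
# Coarse/fine surrogate optimization sketch: repeatedly optimize a cheap
# coarse model whose value and slope are corrected to match the expensive
# fine model at the current iterate (two fine evaluations per iteration).
# Real SBO adds trust-region safeguards that this sketch omits.
import numpy as np
from scipy.optimize import minimize_scalar

def downstream_power(s, k, ct=0.8, d=80.0, u0=8.0):
    """Jensen-model power proxy (u^3) of a turbine s metres downwind."""
    deficit = (1.0 - np.sqrt(1.0 - ct)) / (1.0 + 2.0 * k * s / d) ** 2
    return (u0 * (1.0 - deficit)) ** 3

def objective(s, k, land_cost=0.1):
    return -downstream_power(s, k) + land_cost * s   # power vs. land use

k_fine, k_coarse = 0.04, 0.10    # two decay coefficients, as in the paper
x, h = 400.0, 1.0                # initial spacing [m], finite-difference step
for _ in range(8):
    b = objective(x, k_fine) - objective(x, k_coarse)        # value mismatch
    a = ((objective(x + h, k_fine) - objective(x, k_fine))
         - (objective(x + h, k_coarse) - objective(x, k_coarse))) / h
    res = minimize_scalar(lambda s: objective(s, k_coarse) + b + a * (s - x),
                          bounds=(200.0, 2000.0), method="bounded")
    x += 0.5 * (res.x - x)       # damped update for robustness
print(f"optimized spacing ~ {x:.0f} m")   # approaches the fine-model optimum
```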
NASA Astrophysics Data System (ADS)
Jamroz, Benjamin F.; Klöfkorn, Robert
2016-08-01
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. This allows the overlap of computation with communication, effectively hiding some of the costs of communication. A novel detail of our approach is that it allows some data movement to be performed during the asynchronous communication even in the absence of other computations. This method produces significant performance and scalability gains in large-scale simulations.
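A minimal sketch of the overlap pattern described above, written with mpi4py (run under mpiexec): post non-blocking halo exchanges, update the interior while messages are in flight, and finish the boundary after the waits complete. The 1D slab and averaging stencil are illustrative, not HOMME's actual data layout:

```python
# Non-blocking halo exchange overlapped with interior computation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

u = np.random.rand(1_000_000)            # local slab of a larger field
halo_l, halo_r = np.empty(1), np.empty(1)

reqs = [comm.Isend(u[:1],  dest=left),   comm.Isend(u[-1:], dest=right),
        comm.Irecv(halo_l, source=left), comm.Irecv(halo_r, source=right)]

new = np.empty_like(u)
new[1:-1] = 0.5 * (u[:-2] + u[2:])       # interior work overlaps communication

MPI.Request.Waitall(reqs)                # communication now complete
new[0]  = 0.5 * (halo_l[0] + u[1])       # boundary work needs halo data
new[-1] = 0.5 * (u[-2] + halo_r[0])
```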
Remote control system for high-performance computer simulation of crystal growth by the PFC method
NASA Astrophysics Data System (ADS)
Pavlyuk, Evgeny; Starodumov, Ilya; Osipov, Sergei
2017-04-01
Modeling of the crystallization process by the phase field crystal (PFC) method is one of the important directions of modern computational materials science. In this paper, the practical side of computer simulation of the crystallization process by the PFC method is investigated. To solve problems using this method, it is necessary to use high-performance computing clusters, data storage systems, and other often expensive and complex computer systems. Access to such resources is often limited, unstable, and accompanied by various administrative problems. In addition, the variety of software and settings across different computing clusters sometimes prevents researchers from using unified program code, creating a need to adapt the code to each configuration of the computing complex. The practical experience of the authors has shown that a special control system for computations, with the possibility of remote use, can greatly simplify the implementation of simulations and increase the productivity of scientific research. In the current paper, we present the principal idea of such a system and justify its efficiency.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.
1990-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
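A compact sketch of the second class of solvers described above, a conjugate gradient iteration with a diagonal (Jacobi) preconditioner, in plain NumPy on a generic SPD stand-in matrix rather than a structural stiffness matrix:

```python
# Diagonal-preconditioned conjugate gradient for SPD systems.
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=1000):
    M_inv = 1.0 / np.diag(A)            # diagonal (Jacobi) preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # new search direction
        rz = rz_new
    return x

n = 500
A = np.random.rand(n, n); A = A @ A.T + n * np.eye(n)   # SPD test matrix
b = np.random.rand(n)
print(np.allclose(A @ pcg(A, b), b, atol=1e-6))
```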
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.
1992-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
CSM Testbed Development and Large-Scale Structural Applications
NASA Technical Reports Server (NTRS)
Knight, Norman F., Jr.; Gillian, R. E.; Mccleary, Susan L.; Lotts, C. G.; Poole, E. L.; Overman, A. L.; Macy, S. C.
1989-01-01
A research activity called Computational Structural Mechanics (CSM) conducted at the NASA Langley Research Center is described. This activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM Testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM Testbed methods development environment is presented and some new numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.
Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications of a Lanczos algorithm on high performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem, 181.6 seconds on a CONVEX computer, was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray Y-MP using an average of 3.63 processors.
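For reference, a dense NumPy sketch of the Lanczos recurrence underlying the eigensolver above; it omits the reorthogonalization and the optimized storage and loop-unrolling techniques that the paper is actually about:

```python
# Project a symmetric matrix onto a Krylov subspace via the Lanczos
# three-term recurrence and read approximate extreme eigenvalues off the
# small tridiagonal matrix. Illustrative only: no reorthogonalization.
import numpy as np

def lanczos(A, k, rng=np.random.default_rng(0)):
    n = A.shape[0]
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q = rng.standard_normal(n); q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    for j in range(k):
        w = A @ q - (beta[j - 1] * q_prev if j > 0 else 0.0)
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q_prev, q = q, w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)        # Ritz values

A = np.random.rand(300, 300); A = 0.5 * (A + A.T)
print(lanczos(A, 30)[-1], np.linalg.eigvalsh(A)[-1])  # largest eigenvalue
```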
Fekete, Szabolcs; Fekete, Jeno; Molnár, Imre; Ganzler, Katalin
2009-11-06
Many different strategies for reversed-phase high performance liquid chromatographic (RP-HPLC) method development are used today. This paper describes a strategy for the systematic development of ultrahigh-pressure liquid chromatographic (UHPLC or UPLC) methods using 5 cm × 2.1 mm columns packed with sub-2 µm particles and computer simulation (DryLab package). Data on the accuracy of computer modeling in the Design Space under ultrahigh-pressure conditions are reported, and an acceptable accuracy for the predictions of the computer models is presented. This work illustrates a method development strategy focusing on a time reduction by a factor of 3-5 compared to conventional HPLC method development, and exhibits parts of the Design Space elaboration as requested by the FDA and ICH Q8R1. Furthermore, this paper demonstrates the accuracy of retention time prediction at elevated pressure (enhanced flow rate) and shows that computer-assisted simulation can be applied with sufficient precision for UHPLC applications (p > 400 bar). Examples of fast and effective method development in pharmaceutical analysis, for both gradient and isocratic separations, are presented.
Multidisciplinary Design Optimization of a Full Vehicle with High Performance Computing
NASA Technical Reports Server (NTRS)
Yang, R. J.; Gu, L.; Tho, C. H.; Sobieszczanski-Sobieski, Jaroslaw
2001-01-01
Multidisciplinary design optimization (MDO) of a full vehicle under the constraints of crashworthiness, NVH (noise, vibration and harshness), durability, and other performance attributes is one of the imperative goals for the automotive industry. However, it is often infeasible due to the lack of computational resources, robust simulation capabilities, and efficient optimization methodologies. This paper intends to move closer towards that goal by using parallel computers for the intensive computation and by combining different approximations for dissimilar analyses in the MDO process. The MDO process presented in this paper is an extension of previous work reported by Sobieski et al. In addition to the roof crush, two full-vehicle crash modes are added: full frontal impact and 50% frontal offset crash. Instead of using an adaptive polynomial response surface method, this paper employs a DOE/RSM method for exploring the design space and constructing highly nonlinear crash functions. Two MDO strategies are used and their results are compared. This paper demonstrates that, with high performance computing, a conventionally intractable real-world full-vehicle multidisciplinary optimization problem considering all performance attributes and a large number of design variables becomes feasible.
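A hedged sketch of the DOE/RSM pattern mentioned above: sample a design of experiments, fit a quadratic response surface by least squares, and optimize the cheap surrogate. The "expensive" function is a trivial stand-in for a crash or NVH solver:

```python
# Fit a full quadratic response surface to DOE samples, then optimize it.
import numpy as np
from scipy.optimize import minimize

def expensive_sim(x):                      # placeholder for a crash solver
    return (x[0] - 1.2) ** 2 + 2 * (x[1] + 0.4) ** 2 + 0.1 * x[0] * x[1]

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(30, 2))       # 30-point design of experiments
y = np.array([expensive_sim(x) for x in X])

def features(x):                           # full quadratic basis in 2 vars
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2], axis=-1)

coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)
surrogate = lambda x: features(np.asarray(x)) @ coef
res = minimize(surrogate, x0=[0.0, 0.0])   # optimize the cheap surrogate
print(res.x)                               # close to the true optimum
```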
Computing a Comprehensible Model for Spam Filtering
NASA Astrophysics Data System (ADS)
Ruiz-Sepúlveda, Amparo; Triviño-Rodriguez, José L.; Morales-Bueno, Rafael
In this paper, we describe the application of the Decision Tree Boosting (DTB) learning model to spam email filtering. This classification task implies learning in a high-dimensional feature space, so it is an example of how the DTB algorithm performs in such problems. In [1], it was shown that hypotheses computed by the DTB model are more comprehensible than those computed by other ensemble methods. Hence, this paper aims to show that the DTB algorithm maintains the same comprehensibility of hypotheses in high-dimensional feature space problems while achieving the performance of other ensemble methods. Four traditional evaluation measures (precision, recall, F1 and accuracy) have been considered for performance comparison between DTB and other models usually applied to spam email filtering. The hypothesis computed by DTB is smaller and more comprehensible than those computed by AdaBoost and Naïve Bayes.
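The four evaluation measures named above, computed directly from a confusion matrix on toy labels (1 = spam, 0 = ham):

```python
# Precision, recall, F1 and accuracy from binary predictions.
import numpy as np

def classification_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = np.mean(y_true == y_pred)
    return precision, recall, f1, accuracy

y_true = [1, 1, 0, 1, 0, 0, 1, 0]          # 1 = spam, 0 = ham
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```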
Discontinuous Galerkin Methods and High-Speed Turbulent Flows
NASA Astrophysics Data System (ADS)
Atak, Muhammed; Larsson, Johan; Munz, Claus-Dieter
2014-11-01
Discontinuous Galerkin methods are gaining importance within the CFD community as they combine arbitrarily high order of accuracy in complex geometries with parallel efficiency. The discontinuous Galerkin spectral element method (DGSEM) in particular is a promising candidate for both direct numerical simulation (DNS) and large eddy simulation (LES) of turbulent flows due to its excellent scaling attributes. In this talk, we present a DNS of a compressible turbulent boundary layer along a flat plate at a free-stream Mach number of M = 2.67 and assess the computational efficiency of the DGSEM at performing high-fidelity simulations of both transitional and turbulent boundary layers. We compare the accuracy of the results as well as the computational performance to results using a high-order finite difference method.
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance of critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU, and a case study of GPU-accelerated battery prognostics with computational performance results.
Yokohama, Noriya
2013-07-01
This report aimed at structuring the design of architectures and studying performance measurements of a parallel computing environment for Monte Carlo simulation of particle therapy, using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed approximately 28 times faster speed than a single-threaded architecture, combined with improved stability. A study of methods for optimizing system operations also indicated lower cost.
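A small sketch of the embarrassingly parallel structure that yields this kind of near-linear Monte Carlo speedup; the "history" here is a trivial pi-estimation stand-in, not a particle-transport kernel:

```python
# Each worker simulates an independent batch of histories with its own
# seed; results are reduced at the end. This is the structure that lets
# Monte Carlo codes scale almost linearly across cores or cloud nodes.
import numpy as np
from multiprocessing import Pool

def simulate_batch(args):
    seed, n_histories = args
    rng = np.random.default_rng(seed)
    xy = rng.random((n_histories, 2))      # stand-in "physics"
    return np.sum(xy[:, 0] ** 2 + xy[:, 1] ** 2 < 1.0)

if __name__ == "__main__":
    n_workers, per_batch = 8, 1_000_000
    with Pool(n_workers) as pool:
        hits = pool.map(simulate_batch, [(s, per_batch) for s in range(n_workers)])
    print(4.0 * sum(hits) / (n_workers * per_batch))   # estimate of pi
```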
NASA Astrophysics Data System (ADS)
Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu
2015-07-01
Numerical simulation of multiphase flow and reactive transport in porous media for complex subsurface problems is a computationally intensive application. To meet the increasing computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT, a well-established code for simulating subsurface multiphase flow and reactive transport problems, we developed THC-MP, a high performance computing code for massively parallel computers, which greatly extends the computational capability of the original code. The domain decomposition method was applied to the coupled numerical computing procedure in THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module, and used a hybrid parallel iterative and direct solver. Numerical accuracy of THC-MP was verified on a CO2 injection-induced reactive transport problem by comparing the results obtained from parallel computing with those from sequential computing (the original code). Execution efficiency and code scalability were examined through field-scale carbon sequestration applications on a multicore cluster. The results demonstrate the enhanced performance achieved by THC-MP on parallel computing facilities.
A single-stage flux-corrected transport algorithm for high-order finite-volume methods
Chaplin, Christopher; Colella, Phillip
2017-05-08
We present a new limiter method for solving the advection equation using a high-order, finite-volume discretization. The limiter is based on the flux-corrected transport algorithm. Here, we modify the classical algorithm by introducing a new computation for solution bounds at smooth extrema, as well as improving the preconstraint on the high-order fluxes. We compute the high-order fluxes via a method-of-lines approach with fourth-order Runge-Kutta as the time integrator. For computing low-order fluxes, we select the corner-transport upwind method due to its improved stability over donor-cell upwind. Several spatial differencing schemes are investigated for the high-order flux computation, including centered-difference and upwind schemes. We show that the upwind schemes perform well on account of the dissipation of high-wavenumber components. The new limiter method retains high-order accuracy for smooth solutions and accurately captures fronts in discontinuous solutions. Further, we need only apply the limiter once per complete time step.
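For orientation, a single-stage 1D sketch of the classical flux-corrected transport idea the paper builds on: a bounded low-order (upwind) update plus a limited antidiffusive correction, with a Zalesak-type limiter. The paper's method (RK4 time integration, fourth-order fluxes, improved extremum handling) is substantially more sophisticated than this:

```python
# One FCT step for dq/dt + u dq/dx = 0 on a periodic grid, u > 0.
# c = u*dt/dx is the Courant number; fluxes are pre-scaled by dt/dx.
import numpy as np

def fct_advection_step(q, c):
    f_lo = c * q                                             # upwind flux
    f_hi = c * q + 0.5 * c * (1 - c) * (np.roll(q, -1) - q)  # Lax-Wendroff
    a = f_hi - f_lo                                 # antidiffusive flux
    q_td = q - (f_lo - np.roll(f_lo, 1))            # bounded low-order update

    # Local bounds from old and transported-diffused solutions.
    qs = np.stack([np.roll(arr, s) for arr in (q, q_td) for s in (-1, 0, 1)])
    q_max, q_min = qs.max(axis=0), qs.min(axis=0)

    # Zalesak-type limiter: possible flux in/out vs. room before the bounds.
    p_plus = np.maximum(np.roll(a, 1), 0) - np.minimum(a, 0)
    p_minus = np.maximum(a, 0) - np.minimum(np.roll(a, 1), 0)
    r_plus = np.minimum(1.0, (q_max - q_td) / (p_plus + 1e-30))
    r_minus = np.minimum(1.0, (q_td - q_min) / (p_minus + 1e-30))
    cface = np.where(a >= 0,
                     np.minimum(np.roll(r_plus, -1), r_minus),
                     np.minimum(r_plus, np.roll(r_minus, -1)))
    a_lim = cface * a
    return q_td - (a_lim - np.roll(a_lim, 1))

# Advect a square pulse once around a periodic domain.
n, c = 200, 0.5
q = np.where((np.arange(n) > 40) & (np.arange(n) < 80), 1.0, 0.0)
for _ in range(int(n / c)):
    q = fct_advection_step(q, c)
print(q.min(), q.max())   # stays within [0, 1]: no new over/undershoots
```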
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
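A hedged sketch of the central comparison on a generic sparse symmetric matrix, with SciPy's LOBPCG standing in for the paper's preconditioned block solver (diagonal preconditioner, block of starting vectors) against a Lanczos-type solver (ARPACK's eigsh). The nuclear CI matrix structure itself is not modeled:

```python
# Preconditioned block eigensolver vs. Lanczos-type solver on a sparse
# symmetric stand-in matrix; both target the 8 smallest eigenvalues.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, lobpcg, LinearOperator

n = 5000
diag = np.arange(1, n + 1, dtype=float)
A = sp.diags([diag, np.ones(n - 1), np.ones(n - 1)], [0, 1, -1]).tocsr()

M = LinearOperator((n, n), matvec=lambda x: x / diag)  # Jacobi preconditioner
X = np.random.default_rng(0).standard_normal((n, 8))   # starting block

vals_block, _ = lobpcg(A, X, M=M, largest=False, tol=1e-8, maxiter=200)
vals_lanczos = eigsh(A, k=8, sigma=0, which="LM", return_eigenvectors=False)
print(np.sort(vals_block), np.sort(vals_lanczos))      # should agree
```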
DNS of Flow in a Low-Pressure Turbine Cascade Using a Discontinuous-Galerkin Spectral-Element Method
NASA Technical Reports Server (NTRS)
Garai, Anirban; Diosady, Laslo Tibor; Murman, Scott; Madavan, Nateri
2015-01-01
A new computational capability under development for accurate and efficient high-fidelity direct numerical simulation (DNS) and large eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy and is implemented in a computationally efficient manner on a modern high performance computer architecture. A validation study using this method to perform DNS of flow in a low-pressure turbine airfoil cascade is presented. Preliminary results indicate that the method captures the main features of the flow. Discrepancies between the predicted results and the experiments are likely due to the effects of freestream turbulence not being included in the simulation and will be addressed in the final paper.
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-02-12
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions.
MultiPhyl: a high-throughput phylogenomics webserver using distributed computing
Keane, Thomas M.; Naughton, Thomas J.; McInerney, James O.
2007-01-01
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php.
Orthorectification by Using the GPGPU Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than a teraflop per second. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors, with very fast computing capability and high memory bandwidth compared to central processing units (CPUs). Data-parallel computations can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capability has attracted the attention of researchers dealing with complex problems that require heavy computation, an interest that has produced the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet cheap and affordable hardware, and have therefore become an alternative to conventional processors: graphics chips that were once fixed-function hardware have been transformed into modern, powerful, and programmable processors. The biggest problem is that graphics processing units use programming models unlike current programming methods; efficient GPU programming requires re-coding the program algorithm to respect the limitations and the structure of the graphics hardware, and multi-core processors of this kind cannot be programmed with traditional methods such as event-procedure programming. GPUs are especially effective at repeating computing steps over many data elements when high accuracy is needed, performing the computation quickly and accurately, whereas CPUs, which perform one computation at a time according to flow control, are slower. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language, and the results were compared with a traditional CPU implementation. In a second application, projective rectification was coded using the GPGPU method and CUDA, and sample images of various sizes were processed and the results evaluated. The GPGPU method is especially useful for repeating the same computations on highly dense data, thus finding the solution quickly.
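As a CPU reference for the projective rectification case, a NumPy sketch that applies the inverse of a 3x3 homography to every output pixel with nearest-neighbour sampling; each output pixel is independent, which is exactly the one-thread-per-pixel structure that maps onto CUDA. The matrix H here is illustrative:

```python
# Projective (homography) rectification by inverse mapping.
import numpy as np

def rectify(image, H, out_shape):
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    ones = np.ones_like(xs)
    pts = np.stack([xs, ys, ones]).reshape(3, -1)   # homogeneous pixel coords
    src = np.linalg.inv(H) @ pts                    # map output -> source
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (0 <= sx) & (sx < image.shape[1]) & (0 <= sy) & (sy < image.shape[0])
    out = np.zeros(h_out * w_out, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]        # nearest-neighbour sample
    return out.reshape(out_shape)

img = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
H = np.array([[1.0, 0.1, 20.0],
              [0.0, 1.1, 10.0],
              [1e-5, 0.0, 1.0]])                    # mild projective warp
print(rectify(img, H, (480, 640)).shape)
```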
Levman, Jacob E D; Gallego-Ortiz, Cristina; Warner, Ellen; Causer, Petrina; Martel, Anne L
2016-02-01
Magnetic resonance imaging (MRI)-enabled cancer screening has been shown to be a highly sensitive method for the early detection of breast cancer. Computer-aided detection systems have the potential to improve the screening process by standardizing radiologists to a high level of diagnostic accuracy. This retrospective study was approved by the institutional review board of Sunnybrook Health Sciences Centre. It compares the performance of a proposed method for computer-aided detection (based on the second-order spatial derivative of the relative signal intensity) with that of the signal enhancement ratio (SER) on MRI-based breast screening examinations. Comparison is performed using receiver operating characteristic (ROC) curve analysis as well as free-response receiver operating characteristic (FROC) curve analysis. A modified computer-aided detection system combining the proposed approach with the SER method is also presented. The proposed method provides improvements over the SER method in the rates of false-positive markings in the detection of breast cancer (as assessed by FROC analysis). The modified computer-aided detection system that incorporates both the proposed method and the SER method yields ROC results equal to those produced by SER while simultaneously providing improvements over the SER method in terms of false positives per noncancerous exam. The proposed method for identifying malignancies outperforms the SER method in terms of false positives on a challenging dataset containing many small lesions, and may play a useful role in breast cancer screening by MRI as part of a computer-aided detection system.
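For context, a hedged sketch of the baseline SER feature from dynamic contrast-enhanced MRI, computed voxelwise from three time points on synthetic arrays; the paper's proposed second-order spatial-derivative feature is not reproduced here, and the washout threshold below is a commonly used convention rather than the study's setting:

```python
# Voxelwise signal enhancement ratio: SER = (S_early - S_pre) / (S_late - S_pre).
import numpy as np

def ser_map(s_pre, s_early, s_late, eps=1e-6):
    return (s_early - s_pre) / (s_late - s_pre + eps)

rng = np.random.default_rng(0)
shape = (64, 64, 32)                              # synthetic volume
s_pre = rng.uniform(100, 200, shape)
s_early = s_pre * rng.uniform(1.0, 2.0, shape)    # contrast uptake
s_late = s_pre * rng.uniform(1.0, 1.8, shape)     # washout or plateau
ser = ser_map(s_pre, s_early, s_late)
print((ser > 1.1).mean())   # fraction of voxels with washout-like kinetics
```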
Testing and Validation of Computational Methods for Mass Spectrometry.
Gatto, Laurent; Hansen, Kasper D; Hoopmann, Michael R; Hermjakob, Henning; Kohlbacher, Oliver; Beyer, Andreas
2016-03-04
High-throughput methods based on mass spectrometry (proteomics, metabolomics, lipidomics, etc.) produce a wealth of data that cannot be analyzed without computational methods. The impact of the choice of method on the overall result of a biological study is often underappreciated, but different methods can result in very different biological findings. It is thus essential to evaluate and compare the correctness and relative performance of computational methods. The volume of the data as well as the complexity of the algorithms render unbiased comparisons challenging. This paper discusses some problems and challenges in testing and validation of computational methods. We discuss the different types of data (simulated and experimental validation data) as well as different metrics to compare methods. We also introduce a new public repository for mass spectrometric reference data sets (http://compms.org/RefData) that contains a collection of publicly available data sets for performance evaluation of a wide range of different methods.
Templet Web: the use of volunteer computing approach in PaaS-style cloud
NASA Astrophysics Data System (ADS)
Vostokin, Sergei; Artamonov, Yuriy; Tsarev, Daniil
2018-03-01
This article presents the Templet Web cloud service. The service is designed for high-performance scientific computing automation. The use of high-performance technology is specifically required by new fields of computational science such as data mining, artificial intelligence, machine learning, and others. Cloud technologies provide a significant cost reduction for high-performance scientific applications. The main objectives to achieve this cost reduction in the Templet Web service design are: (a) the implementation of "on-demand" access; (b) source code deployment management; (c) high-performance computing programs development automation. The distinctive feature of the service is the approach mainly used in the field of volunteer computing, when a person who has access to a computer system delegates his access rights to the requesting user. We developed an access procedure, algorithms, and software for utilization of free computational resources of the academic cluster system in line with the methods of volunteer computing. The Templet Web service has been in operation for five years. It has been successfully used for conducting laboratory workshops and solving research problems, some of which are considered in this article. The article also provides an overview of research directions related to service development.
Bethel, EW; Bauer, A; Abbasi, H; ...
2016-06-10
There is considerable interest in the high performance computing (HPC) community in analyzing and visualizing data without first writing it to disk, i.e., in situ processing, for several reasons. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing it to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose complex behavior missed by coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal of presenting existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.
Computational efficiency for the surface renewal method
NASA Astrophysics Data System (ADS)
Kelley, Jason; Higgins, Chad
2018-04-01
Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that were tested for sensitivity to the length of the flux averaging period, the ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. Increased speed of computation grants flexibility in implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
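A sketch of the main computational kernel in SR analysis, structure functions of a high-frequency scalar at many lags, computed with vectorized slicing instead of explicit loops; the ramp-model algebra that converts these moments to fluxes is omitted and the series is synthetic:

```python
# S_n(r) = mean((x(t) - x(t - r))^n) for several lags r and orders n.
import numpy as np

def structure_functions(x, lags, orders=(2, 3, 5)):
    out = {}
    for r in lags:
        d = x[r:] - x[:-r]                 # all lagged differences at once
        out[r] = {n: np.mean(d ** n) for n in orders}
    return out

rng = np.random.default_rng(0)
t = np.arange(0, 1800, 0.1)                # 30 min of 10 Hz data
temp = 0.5 * np.sin(2 * np.pi * t / 60) + 0.1 * rng.standard_normal(t.size)
sf = structure_functions(temp, lags=[1, 5, 10, 50])
print(sf[10][3])                           # third-order moment at a 1 s lag
```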
NASA Technical Reports Server (NTRS)
Warner, James E.; Zubair, Mohammad; Ranjan, Desh
2017-01-01
This work investigates novel approaches to probabilistic damage diagnosis that utilize surrogate modeling and high performance computing (HPC) to achieve substantial computational speedup. Motivated by Digital Twin, a structural health management (SHM) paradigm that integrates vehicle-specific characteristics with continual in-situ damage diagnosis and prognosis, the methods studied herein yield near real-time damage assessments that could enable monitoring of a vehicle's health while it is operating (i.e. online SHM). High-fidelity modeling and uncertainty quantification (UQ), both critical to Digital Twin, are incorporated using finite element method simulations and Bayesian inference, respectively. The crux of the proposed Bayesian diagnosis methods, however, is the reformulation of the numerical sampling algorithms (e.g. Markov chain Monte Carlo) used to generate the resulting probabilistic damage estimates. To this end, three distinct methods are demonstrated for rapid sampling that utilize surrogate modeling and exploit various degrees of parallelism for leveraging HPC. The accuracy and computational efficiency of the methods are compared on the problem of strain-based crack identification in thin plates. While each approach has inherent problem-specific strengths and weaknesses, all approaches are shown to provide accurate probabilistic damage diagnoses and several orders of magnitude computational speedup relative to a baseline Bayesian diagnosis implementation.
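A minimal sketch of the core idea above: a Metropolis-Hastings sampler whose likelihood evaluations go through a cheap surrogate instead of a finite element solve. The analytic "surrogate", sensor layout, and noise level are stand-ins; the paper trains surrogates from FEM strain solutions and parallelizes the sampling:

```python
# Bayesian crack-length estimation with a surrogate in the MCMC loop.
import numpy as np

def surrogate_strain(crack_len, sensor_x):
    # Stand-in for a trained surrogate mapping damage -> sensor strains.
    return np.exp(-(sensor_x - 0.5) ** 2 / (0.05 + crack_len))

rng = np.random.default_rng(0)
sensor_x = np.linspace(0, 1, 20)
true_len = 0.08
data = surrogate_strain(true_len, sensor_x) + 0.01 * rng.standard_normal(20)

def log_post(theta):
    if not 0.0 < theta < 0.5:                 # uniform prior bounds
        return -np.inf
    resid = data - surrogate_strain(theta, sensor_x)
    return -0.5 * np.sum(resid ** 2) / 0.01 ** 2

theta, lp = 0.2, log_post(0.2)
samples = []
for _ in range(20000):                        # cheap: no FEM in the loop
    prop = theta + 0.02 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta)
print(np.mean(samples[5000:]), "~", true_len)
```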
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
Modeling Materials: Design for Planetary Entry, Electric Aircraft, and Beyond
NASA Technical Reports Server (NTRS)
Thompson, Alexander; Lawson, John W.
2014-01-01
NASA missions push the limits of what is possible. The development of high-performance materials must keep pace with the agency's demanding, cutting-edge applications. Researchers at NASA's Ames Research Center are performing multiscale computational modeling to accelerate development times and further the design of next-generation aerospace materials. Multiscale modeling combines several computationally intensive techniques ranging from the atomic level to the macroscale, passing output from one level as input to the next level. These methods are applicable to a wide variety of materials systems. For example: (a) ultra-high-temperature ceramics for hypersonic aircraft: we utilized the full range of multiscale modeling to characterize thermal protection materials for faster, safer air- and spacecraft; (b) planetary entry heat shields for space vehicles: we computed thermal and mechanical properties of ablative composites by combining several methods, from atomistic simulations to macroscale computations; (c) advanced batteries for electric aircraft: we performed large-scale molecular dynamics simulations of advanced electrolytes for ultra-high-energy-capacity batteries to enable long-distance electric aircraft service; and (d) shape-memory alloys for high-efficiency aircraft: we used high-fidelity electronic structure calculations to determine phase diagrams in shape-memory transformations. Advances in high-performance computing have been critical to the development of multiscale materials modeling. We used nearly one million processor hours on NASA's Pleiades supercomputer to characterize electrolytes with a fidelity that would be otherwise impossible. For this and other projects, Pleiades enables us to push the physics and accuracy of our calculations to new levels.
Non-volatile memory for checkpoint storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.
A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
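A hedged sketch of the generic checkpoint/restart pattern such a device supports: periodically serialize state to a (notionally flash-backed) path with an atomic rename, and restore from the newest file on restart. The path and state layout are illustrative, not the patented mechanism:

```python
# Simple checkpoint/restart with atomic publication of each checkpoint.
import pickle
import pathlib

CKPT_DIR = pathlib.Path("/tmp/ckpt")   # stand-in for a flash-backed mount
CKPT_DIR.mkdir(exist_ok=True)

def save_checkpoint(step: int, state: dict) -> None:
    tmp = CKPT_DIR / f"state_{step:08d}.pkl.tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    tmp.rename(CKPT_DIR / f"state_{step:08d}.pkl")   # atomic publish

def load_latest():
    ckpts = sorted(CKPT_DIR.glob("state_*.pkl"))
    if not ckpts:
        return None
    with open(ckpts[-1], "rb") as f:
        return pickle.load(f)

save_checkpoint(100, {"field": [1.0, 2.0, 3.0]})
print(load_latest()["step"])
```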
Acceleration of FDTD mode solver by high-performance computing techniques.
Han, Lin; Xi, Yanping; Huang, Wei-Ping
2010-06-21
A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on a wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against a benchmark finite-difference (FD) eigenmode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that this high-performance computing technique leads to significant acceleration of the FDTD mode solver, with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on the CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is of the same order of magnitude as the standard finite-difference eigenmode solver, yet requires much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when conventional eigenvalue mode solvers are no longer applicable due to memory limitations.
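For orientation, a generic 1D FDTD leapfrog sketch showing why the algorithm parallelizes so well: every H (then every E) node updates independently from its neighbours' previous half-step values. This is plain normalized-unit vacuum FDTD, not the paper's compact waveguide formulation or its matrix pencil postprocessing:

```python
# 1D FDTD leapfrog update in normalized units; each array-wide update is
# embarrassingly parallel, which is what maps naturally onto GPU threads.
import numpy as np

n_cells, n_steps = 400, 1000
ez = np.zeros(n_cells)
hy = np.zeros(n_cells - 1)
courant = 0.5                                 # c*dt/dx, stable for <= 1

for step in range(n_steps):
    hy += courant * (ez[1:] - ez[:-1])        # update all H nodes at once
    ez[1:-1] += courant * (hy[1:] - hy[:-1])  # then all interior E nodes
    ez[n_cells // 4] += np.exp(-((step - 30) / 10) ** 2)  # soft Gaussian source

print(float(ez.max()))
```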
NASA Astrophysics Data System (ADS)
Tsuboi, S.; Miyoshi, T.; Obayashi, M.; Tono, Y.; Ando, K.
2014-12-01
Recent progress in large-scale computing using waveform modeling techniques and high-performance computing facilities has demonstrated the possibility of performing full-waveform inversion of three-dimensional (3D) seismological structure inside the Earth. We apply the adjoint method (Liu and Tromp, 2006) to obtain 3D structure beneath the Japanese Islands. First, we implemented the Spectral-Element Method on the K computer in Kobe, Japan. We optimized SPECFEM3D_GLOBE (Komatitsch and Tromp, 2002) using OpenMP so that the code fits the hybrid architecture of the K computer. We can now use 82,134 nodes of the K computer (657,072 cores) to compute synthetic waveforms with about 1 sec accuracy for a realistic 3D Earth model, at a sustained performance of 1.2 PFLOPS. We use this optimized SPECFEM3D_GLOBE code, take one chunk around the Japanese Islands from the global mesh, and compute synthetic seismograms with an accuracy of about 10 seconds. We use the GAP-P2 mantle tomography model (Obayashi et al., 2009) as an initial 3D model and use as many broadband seismic stations available in this region as possible to perform the inversion. We then use time windows for body waves and surface waves to compute adjoint sources and calculate adjoint kernels for seismic structure. We have performed several iterations and obtained an improved 3D structure beneath the Japanese Islands. The result demonstrates that waveform misfits between observed and theoretical seismograms improve as the iteration proceeds. We are now preparing to use much shorter periods in our synthetic waveform computation, to obtain seismic structure for basin-scale models, such as the Kanto basin, where there is a dense seismic network and high seismic activity. Acknowledgements: This research was partly supported by the MEXT Strategic Program for Innovative Research. We used F-net seismograms from the National Research Institute for Earth Science and Disaster Prevention.
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...
2017-09-14
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses for the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate size on a Cray XC30 system.
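As a rough stand-in for the idea (not the authors' solver), SciPy's LOBPCG shows the shape of a preconditioned block iterative eigensolver: a whole block of starting vectors is iterated at once under a preconditioner, in contrast to the one-vector-at-a-time Lanczos recurrence. The test matrix and the Jacobi preconditioner below are illustrative assumptions.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lobpcg

    n, k = 2000, 8
    A = sp.random(n, n, density=1e-3, format='csr', random_state=0)
    A = A + A.T + sp.diags(np.arange(1.0, n + 1.0))  # symmetric test matrix
    M = sp.diags(1.0 / A.diagonal())                 # cheap Jacobi preconditioner
    X = np.random.default_rng(0).standard_normal((n, k))  # block of starting guesses
    vals, vecs = lobpcg(A, X, M=M, tol=1e-6, maxiter=200, largest=False)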
Liu, Mali; Lu, Chihao; Li, Haifeng; Liu, Xu
2018-02-19
We propose a bifocal computational near-eye light field display (bifocal computational display) and a structure parameters determination scheme (SPDS) for it, achieving greater depth of field (DOF), high resolution, accommodation and a compact form factor. Using a liquid varifocal lens, two single-focal computational light fields are superimposed to reconstruct a virtual object's light field by time multiplexing, avoiding the need for a high refresh rate. By minimizing the deviation between the reconstructed light field and the original light field, we propose a determination framework for the structure parameters of the bifocal computational light field display. When different objectives are applied to the SPDS, it can achieve either high average resolution or uniform resolution over the scene depth range. To analyze the advantages and limitations of our proposed method, we have conducted simulations and constructed a simple prototype comprising a liquid varifocal lens, dual-layer LCDs and a uniform backlight. The results of simulations and experiments with our method show that the proposed system achieves the expected performance well. Owing to this performance, we expect the bifocal computational display and SPDS to contribute to daily-use and commercial virtual reality displays.
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. The direct parallel-hierarchical transformations are based on a cluster CPU- and GPU-oriented hardware platform. Mathematical models of training the parallel-hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most relevant to problems of organizing high-performance computation over very large arrays of information designed to implement multi-stage sensing and processing as well as compaction and recognition of data in informational structures and computer devices. The method has such advantages as high performance through the use of recent advances in parallelization, the ability to work with images of very high dimension, ease of scaling when the number of nodes in the cluster changes, and automatic scanning of the local network to detect compute nodes.
Qayumi, A K; Kurihara, Y; Imai, M; Pachev, G; Seo, H; Hoshino, Y; Cheifetz, R; Matsuura, K; Momoi, M; Saleem, M; Lara-Guerra, H; Miki, Y; Kariya, Y
2004-10-01
This study aimed to compare the effects of computer-assisted, text-based and computer-and-text learning conditions on the performances of 3 groups of medical students in the pre-clinical years of their programme, taking into account their academic achievement to date. A fourth group of students served as a control (no-study) group. Participants were recruited from the pre-clinical years of the training programmes in 2 medical schools in Japan, Jichi Medical School near Tokyo and Kochi Medical School near Osaka. Participants were randomly assigned to 4 learning conditions and tested before and after the study on their knowledge of and skill in performing an abdominal examination, in a multiple-choice test and an objective structured clinical examination (OSCE), respectively. Information about performance in the programme was collected from school records and students were classified as average, good or excellent. Student and faculty evaluations of their experience in the study were explored by means of a short evaluation survey. Compared to the control group, all 3 study groups exhibited significant gains in performance on knowledge and performance measures. For the knowledge measure, the gains of the computer-assisted and computer-assisted plus text-based learning groups were significantly greater than the gains of the text-based learning group. The performances of the 3 groups did not differ on the OSCE measure. Analyses of gains by performance level revealed that high achieving students' learning was independent of study method. Lower achieving students performed better after using computer-based learning methods. The results suggest that computer-assisted learning methods will be of greater help to students who do not find the traditional methods effective. Explorations of the factors behind this are a matter for future research.
Study of high-performance canonical molecular orbitals calculation for proteins
NASA Astrophysics Data System (ADS)
Hirano, Toshiyuki; Sato, Fumitoshi
2017-11-01
The canonical molecular orbital (CMO) calculation can help in understanding chemical properties and reactions in proteins. However, it is difficult to perform CMO calculations of proteins because of the self-consistent field (SCF) convergence problem and the expensive computational cost. To reliably obtain the CMOs of proteins, we research and develop high-performance CMO applications and perform experimental studies. We have proposed the third-generation density-functional calculation method for the SCF procedure, which is more advanced than the FILE and direct methods. Our method is based on Cholesky decomposition for the two-electron integrals calculation and the modified grid-free method for the pure-XC term evaluation. By using the third-generation density-functional calculation method, the Coulomb, Fock-exchange, and pure-XC terms can be given by a simple linear-algebraic procedure in the SCF loop. Therefore, we can expect good parallel performance in solving the SCF problem by using a well-optimized linear algebra library such as BLAS on distributed-memory parallel computers. The third-generation density-functional calculation method is implemented in our program, ProteinDF. To compute the electronic structure of a large molecule, one must not only overcome the expensive computational cost but also supply a good initial guess for safe SCF convergence. In order to prepare a precise initial guess for the macromolecular system, we have developed the quasi-canonical localized orbital (QCLO) method. The QCLO has the characteristics of both localized and canonical orbitals in a certain region of the molecule. We have succeeded in the CMO calculations of proteins by using the QCLO method. For simplified and semi-automated calculation with the QCLO method, we have also developed a Python-based program, QCLObot.
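A hedged sketch of what such a simple linear-algebraic procedure can look like once the two-electron integrals have been Cholesky-decomposed as (pq|rs) ≈ sum_K L[K,p,q]*L[K,r,s]: the Coulomb and Fock-exchange builds reduce to dense tensor contractions that a tuned BLAS-backed library executes well in parallel. All sizes and matrices below are mock stand-ins, not ProteinDF code.

    import numpy as np

    rng = np.random.default_rng(1)
    nbf, ncho = 50, 300                       # basis functions, Cholesky vectors
    L = rng.standard_normal((ncho, nbf, nbf))
    L = 0.5 * (L + L.transpose(0, 2, 1))      # each Cholesky vector is symmetric
    D = rng.standard_normal((nbf, nbf))
    D = D @ D.T                               # mock density matrix

    gamma = np.einsum('kpq,pq->k', L, D)      # contract the density once per vector
    J = np.einsum('k,kpq->pq', gamma, L)      # Coulomb term as dense linear algebra
    K = np.einsum('kpr,rs,kqs->pq', L, D, L)  # Fock-exchange term, likewise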
Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan
2017-08-04
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamically coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM, and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.
We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousand degrees of freedom, and when the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.
NASA Astrophysics Data System (ADS)
Blakely, Christopher D.
This dissertation has three main goals: (1) To explore the anatomy of meshless collocation approximation methods that have recently gained attention in the numerical analysis community; (2) Numerically demonstrate why the meshless collocation method should clearly become an attractive alternative to standard finite-element methods due to the simplicity of its implementation and its high-order convergence properties; (3) Propose a meshless collocation method for large scale computational geophysical fluid dynamics models. We provide numerical verification and validation of the meshless collocation scheme applied to the rotational shallow-water equations on the sphere and demonstrate computationally that the proposed model can compete with existing high performance methods for approximating the shallow-water equations, such as the SEAM (spectral-element atmospheric model) developed at NCAR. A detailed analysis of the parallel implementation of the model, along with the introduction of parallel algorithmic routines for the high-performance simulation of the model, will be given. We analyze the programming and computational aspects of the model using Fortran 90 and the message passing interface (MPI) library along with software and hardware specifications and performance tests. Details from many aspects of the implementation in regard to performance, optimization, and stabilization will be given. In order to verify the mathematical correctness of the algorithms presented and to validate the performance of the meshless collocation shallow-water model, we conclude the thesis with numerical experiments on some standardized test cases for the shallow-water equations on the sphere using the proposed method.
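The simplicity-of-implementation claim is easy to appreciate on a toy problem. The sketch below (illustrative Python/NumPy, unrelated to the shallow-water model itself) solves u''(x) = f(x) on [0,1] by Gaussian RBF collocation with no mesh or element connectivity anywhere; the shape parameter eps is an assumed value that trades accuracy against conditioning.

    import numpy as np

    n, eps = 30, 10.0
    x = np.linspace(0.0, 1.0, n)                 # collocation points = RBF centers
    f = lambda t: -np.pi**2 * np.sin(np.pi * t)  # manufactured: u = sin(pi*x)
    r = x[:, None] - x[None, :]
    phi = np.exp(-(eps * r)**2)                  # Gaussian RBF matrix
    phi_xx = (4*eps**4*r**2 - 2*eps**2) * phi    # its second x-derivative
    A = phi_xx.copy()
    A[0, :], A[-1, :] = phi[0, :], phi[-1, :]    # boundary rows enforce u(0)=u(1)=0
    b = f(x)
    b[0] = b[-1] = 0.0
    coef = np.linalg.solve(A, b)
    u = phi @ coef
    print("max error:", np.max(np.abs(u - np.sin(np.pi * x))))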
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special-purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large-scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations can be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
Reducing Communication in Algebraic Multigrid Using Additive Variants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vassilevski, Panayot S.; Yang, Ulrike Meier
2014-02-12
Algebraic multigrid (AMG) has proven to be an effective scalable solver on many high performance computers. However, its increasing communication complexity on coarser levels has been shown to seriously impact its performance on computers with high communication cost. Moreover, while additive AMG variants provide increased parallelism and a decreased number of messages per cycle, they generally exhibit slower convergence. Here we present various new additive variants with convergence rates that are significantly improved compared to the classical additive algebraic multigrid method and investigate their potential for decreased communication and improved communication-computation overlap, features that are essential for good performance on future exascale architectures.
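The structural difference the abstract turns on shows up clearly in a two-level toy preconditioner (illustrative NumPy, not the paper's AMG hierarchy or its new variants): the multiplicative cycle feeds the smoother's leftover residual into the coarse correction, while the additive form applies smoother and coarse correction to the same residual, so the two terms can be computed concurrently with no intervening communication.

    import numpy as np

    n = 64
    A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson matrix
    P = np.zeros((n, n // 2))
    for j in range(n // 2):                              # piecewise-constant aggregation
        P[2*j, j] = P[2*j + 1, j] = 1.0
    Ac = P.T @ A @ P                                     # Galerkin coarse operator
    Dinv = np.diag(1.0 / np.diag(A))                     # Jacobi smoother

    def apply_additive(r):
        # both terms see the same residual: parallel-friendly, slower convergence
        return Dinv @ r + P @ np.linalg.solve(Ac, P.T @ r)

    def apply_multiplicative(r):
        x = Dinv @ r                                     # smooth first...
        x += P @ np.linalg.solve(Ac, P.T @ (r - A @ x))  # ...then correct, sequentially
        return x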
Methods, apparatus and system for selective duplication of subtasks
Andrade Costa, Carlos H.; Cher, Chen-Yong; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.
2016-03-29
A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.
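A hedged sketch of the claimed control flow (every name below is hypothetical, not from the patent): monitor node health, flag nodes whose estimated failure likelihood exceeds the prescribed threshold, replicate their subtasks onto spare nodes, and notify the messaging library.

    import random
    from dataclasses import dataclass, field

    FAILURE_THRESHOLD = 0.05   # the "first prescribed threshold" (assumed value)

    @dataclass
    class Node:
        name: str
        subtasks: list = field(default_factory=list)
        def estimate_failure_probability(self) -> float:
            return random.random() * 0.1   # stand-in for real health telemetry

    def selectively_duplicate(nodes, notify):
        duplicated = []
        for node in nodes:
            if node.estimate_failure_probability() > FAILURE_THRESHOLD:
                for task in list(node.subtasks):
                    # place the replica on the least-loaded other node
                    spare = min((m for m in nodes if m is not node),
                                key=lambda m: len(m.subtasks))
                    spare.subtasks.append(task + "-replica")
                    duplicated.append(task)
        if duplicated:
            notify(duplicated)             # messaging-library notification hook
        return duplicated

    nodes = [Node(f"n{i}", [f"task{i}"]) for i in range(4)]
    selectively_duplicate(nodes, notify=lambda d: print("duplicated:", d))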
Prediction and characterization of application power use in a high-performance computing environment
Bugbee, Bruce; Phillips, Caleb; Egan, Hilary; ...
2017-02-27
Power use in data centers and high-performance computing (HPC) facilities has grown in tandem with increases in the size and number of these facilities. Substantial innovation is needed to enable meaningful reduction in energy footprints in leadership-class HPC systems. In this paper, we focus on characterizing and investigating application-level power usage. We demonstrate potential methods for predicting power usage based on a priori and in situ characteristics. Lastly, we highlight a potential use case of this method through a simulated power-aware scheduler using historical jobs from a real scientific HPC system.
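A minimal illustration of the kind of a priori prediction described (synthetic data and invented coefficients, not NREL's models): an ordinary least squares fit mapping job characteristics to mean power draw, whose predictions could then feed a power-aware scheduler.

    import numpy as np

    rng = np.random.default_rng(2)
    n_jobs = 500
    nodes = rng.integers(1, 65, n_jobs)        # requested node count
    hours = rng.uniform(0.1, 24.0, n_jobs)     # requested walltime
    cpu_frac = rng.uniform(0.2, 1.0, n_jobs)   # historical CPU utilization
    watts = 180*nodes + 30*nodes*cpu_frac + rng.normal(0, 50, n_jobs)  # synthetic truth

    X = np.column_stack([np.ones(n_jobs), nodes, hours, nodes*cpu_frac])
    beta, *_ = np.linalg.lstsq(X, watts, rcond=None)
    predicted = X @ beta                       # per-job power estimates for scheduling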
Computer Facilitated Mathematical Methods in Chemical Engineering--Similarity Solution
ERIC Educational Resources Information Center
Subramanian, Venkat R.
2006-01-01
High-performance computers coupled with highly efficient numerical schemes and user-friendly software packages have helped instructors to teach numerical solutions and analysis of various nonlinear models more efficiently in the classroom. One of the main objectives of a model is to provide insight about the system of interest. Analytical…
Tools for 3D scientific visualization in computational aerodynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a supercomputer with a high speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a CRAY2, and the high speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed as well as descriptions of other hardware for digital video and film recording.
A vectorized Lanczos eigensolver for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1990-01-01
The computational strategies used to implement a Lanczos-based eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
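The steps being vectorized are those of the Lanczos recurrence itself, whose dominant costs are matrix-vector products and axpy-style vector updates. A textbook sketch (illustrative NumPy, not the NASA implementation; reorthogonalization is omitted for brevity):

    import numpy as np

    def lanczos_eigvals(A, v0, m):
        # m-step Lanczos tridiagonalization; the Ritz values approximate the
        # extreme eigenvalues of the symmetric matrix A.
        n = v0.size
        V = np.zeros((n, m + 1))
        alpha = np.zeros(m)
        beta = np.zeros(m + 1)
        V[:, 0] = v0 / np.linalg.norm(v0)
        for j in range(m):
            w = A @ V[:, j]                  # dominant cost: the mat-vec
            if j > 0:
                w -= beta[j] * V[:, j - 1]
            alpha[j] = w @ V[:, j]
            w -= alpha[j] * V[:, j]          # vectorizable axpy updates
            beta[j + 1] = np.linalg.norm(w)
            if beta[j + 1] < 1e-12:          # invariant subspace found
                m = j + 1
                break
            V[:, j + 1] = w / beta[j + 1]
        T = np.diag(alpha[:m]) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
        return np.linalg.eigvalsh(T)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((300, 300))
    A = A + A.T                              # symmetric test matrix
    ritz = lanczos_eigvals(A, rng.standard_normal(300), 50)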
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.
Computational Fluid Dynamics Simulation Study of Active Power Control in Wind Plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleming, Paul; Aho, Jake; Gebraad, Pieter
2016-08-01
This paper presents an analysis of a wind plant's ability to provide active power control services, performed using a high-fidelity computational fluid dynamics-based wind plant simulator. This approach allows examination of the impact of wind turbine wake interactions within a wind plant on the performance of the wind plant controller. The paper investigates several control methods for improving performance in waked conditions. One method uses wind plant wake controls, an active field of research in which wind turbine control systems are coordinated to account for their wakes, to improve the overall performance. Results demonstrate the challenge of providing active power control in waked conditions but also point to methods for improving this performance.
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA
NASA Astrophysics Data System (ADS)
Ren, Qinlong
The lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on kinetic theory, LBM has several advantages compared with traditional numerical methods, such as its physical representation of microscopic interactions, its ability to deal with complex geometries, and its highly parallel nature. The lattice Boltzmann method has been applied to various fluid behaviors and heat transfer processes, such as conjugate heat transfer, magnetic and electric fields, diffusion and mixing processes, chemical reactions, multiphase flow, phase change processes, non-isothermal flow in porous media, microfluidics, fluid-structure interactions in biological systems, and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems: heat transfer and fluid flow are solved on Eulerian nodes by LBM, while the complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been a popular topic for many decades as a way to accelerate computation in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores which can be used for parallel computing. However, CPUs with hundreds of cores remain expensive, which limits high-performance computing on personal computers. Graphics processing units (GPUs), originally developed for computer video cards, have emerged in recent years as powerful high-performance computing devices. Unlike CPUs, GPUs with thousands of cores are comparatively cheap. For example, the GPU used in the current work (a GeForce GTX TITAN) has 2,688 cores and costs only about 1,000 US dollars. The release of NVIDIA's CUDA architecture in 2007, which includes both hardware and a programming environment, made GPU computing attractive. Due to its highly parallel nature, the lattice Boltzmann method has been successfully ported to GPUs with substantial performance benefits in recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, the lattice Boltzmann method and the immersed boundary method are used to study natural convection in an enclosure with an array of conducting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, the PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems, all on a CUDA computational platform. It is demonstrated that LBM is an efficient method to simulate complex heat transfer problems using GPUs with CUDA.
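A minimal D2Q9 BGK sketch (illustrative NumPy on a periodic box, far simpler than the dissertation's coupled LBM/IBM solvers, and using a crude velocity-shift body force just to drive flow) makes the parallelism concrete: collision is purely per-cell and streaming is a fixed shift per direction, so both map directly onto GPU threads.

    import numpy as np

    nx, ny, tau = 128, 128, 0.6
    c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
    w = np.array([4/9] + [1/9]*4 + [1/36]*4)
    f = np.ones((9, nx, ny)) * w[:, None, None]      # fluid initially at rest

    def equilibrium(rho, ux, uy):
        cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
        usq = ux**2 + uy**2
        return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

    for step in range(100):
        rho = f.sum(axis=0)
        ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
        uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
        ux = ux + 1e-5                               # crude body force
        f += (equilibrium(rho, ux, uy) - f) / tau    # BGK collision, per cell
        for i in range(9):                           # streaming, per direction
            f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)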
High performance in silico virtual drug screening on many-core processors.
McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A
2015-05-01
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
NASA Astrophysics Data System (ADS)
Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Ha, Dong-Gwang; Einzinger, Markus; Wu, Tony; Baldo, Marc A.; Aspuru-Guzik, Alán
2016-09-01
Discovering new OLED emitters requires many experiments to synthesize candidates and test their performance in devices. Large-scale computer simulation can greatly speed this search process, but the problem remains challenging enough that brute-force application of massive computing power is not enough to successfully identify novel structures. We report a successful High Throughput Virtual Screening study that leveraged a range of methods to optimize the search process. The generation of candidate structures was constrained to contain combinatorial explosion. Simulations were tuned to the specific problem and calibrated with experimental results. Experimentalists and theorists actively collaborated such that experimental feedback was regularly utilized to update and shape the computational search. Supervised machine learning methods prioritized candidate structures prior to quantum chemistry simulation to prevent wasting compute on likely poor performers. With this combination of techniques, each multiplying the strength of the search, this effort managed to navigate an area of molecular space and identify hundreds of promising OLED candidate structures. An experimentally validated selection of this set shows emitters with external quantum efficiencies as high as 22%.
Analysis of high-throughput biological data using their rank values.
Dembélé, Doulaye
2018-01-01
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature have become more specialized and often require substantial computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra and the Perron-Frobenius theorem, and we also extend a method presented earlier for finding differentially expressed genes to the detection of recurrent copy number aberrations. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros .
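To make the rank-value t-test concrete, here is a hedged toy version (illustrative Python/SciPy; the exact statistic in the fcros package may differ in detail): expression values are replaced by their within-sample ranks, and each gene's scaled ranks are tested against the mid-rank expected when the gene does not change.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    genes, samples = 1000, 12
    X = rng.standard_normal((genes, samples))
    X[:50] += 2.0                            # 50 truly changed genes

    ranks = stats.rankdata(X, axis=0)        # rank genes within each sample
    scaled = ranks / genes                   # rank values in (0, 1]
    expected = (genes + 1) / (2 * genes)     # mid-rank under "no change"
    t, p = stats.ttest_1samp(scaled, popmean=expected, axis=1)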
Rapid indirect trajectory optimization on highly parallel computing architectures
NASA Astrophysics Data System (ADS)
Antony, Thomas
Trajectory optimization is a field which can benefit greatly from the advantages offered by parallel computing. The current state-of-the-art in trajectory optimization focuses on the use of direct optimization methods, such as the pseudo-spectral method. These methods are favored due to their ease of implementation and large convergence regions, while indirect methods have largely been ignored in the literature in the past decade except for specific applications in astrodynamics. It has been shown that the shortcomings conventionally associated with indirect methods can be overcome by the use of a continuation method in which complex trajectory solutions are obtained by solving a sequence of progressively more difficult optimization problems. High performance computing hardware is trending towards more parallel architectures as opposed to powerful single-core processors. Graphics Processing Units (GPUs), which were originally developed for 3D graphics rendering, have gained popularity in the past decade as high-performance, programmable parallel processors. The Compute Unified Device Architecture (CUDA) framework, a parallel computing architecture and programming model developed by NVIDIA, is one of the most widely used platforms in GPU computing. GPUs have been applied to a wide range of fields that require the solution of complex, computationally demanding problems. A GPU-accelerated indirect trajectory optimization methodology which uses the multiple shooting method and continuation is developed using the CUDA platform. The various algorithmic optimizations used to exploit the parallelism inherent in the indirect shooting method are described. The resulting rapid optimal control framework enables the construction of high quality optimal trajectories that satisfy problem-specific constraints and fully satisfy the necessary conditions of optimality. The benefits of the framework are highlighted by construction of maximum terminal velocity trajectories for a hypothetical long range weapon system. The techniques used to construct an initial guess from an analytic near-ballistic trajectory and the methods used to formulate the necessary conditions of optimality in a manner that is transparent to the designer are discussed. Various hypothetical mission scenarios that enforce different combinations of initial, terminal, interior point and path constraints demonstrate the rapid construction of complex trajectories without requiring any a priori insight into the structure of the solutions. Trajectory problems of this kind were previously considered impractical to solve using indirect methods. The GPU-accelerated solver is found to be 2x--4x faster than MATLAB's bvp4c, even while running on GPU hardware that is five years behind the state-of-the-art.
X-ray computed tomography using curvelet sparse regularization.
Wieczorek, Matthias; Frikel, Jürgen; Vogel, Jakob; Eggl, Elena; Kopp, Felix; Noël, Peter B; Pfeiffer, Franz; Demaret, Laurent; Lasser, Tobias
2015-04-01
Reconstruction of x-ray computed tomography (CT) data remains a mathematically challenging problem in medical imaging. Complementing the standard analytical reconstruction methods, sparse regularization is growing in importance, as it allows inclusion of prior knowledge. The paper presents a method for sparse regularization based on the curvelet frame for the application to iterative reconstruction in x-ray computed tomography. In this work, the authors present an iterative reconstruction approach based on the alternating direction method of multipliers using curvelet sparse regularization. Evaluation of the method is performed on a specifically crafted numerical phantom dataset to highlight the method's strengths. Additional evaluation is performed on two real datasets from commercial scanners with different noise characteristics, a clinical bone sample acquired in a micro-CT and a human abdomen scanned in a diagnostic CT. The results clearly illustrate that curvelet sparse regularization has characteristic strengths. In particular, it improves the restoration and resolution of highly directional, high contrast features with smooth contrast variations. The authors also compare this approach to the popular technique of total variation and to traditional filtered backprojection. The authors conclude that curvelet sparse regularization is able to improve reconstruction quality by reducing noise while preserving highly directional features.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; Russell, Samuel S.
2012-01-01
Objective: Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion has allowed for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work with them must grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU, which yield significant increases in performance. These common computations have high degrees of data parallelism; that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area where GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver, based on the blocked LU decomposition method, achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
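Blocking is what converts most of the factorization's work into large matrix-matrix products. A textbook right-looking sketch (illustrative Python/SciPy; no pivoting, so it is only safe on diagonally dominant matrices, and it is not the VPP500 code):

    import numpy as np
    from scipy.linalg import solve_triangular

    def blocked_lu(A, nb=64):
        # Returns L (unit lower) and U packed in place, with L @ U = A.
        A = A.copy()
        n = A.shape[0]
        for k in range(0, n, nb):
            e = min(k + nb, n)
            for j in range(k, e - 1):               # unblocked panel factorization
                A[j+1:e, j] /= A[j, j]
                A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
            if e < n:
                # triangular solves for the block row and block column
                A[k:e, e:] = solve_triangular(A[k:e, k:e], A[k:e, e:],
                                              lower=True, unit_diagonal=True)
                A[e:, k:e] = solve_triangular(A[k:e, k:e].T, A[e:, k:e].T,
                                              lower=True).T
                # trailing update: one big GEMM, the bulk of the flops
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
        return A

    rng = np.random.default_rng(0)
    M = rng.standard_normal((256, 256)) + 256*np.eye(256)  # diagonally dominant
    LU = blocked_lu(M)
    L = np.tril(LU, -1) + np.eye(256)
    U = np.triu(LU)
    assert np.allclose(L @ U, M)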
High performance computation of radiative transfer equation using the finite element method
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.
2018-05-01
This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for the spatio-angular discretization of the monochromatic steady-state radiative transfer equation in an anisotropically scattering medium. Two very different methods of parallelization, angular and spatial decomposition, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
Experimental Investigation of Project Orion Crew Exploration Vehicle Aeroheating in AEDC Tunnel 9
NASA Technical Reports Server (NTRS)
Hollis, Brian R.; Horvath, Thomas J.; Berger, Karen T.; Lillard, Randolph P.; Kirk, Benjamin S.; Coblish, Joseph J.; Norris, Joseph D.
2008-01-01
An investigation of the aeroheating environment of the Project Orion Crew Entry Vehicle has been performed in the Arnold Engineering Development Center Tunnel 9. The goals of this test were to measure turbulent heating augmentation levels on the heat shield and to obtain high-fidelity heating data for assessment of computational fluid dynamics methods. Laminar and turbulent predictions were generated for all wind tunnel test conditions and comparisons were performed with the data for the purpose of helping to define uncertainty margins for the computational method. Data from both the wind tunnel test and the computational study are presented herein.
Parallel high-precision orbit propagation using the modified Picard-Chebyshev method
NASA Astrophysics Data System (ADS)
Koblick, Darin C.
2012-03-01
The modified Picard-Chebyshev method, when run in parallel, is thought to be more accurate and faster than the most efficient sequential numerical integration techniques when applied to orbit propagation problems. Previous experiments have shown that the modified Picard-Chebyshev method can achieve up to a one-order-of-magnitude speedup over the 12
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prowell, Stacy J; Symons, Christopher T
2015-01-01
Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.
Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiu, Dongbin
2017-03-03
The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.
ERIC Educational Resources Information Center
Owusu, K. A.; Monney, K. A.; Appiah, J. Y.; Wilmot, E. M.
2010-01-01
This study investigated the comparative efficiency of computer-assisted instruction (CAI) and conventional teaching method in biology on senior high school students. A science class was selected in each of two randomly selected schools. The pretest-posttest non equivalent quasi experimental design was used. The students in the experimental group…
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks
Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...
2017-08-29
Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fair rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvements of orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.
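For contrast, the generic progressive-filling baseline, whose repeated sweeps over all links and flows are what make it infeasible at scale, fits in a few lines (illustrative Python; a flow is described only by the set of links it traverses):

    def max_min_fair(flows, capacity):
        # flows: {flow: set of links}; capacity: {link: capacity}
        rate = {f: 0.0 for f in flows}
        active = set(flows)
        cap = dict(capacity)
        while active:
            # smallest equal increment that saturates some link
            candidates = []
            for link, c in cap.items():
                users = sum(1 for f in active if link in flows[f])
                if users > 0:
                    candidates.append((link, c / users))
            bottleneck, inc = min(candidates, key=lambda t: t[1])
            for f in active:
                rate[f] += inc
            for link in cap:
                cap[link] -= inc * sum(1 for f in active if link in flows[f])
            active -= {f for f in active if bottleneck in flows[f]}  # freeze them
        return rate

    flows = {"A": {"l1", "l2"}, "B": {"l2"}, "C": {"l3"}}
    print(max_min_fair(flows, {"l1": 10.0, "l2": 10.0, "l3": 4.0}))
    # {'A': 5.0, 'B': 5.0, 'C': 4.0}: l2 bottlenecks A and B, l3 bottlenecks C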
Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid
2014-12-30
Understanding neural functions requires knowledge gained from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques, with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative and thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC thus demonstrates a win-win spike sorting solution that meets both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce the computational burden, and thus their combination can be applied to real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix, and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
DAKOTA Design Analysis Kit for Optimization and Terascale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Brian M.; Dalbey, Keith R.; Eldred, Michael S.
2010-02-24
The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes (computational models) and iterative analysis methods. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and analysis of computational models on high performance computers. A user provides a set of DAKOTA commands in an input file and launches DAKOTA. DAKOTA invokes instances of the computational models, collects their results, and performs systems analyses. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, polynomial chaos, stochastic collocation, and epistemic methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as hybrid optimization, surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. Services for parallel computing, simulation interfacing, approximation modeling, fault tolerance, restart, and graphics are also included.
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background: Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computationally demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods: In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results: The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions: We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
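The sampling loop at the core of such a docking algorithm has a compact skeleton (illustrative Python; the energy function here is a toy quadratic stand-in for the paper's potential-based protein-DNA score, and all parameter values are assumptions):

    import numpy as np

    def anneal(energy, pose0, steps=20000, t0=5.0, t_end=0.05, scale=0.3, seed=0):
        rng = np.random.default_rng(seed)
        pose, e = pose0, energy(pose0)
        best, best_e = pose, e
        for k in range(steps):
            t = t0 * (t_end / t0) ** (k / steps)              # geometric cooling
            cand = pose + rng.normal(0.0, scale, pose.shape)  # perturb pose parameters
            e_cand = energy(cand)
            # Metropolis acceptance: always downhill, sometimes uphill
            if e_cand < e or rng.random() < np.exp((e - e_cand) / t):
                pose, e = cand, e_cand
                if e < best_e:
                    best, best_e = pose, e
        return best, best_e

    best, val = anneal(lambda p: np.sum((p - 1.5)**2), np.zeros(6))

On a GPU, many such chains (or the energy evaluation itself) run in parallel, which is the kind of concurrency the paper exploits.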
Final report for “Extreme-scale Algorithms and Solver Resilience”
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2017-06-30
This is a joint project with principal investigators at Oak Ridge National Laboratory, Sandia National Laboratories, the University of California at Berkeley, and the University of Tennessee. Our part of the project involves developing performance models for highly scalable algorithms and the development of latency tolerant iterative methods. During this project, we extended our performance models for the Multigrid method for solving large systems of linear equations and conducted experiments with highly scalable variants of conjugate gradient methods that avoid blocking synchronization. In addition, we worked with the other members of the project on alternative techniques for resilience and reproducibility. We also presented an alternative approach for reproducible dot-products in parallel computations that performs almost as well as the conventional approach by separating the order of computation from the details of the decomposition of vectors across the processes.
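The reproducible dot-product idea, as described, separates the order of summation from how the vectors happen to be decomposed across processes. Below is a hedged Python sketch of one (deliberately naive) way to realize this; the per-index product table and the sorted final reduction are illustrative choices, not the authors' performant scheme.

```python
# Reproducible dot product: per-element products are computed identically
# no matter which "process" owns an element, and the final reduction runs
# in a canonical global-index order, so the result is bitwise identical
# for any decomposition.
import numpy as np

def reproducible_dot(chunks):
    """chunks: list of (offset, x_part, y_part), e.g. one piece per process."""
    prods = {}
    for off, xp, yp in chunks:
        for i in range(len(xp)):
            prods[off + i] = float(xp[i]) * float(yp[i])
    total = 0.0
    for g in sorted(prods):   # canonical left-to-right summation order
        total += prods[g]
    return total

x, y = np.random.rand(10000), np.random.rand(10000)
a = reproducible_dot([(0, x[:3333], y[:3333]), (3333, x[3333:], y[3333:])])
b = reproducible_dot([(0, x[:7000], y[:7000]), (7000, x[7000:], y[7000:])])
assert a == b   # bitwise identical regardless of the decomposition
```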
NASA Technical Reports Server (NTRS)
Scheper, C.; Baker, R.; Frank, G.; Yalamanchili, S.; Gray, G.
1992-01-01
Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified.
High-Performance Computing Data Center | Energy Systems Integration Facility | NREL
The Energy Systems Integration Facility's High-Performance Computing Data Center is home to Peregrine, the largest high-performance computing system in the world exclusively dedicated to advancing energy efficiency and renewable energy technologies.
Fan, Ming; Kuwahara, Hiroyuki; Wang, Xiaolei; Wang, Suojin; Gao, Xin
2015-11-01
Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model-decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates, to a level on par with the best solutions obtained from the population-based methods, while maintaining high computational speed. These results suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the parameter search space vastly large. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
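A hedged sketch of the hybrid strategy recommended above, using SciPy's differential evolution as the population-based global pass and a Nelder-Mead polish as the local refinement; the gene-circuit model is replaced by a toy exponential-decay fit, purely for illustration.

```python
# Hybrid estimation: cheap global (population-based) search, then local
# refinement of the best candidate.
import numpy as np
from scipy.optimize import differential_evolution, minimize

t = np.linspace(0.0, 10.0, 50)
data = 2.0 * np.exp(-0.7 * t) + 0.01 * np.random.randn(t.size)

def sse(p):
    a, k = p
    return float(np.sum((a * np.exp(-k * t) - data) ** 2))

coarse = differential_evolution(sse, bounds=[(0, 5), (0, 5)], maxiter=20)
refined = minimize(sse, coarse.x, method="Nelder-Mead")
print("coarse:", coarse.x, "refined:", refined.x)
```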
Compression of hyper-spectral images using an accelerated nonnegative tensor decomposition
NASA Astrophysics Data System (ADS)
Li, Jin; Liu, Zilong
2017-12-01
Nonnegative tensor Tucker decomposition (NTD) in a transform domain (e.g., 2D-DWT) has been used in the compression of hyper-spectral images because it can remove redundancies between spectral bands and also exploit the spatial correlations of each band. However, NTD has a very high computational cost. In this paper, we propose a low-complexity NTD-based compression method for hyper-spectral images. The method is based on a pair-wise multilevel grouping approach for the NTD that overcomes its high computational cost. The proposed method achieves low complexity with only a slight decrease in coding performance compared to conventional NTD. Experiments confirm that the method requires less processing time and maintains better coding performance than compression without NTD. The proposed approach has potential application in the lossy compression of hyper-spectral or multi-spectral images.
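For illustration, a nonnegative Tucker decomposition of a synthetic hyper-spectral cube can be sketched with the TensorLy library (assumed available); the ranks and data are illustrative and do not reflect the paper's pair-wise multilevel grouping scheme.

```python
# Compress a toy hyper-spectral cube by keeping only the small Tucker
# core and factor matrices instead of the full tensor (lossy).
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker

cube = np.abs(np.random.rand(64, 64, 32))      # rows x cols x bands
core, factors = non_negative_tucker(tl.tensor(cube), rank=[16, 16, 8])

compressed_size = core.size + sum(f.size for f in factors)
print("compression ratio:", cube.size / compressed_size)
recon = tl.tucker_to_tensor((core, factors))    # lossy reconstruction
```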
Reverse time migration by Krylov subspace reduced order modeling
NASA Astrophysics Data System (ADS)
Basir, Hadi Mahdavi; Javaherian, Abdolrahim; Shomali, Zaher Hossein; Firouz-Abadi, Roohollah Dehghani; Gholamy, Shaban Ali
2018-04-01
Imaging is a key step in seismic data processing. To date, a myriad of advanced pre-stack depth migration approaches have been developed; however, reverse time migration (RTM) is still considered the high-end imaging algorithm. The main limitations on the performance of reverse time migration are the intensive computation of the forward and backward simulations, the time consumption, and the memory allocation related to the imaging condition. Based on reduced order modeling, we propose an algorithm that addresses all of the aforementioned factors. Our method uses the Krylov subspace method to compute certain mode shapes of the velocity model, which serve as an orthogonal basis for the reduced order model. Reverse time migration by reduced order modeling lends itself to highly parallel computation and strongly reduces the memory requirement of reverse time migration. Results on synthetic models showed that the suggested method can decrease the computational costs of reverse time migration by several orders of magnitude compared with reverse time migration by the finite element method.
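A minimal sketch of the core ingredient, building an orthogonal Krylov (Lanczos) basis and projecting the operator onto it; the tridiagonal test matrix is a stand-in for the discretized wave operator, and no reorthogonalization is performed, so this is illustrative rather than production-grade.

```python
# Lanczos basis V for a symmetric operator A; the reduced operator
# V^T A V replaces the full system in the small subspace.
import numpy as np

def lanczos_basis(A, b, m):
    n = b.size
    V = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = b / np.linalg.norm(b)
    v_prev = np.zeros(n)
    for j in range(m):
        w = A @ V[:, j] - (beta[j - 1] * v_prev if j > 0 else 0.0)
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j + 1 < m:
            beta[j] = np.linalg.norm(w)
            v_prev = V[:, j]
            V[:, j + 1] = w / beta[j]
    return V

n = 400   # toy 1-D Laplacian as the "velocity model" operator
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
V = lanczos_basis(A, np.random.rand(n), m=30)
A_r = V.T @ A @ V          # 30x30 reduced operator replaces 400x400 A
```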
Computer-generated holograms by multiple wavefront recording plane method with occlusion culling.
Symeonidou, Athanasia; Blinder, David; Munteanu, Adrian; Schelkens, Peter
2015-08-24
We propose a novel fast method for full parallax computer-generated holograms with occlusion processing, suitable for volumetric data such as point clouds. A novel light wave propagation strategy relying on the sequential use of the wavefront recording plane method is proposed, which employs look-up tables in order to reduce the computational complexity in the calculation of the fields. Also, a novel technique for occlusion culling with little additional computation cost is introduced. Additionally, the method applies a Gaussian distribution to the individual points in order to improve visual quality. Performance tests show that for a full-parallax high-definition CGH a speedup factor of more than 2,500 compared to the ray-tracing method can be achieved without hardware acceleration.
Parallel-vector unsymmetric Eigen-Solver on high performance computers
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Jiangning, Qin
1993-01-01
The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components of the QR algorithm, this study concluded that the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples on several test cases indicate that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed method increases with problem size.
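The Hessenberg reduction step discussed above is available in standard libraries; the following sketch uses SciPy's serial routine simply to illustrate the similarity transform that precedes QR iteration (the paper's parallel-vector implementation is not reproduced here).

```python
# Reduce a general matrix to Hessenberg form: H is zero below the first
# subdiagonal and similar to A (A = Q H Q^T), so it has the same
# eigenvalues but is much cheaper for subsequent QR iteration.
import numpy as np
from scipy.linalg import hessenberg

A = np.random.rand(6, 6)
H, Q = hessenberg(A, calc_q=True)

assert np.allclose(Q @ H @ Q.T, A)          # similarity transform
assert np.allclose(np.tril(H, -2), 0.0)     # Hessenberg structure
print(np.sort(np.linalg.eigvals(H).real))   # same spectrum as A
```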
Computational design and experimental validation of new thermal barrier systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo, Shengmin
2015-03-31
The focus of this project is on the development of a reliable and efficient ab initio based computational high temperature material design method which can be used to assist Thermal Barrier Coating (TBC) bond-coat and top-coat design. Experimental evaluations on the new TBCs are conducted to confirm the new TBCs' properties. Southern University is the subcontractor on this project with a focus on the computational simulation method development. We have performed ab initio density functional theory (DFT) calculations and molecular dynamics simulations to screen top coats and bond coats for gas turbine thermal barrier coating design and validation applications. For experimental validation, our focus is on the hot corrosion performance of different TBC systems. For example, for one of the top coatings studied, we examined the thermal stability of TaZr2.75O8 and confirmed its hot corrosion performance.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brun, B.
1997-07-01
Computer technology has improved tremendously in recent years, with larger media capacity, more memory, and more computational power. Visual computing, with high-performance graphics interfaces and desktop computational power, has changed the way engineers accomplish everyday tasks, development work, and safety studies analysis. The emergence of parallel computing will permit simulation over larger domains. In addition, new development methods, languages, and tools have appeared in the last several years.
Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)
NASA Technical Reports Server (NTRS)
Dalton, Shelly D.; Daley, Philip C.
1988-01-01
As the hardware trends for artificial intelligence (AI) involve more and more complexity, the process of optimizing the computer system design for a particular problem will also increase in complexity. Space applications of knowledge based systems (KBS) will often require an ability to perform both numerically intensive vector computations and real time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, if the application itself is not highly parallel, the machine's power cannot be utilized. A scheme is presented which will provide the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scalar, and multiprocessors. High speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to make comparisons between existing computer systems. It remains an open question whether, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
Automated Parameter Studies Using a Cartesian Method
NASA Technical Reports Server (NTRS)
Murman, Scott M.; Aftosmis, Michael J.; Nemec, Marian
2004-01-01
Computational Fluid Dynamics (CFD) is now routinely used to analyze isolated points in a design space by performing steady-state computations at fixed flight conditions (Mach number, angle of attack, sideslip) for a fixed geometric configuration of interest. This "point analysis" provides detailed information about the flowfield, which aids an engineer in understanding, or correcting, a design. A point analysis is typically performed using high fidelity methods at a handful of critical design points, e.g. a cruise or landing configuration, or a sample of points along a flight trajectory.
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit cooperates with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained using an NVIDIA GT560Ti graphics card compared to a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
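Of the GPU-side procedures listed, the per-pixel phase computation is the simplest to illustrate. Below is a hedged NumPy sketch of the standard four-step phase-shifting formula on synthetic fringe images; the actual system's calibration, correspondence, and unwrapping steps are omitted.

```python
# Per-pixel wrapped phase from four fringe images shifted by 0, 90, 180,
# and 270 degrees: phi = atan2(I4 - I2, I1 - I3).
import numpy as np

def wrapped_phase(I1, I2, I3, I4):
    return np.arctan2(I4 - I2, I1 - I3)

# Synthetic fringe images for a known phase ramp.
phi = np.linspace(0, 4 * np.pi, 512).reshape(1, -1) * np.ones((512, 1))
frames = [100 + 50 * np.cos(phi + k * np.pi / 2) for k in range(4)]
phase = wrapped_phase(*frames)   # wrapped into (-pi, pi]; unwrap afterwards
```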
Network support for system initiated checkpoints
Chen, Dong; Heidelberger, Philip
2013-01-29
A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.
Using Q-Chem on the Peregrine System | High-Performance Computing | NREL
Q-Chem is an ab initio quantum chemistry package with special strengths in excited state methods, non-adiabatic coupling, solvation models, explicitly correlated wavefunction methods, and cutting-edge DFT.
Nakatsui, M; Horimoto, K; Lemaire, F; Ürgüplü, A; Sedoglavic, A; Boulier, F
2011-09-01
Recent remarkable advances in computer performance have enabled us to estimate parameter values by the sheer power of numerical computation, the so-called 'Brute force', resulting in the high-speed simultaneous estimation of a large number of parameter values. However, these advancements have not been fully utilised to improve the accuracy of parameter estimation. Here the authors review a novel method for parameter estimation using symbolic computation power, 'Bruno force', named after Bruno Buchberger, who found the Gröbner base. In the method, objective functions are formulated by combining symbolic computation techniques. First, the authors utilise a symbolic computation technique, differential elimination, which symbolically reduces a system of differential equations in a given model to an equivalent system. Second, since this equivalent system is frequently composed of large equations, the system is further simplified by another symbolic computation. The performance of the authors' method for improving parameter accuracy is illustrated by two representative models in biology, a simple cascade model and a negative feedback model, in comparison with previous numerical methods. Finally, the limits and extensions of the authors' method are discussed, in terms of the possible power of 'Bruno force' for the development of a new horizon in parameter estimation.
NASA Astrophysics Data System (ADS)
Demenev, A. G.
2018-02-01
The present work analyzes high-performance computing (HPC) infrastructure capabilities for solving aircraft engine aeroacoustics problems at Perm State University. We explore the ability to develop new computational aeroacoustics methods/solvers for computer-aided engineering (CAE) systems to handle complicated industrial problems of engine noise prediction. Leading aircraft engine engineering companies, including "UEC-Aviadvigatel" JSC (our industrial partner in Perm, Russia), require such methods/solvers to optimize aircraft engine geometry for fan noise reduction. We analysed Perm State University's HPC hardware resources and software services for efficient use. The results demonstrate that the Perm State University HPC infrastructure is mature enough to face industrial-scale problems in developing CAE systems with HPC methods and CFD solvers.
Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model.
Browne, Patience; Judson, Richard S; Casey, Warren M; Kleinstreuer, Nicole C; Thomas, Russell S
2015-07-21
The U.S. Environmental Protection Agency (EPA) is considering high-throughput and computational methods to evaluate the endocrine bioactivity of environmental chemicals. Here we describe a multistep, performance-based validation of new methods and demonstrate that these new tools are sufficiently robust to be used in the Endocrine Disruptor Screening Program (EDSP). Results from 18 estrogen receptor (ER) ToxCast high-throughput screening assays were integrated into a computational model that can discriminate bioactivity from assay-specific interference and cytotoxicity. Model scores range from 0 (no activity) to 1 (bioactivity of 17β-estradiol). ToxCast ER model performance was evaluated for reference chemicals, as well as results of EDSP Tier 1 screening assays in current practice. The ToxCast ER model accuracy was 86% to 93% when compared to reference chemicals and predicted results of EDSP Tier 1 guideline and other uterotrophic studies with 84% to 100% accuracy. The performance of high-throughput assays and ToxCast ER model predictions demonstrates that these methods correctly identify active and inactive reference chemicals, provide a measure of relative ER bioactivity, and rapidly identify chemicals with potential endocrine bioactivities for additional screening and testing. EPA is accepting ToxCast ER model data for 1812 chemicals as alternatives for EDSP Tier 1 ER binding, ER transactivation, and uterotrophic assays.
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, accurately predicting distal enhancers remains a challenging task for researchers. In the past years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
Moving Sound Source Localization Based on Sequential Subspace Estimation in Actual Room Environments
NASA Astrophysics Data System (ADS)
Tsuji, Daisuke; Suyama, Kenji
This paper presents a novel method for moving sound source localization and its performance evaluation in actual room environments. The method is based on MUSIC (MUltiple SIgnal Classification), one of the highest-resolution localization methods. When using MUSIC, the eigenvectors of the correlation matrix must be computed for the estimation, which often incurs a high computational cost. In the case of a moving source this becomes a crucial drawback, because the estimation must be conducted at every observation time. Moreover, since the characteristics of the correlation matrix vary due to spatio-temporal non-stationarity, the matrix has to be estimated using only a few observed samples, which degrades the estimation accuracy. In this paper, the PAST (Projection Approximation Subspace Tracking) method is applied to sequentially estimate the eigenvectors spanning the subspace. PAST does not require eigen-decomposition, and therefore the computational cost can be reduced. Several experimental results in actual room environments are shown to demonstrate the superior performance of the proposed method.
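A minimal sketch of the PAST recursion (following Yang's 1995 RLS-style formulation) that tracks a signal subspace without eigen-decomposition; the dimensions, forgetting factor, and snapshot model are illustrative assumptions, not the paper's microphone-array setup.

```python
# One PAST step per snapshot: W (n, r) tracks an orthonormal-ish basis of
# the dominant subspace; P is the inverse correlation of the projections.
import numpy as np

def past_update(W, P, x, beta=0.97):
    y = W.conj().T @ x               # project snapshot onto current basis
    h = P @ y
    g = h / (beta + y.conj().T @ h)  # gain vector
    P = (P - np.outer(g, h.conj())) / beta
    e = x - W @ y                    # projection-approximation error
    W = W + np.outer(e, g.conj())
    return W, P

n, r = 8, 2
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((n, r)))[0]
P = np.eye(r)
A_mix = rng.standard_normal((n, r))      # fixed rank-2 mixing
for _ in range(500):
    x = A_mix @ rng.standard_normal(r)   # snapshot from a rank-2 process
    W, P = past_update(W, P, x)
```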
High-Performance Computing Systems and Operations | Computational Science | NREL
NREL operates high-performance computing (HPC) systems dedicated to advancing energy efficiency and renewable energy technologies.
Algorithm for fast event parameters estimation on GEM acquired data
NASA Astrophysics Data System (ADS)
Linczuk, Paweł; Krawczyk, Rafał D.; Poźniak, Krzysztof T.; Kasprowicz, Grzegorz; Wojeński, Andrzej; Chernyshova, Maryna; Czarski, Tomasz
2016-09-01
We present a study of a software-hardware environment for developing fast computation methods with high throughput and low latency, which can be used as a back-end in High Energy Physics (HEP) and other High Performance Computing (HPC) systems fed by a high volume of input from an electronic-sensor-based front-end. Parallelization possibilities are discussed and tested on Intel HPC solutions, with consideration of applications to Gas Electron Multiplier (GEM) measurement systems.
NASA Astrophysics Data System (ADS)
Monteiller, Vadim; Beller, Stephen; Operto, Stephane; Virieux, Jean
2015-04-01
The current development of dense seismic arrays and high performance computing makes the application of full-waveform inversion (FWI) to teleseismic data for high-resolution lithospheric imaging feasible today. In a teleseismic configuration, the source is often considered, to first order, as a planar wave that impinges on the base of the lithospheric target located below the receiver array. Recently, injection methods coupling global propagation in 1D or axisymmetric earth models with regional 3D methods (Discontinuous Galerkin finite element methods, spectral element methods or finite differences) allow us to consider more realistic teleseismic phases. Those teleseismic phases can be propagated inside a 3D regional model in order to exploit not only the forward-scattered waves propagating up to the receivers but also second-order arrivals that are back-scattered from the free surface and the reflectors before being recorded at the surface. However, those computations are performed assuming a simple global model. In this presentation, we review some key specifications that might be considered for mitigating the effect on FWI of heterogeneities situated outside the regional domain. We consider synthetic models and data computed using our recently developed hybrid method AxiSEM/SEM. The global simulation is done by the AxiSEM code, which allows us to consider axisymmetric anomalies. The 3D regional computation is performed by the Spectral Element Method. We investigate the effect of external anomalies on the regional model obtained by FWI when one neglects them by considering only 1D global propagation. We also investigate the effect of the source time function and the focal mechanism on the results of the FWI approach.
Join the Center for Applied Scientific Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamblin, Todd; Bremer, Timo; Van Essen, Brian
The Center for Applied Scientific Computing serves as Livermore Lab’s window to the broader computer science, computational physics, applied mathematics, and data science research communities. In collaboration with academic, industrial, and other government laboratory partners, we conduct world-class scientific research and development on problems critical to national security. CASC applies the power of high-performance computing and the efficiency of modern computational methods to the realms of stockpile stewardship, cyber and energy security, and knowledge discovery for intelligence applications.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
NASA Astrophysics Data System (ADS)
Nakatsuji, Noriaki; Matsushima, Kyoji
2017-03-01
Full-parallax high-definition CGHs composed of more than a billion pixels have so far been created only by the polygon-based method because of its high performance. However, GPUs recently allow us to generate CGHs much faster from point clouds. In this paper, we measure the computation time of object fields for full-parallax high-definition CGHs, composed of 4 billion pixels and reconstructing the same scene, using the point-cloud method on a GPU and the polygon-based method on a CPU. In addition, we compare the optical and simulated reconstructions of the CGHs created by these techniques to verify image quality.
Graphics Flutter Analysis Methods, an interactive computing system at Lockheed-California Company
NASA Technical Reports Server (NTRS)
Radovcich, N. A.
1975-01-01
An interactive computer graphics system, Graphics Flutter Analysis Methods (GFAM), was developed to complement FAMAS, a matrix-oriented batch computing system, and other computer programs in performing complex numerical calculations using a fully integrated data management system. GFAM has many of the matrix operation capabilities found in FAMAS, but on a smaller scale, and is utilized when the analysis requires a high degree of interaction between the engineer and computer, and schedule constraints exclude the use of batch entry programs. Applications of GFAM to a variety of preliminary design, development design, and project modification programs suggest that interactive flutter analysis using matrix representations is a feasible and cost effective computing tool.
The Reduced Basis Method in Geosciences: Practical examples for numerical forward simulations
NASA Astrophysics Data System (ADS)
Degen, D.; Veroy, K.; Wellmann, F.
2017-12-01
Due to the highly heterogeneous character of the earth's subsurface, the complex coupling of thermal, hydrological, mechanical, and chemical processes, and the limited accessibility, we have to face high-dimensional problems associated with high uncertainties in the geosciences. Performing the necessary uncertainty quantifications with a reasonable number of parameters is often not possible due to the high-dimensional character of the problem. Therefore, we present the reduced basis (RB) method, a model order reduction (MOR) technique that constructs low-order approximations to, for instance, the finite element (FE) space. We use the RB method to address these computationally challenging simulations because it significantly reduces the degrees of freedom. The RB method is decomposed into an offline and an online stage, allowing the expensive pre-computations to be made beforehand so that real-time results can be obtained during field campaigns. Generally, the RB approach is most beneficial in the many-query and real-time contexts. We illustrate the advantages of the RB method for the field of geosciences through two examples of numerical forward simulations. The first example is a geothermal conduction problem demonstrating the implementation of the RB method for a steady-state case. The second example, a Darcy flow problem, shows the benefits for transient scenarios. In both cases, a quality evaluation of the approximations is given. Additionally, the runtimes for both the FE and the RB simulations are compared. We emphasize the advantages of this method for repetitive simulations by showing the speed-up of the RB solution in contrast to the FE solution. Finally, we demonstrate how the implementation can be used in high-performance computing (HPC) infrastructures and evaluate its performance for such infrastructures, in particular its scalability, yielding optimal usage on HPC infrastructures and normal workstations.
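A hedged sketch of the offline/online split central to the RB method: snapshots of the full-order solution are computed once, orthonormalized into a basis V, and each subsequent parameter query solves only a tiny projected system. The matrices below are toy stand-ins for assembled FE operators, not a geothermal or Darcy model.

```python
# Reduced basis offline/online split for a parameterized system
# (A + mu*I) u = f, with an affine dependence on the parameter mu.
import numpy as np

n, n_r = 1000, 10

# --- offline stage (expensive, done once) ---
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
f = np.random.rand(n)
snapshots = np.column_stack([np.linalg.solve(A + mu * np.eye(n), f)
                             for mu in np.linspace(0.1, 1.0, n_r)])
V, _ = np.linalg.qr(snapshots)                 # orthonormal reduced basis
A_r, I_r, f_r = V.T @ A @ V, V.T @ V, V.T @ f  # small projected operators

# --- online stage (cheap, per parameter query) ---
def rb_solve(mu):
    u_r = np.linalg.solve(A_r + mu * I_r, f_r)  # n_r x n_r system only
    return V @ u_r                               # lift back to full space

u_approx = rb_solve(0.37)
```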
NASA Technical Reports Server (NTRS)
Ghaffari, Farhad
1999-01-01
Unstructured grid Euler computations, performed at supersonic cruise speed, are presented for a High Speed Civil Transport (HSCT) configuration, designated as the Technology Concept Airplane (TCA) within the High Speed Research (HSR) Program. The numerical results are obtained for the complete TCA cruise configuration which includes the wing, fuselage, empennage, diverters, and flow-through nacelles at M∞ = 2.4 for a range of angles-of-attack and sideslip. Although all the present computations are performed for the complete TCA configuration, appropriate assumptions derived from the fundamental supersonic aerodynamic principles have been made to extract aerodynamic predictions to complement the experimental data obtained from a 1.675%-scaled truncated (aft fuselage/empennage components removed) TCA model. The validity of the computational results, derived from the latter assumptions, is thoroughly addressed and discussed in detail. The computed surface and off-surface flow characteristics are analyzed and the pressure coefficient contours on the wing lower surface are shown to correlate reasonably well with the available pressure sensitive paint results, particularly, for the complex flow structures around the nacelles. The predicted longitudinal and lateral/directional performance characteristics for the truncated TCA configuration are shown to correlate very well with the corresponding wind-tunnel data across the examined range of angles-of-attack and sideslip. The complementary computational results for the longitudinal and lateral/directional performance characteristics for the complete TCA configuration are also presented along with the aerodynamic effects due to empennage components. Results are also presented to assess the computational method performance, solution sensitivity to grid refinement, and solution convergence characteristics.
Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu
2017-05-01
This paper addresses joint optimization for segmentation and shape priors, including translation, to overcome inter-subject variability in the location of an organ. Because a simple extension of the previous exact optimization method is too computationally complex, we propose a fast approximation for optimization. The effectiveness of the proposed approximation is validated in the context of gallbladder segmentation from a non-contrast computed tomography (CT) volume. After spatial standardization and estimation of the posterior probability of the target organ, simultaneous optimization of the segmentation, shape, and location priors is performed using a branch-and-bound method. Fast approximation is achieved by combining sampling in the eigenshape space to reduce the number of shape priors and an efficient computational technique for evaluating the lower bound. Performance was evaluated using threefold cross-validation of 27 CT volumes. Optimization in terms of translation of the shape prior significantly improved segmentation performance. The proposed method achieved a result of 0.623 on the Jaccard index in gallbladder segmentation, which is comparable to that of state-of-the-art methods. The computational efficiency of the algorithm is confirmed to be good enough to allow execution on a personal computer. Joint optimization of the segmentation, shape, and location priors was proposed, and it proved to be effective in gallbladder segmentation with high computational efficiency.
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Principal Component Geostatistical Approach for large-dimensional inverse problems
Kitanidis, P K; Lee, J
2014-01-01
The quasi-linear geostatistical approach is designed for weakly nonlinear underdetermined inverse problems, such as Hydraulic Tomography and Electrical Resistivity Tomography. It provides best estimates as well as measures for uncertainty quantification. However, in its textbook implementation, the approach involves iterations to reach an optimum and requires the determination of the Jacobian matrix, i.e., the derivative of the observation function with respect to the unknowns. Although there are elegant methods for the determination of the Jacobian, the cost is high when the number of unknowns, m, and the number of observations, n, are high. It is also wasteful to compute the Jacobian for points away from the optimum. Irrespective of the issue of computing derivatives, the computational cost of implementing the method is generally of the order of m²n, though there are methods to reduce the computational cost. In this work, we present an implementation that utilizes a Gauss-Newton method that is matrix-free in terms of the Jacobian and improves the scalability of the geostatistical inverse problem. For each iteration, it is required to perform K runs of the forward problem, where K is not just much smaller than m but can be smaller than n. The computational and storage cost of the inverse procedure scales roughly linearly with m instead of m² as in the textbook approach. For problems of very large m, this implementation constitutes a dramatic reduction in computational cost compared to the textbook approach. Results illustrate the validity of the approach and provide insight into the conditions under which this method performs best. PMID:25558113
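The matrix-free ingredient can be illustrated with a finite-difference Jacobian-vector product: a Gauss-Newton iteration then needs only forward-model runs, never the full m x n Jacobian. A minimal sketch with a toy forward model standing in for an expensive tomography solve:

```python
# Approximate J(s) @ v using one extra forward run, without ever forming
# or storing the Jacobian matrix.
import numpy as np

def forward(s):
    # Stand-in for an expensive forward problem h(s).
    return np.array([np.sum(s ** 2), np.sum(np.sin(s))])

def jacvec(s, v, eps=1e-6):
    """Directional derivative of the forward model along v."""
    return (forward(s + eps * v) - forward(s)) / eps

s = np.random.rand(1000)   # m unknowns; no 2 x 1000 matrix is ever built
v = np.random.rand(1000)
print(jacvec(s, v))
```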
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
AstroCV: Astronomy computer vision library
NASA Astrophysics Data System (ADS)
González, Roberto E.; Muñoz, Roberto P.; Hernández, Cristian A.
2018-04-01
AstroCV processes and analyzes big astronomical datasets, and is intended to provide a community repository of high performance Python and C++ algorithms used for image processing and computer vision. The library offers methods for object recognition, segmentation and classification, with emphasis in the automatic detection and classification of galaxies.
An adaptive angle-doppler compensation method for airborne bistatic radar based on PAST
NASA Astrophysics Data System (ADS)
Hang, Xu; Jun, Zhao
2018-05-01
The adaptive angle-Doppler compensation method extracts the requisite information adaptively from the data itself, thus avoiding the performance degradation caused by inertial system errors. However, this method requires estimation and eigendecomposition of the sample covariance matrix, which has a high computational complexity and limits its real-time application. In this paper, an adaptive angle-Doppler compensation method based on projection approximation subspace tracking (PAST) is studied. The method uses cyclic iterative processing to quickly estimate the position of the spectral center of the maximum eigenvector of each range cell, avoiding the computational burden of matrix estimation and eigen-decomposition; the spectral centers of all range cells are then aligned by two-dimensional compensation. Simulation results show the proposed method can effectively reduce the non-homogeneity of airborne bistatic radar, and its performance is similar to that of eigen-decomposition algorithms, but the computational load is clearly reduced and the method is easy to implement.
On some Aitken-like acceleration of the Schwarz method
NASA Astrophysics Data System (ADS)
Garbey, M.; Tromeur-Dervout, D.
2002-12-01
In this paper we present a family of domain decomposition methods based on Aitken-like acceleration of the Schwarz method, seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with a five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant, an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or to multigrid for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that the algorithm has high tolerance to slow networks in the context of distributed parallel computing and is attractive, generally speaking, for use with computer architectures whose performance is limited by memory bandwidth rather than by the flop performance of the CPU. This is nowadays the case for most parallel computers using RISC processor architectures. We illustrate this highly desirable property of our algorithm with large-scale computing experiments.
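For reference, the Aitken Delta-squared extrapolation underlying the Aitken-Schwarz idea can be sketched in a few lines; here it is applied to a scalar fixed-point iteration rather than to Schwarz interface traces, purely as an illustration of the acceleration principle.

```python
# Aitken Delta-squared: extrapolate the limit of a linearly converging
# fixed-point iteration from three consecutive iterates.
import math

def aitken(x0, x1, x2):
    denom = x2 - 2.0 * x1 + x0
    return x2 - (x2 - x1) ** 2 / denom if denom != 0 else x2

# Example: the slowly converging iteration x_{k+1} = cos(x_k),
# whose fixed point is approximately 0.739085.
x0 = 1.0
x1 = math.cos(x0)
x2 = math.cos(x1)
print(aitken(x0, x1, x2), "vs plain iterate", x2)
```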
A Fluid Structure Algorithm with Lagrange Multipliers to Model Free Swimming
NASA Astrophysics Data System (ADS)
Sahin, Mehmet; Dilek, Ezgi
2017-11-01
A new monolithic approach is proposed to solve the fluid-structure interaction (FSI) problem with Lagrange multipliers in order to model free swimming/flying. In the present approach, the fluid domain is modeled by the incompressible Navier-Stokes equations and discretized using an Arbitrary Lagrangian-Eulerian (ALE) formulation based on the stable side-centered unstructured finite volume method. The solid domain is modeled by the constitutive laws for the nonlinear Saint Venant-Kirchhoff material, and the classical Galerkin finite element method is used to discretize the governing equations in a Lagrangian frame. In order to impose the body motion/deformation, the distance between the constraint pair nodes is imposed using the Lagrange multipliers, which is independent of the frame of reference. The resulting algebraic linear equations are solved in a fully coupled manner using a dual approach (null space method). The present numerical algorithm is initially validated for the classical FSI benchmark problems and then applied to the free swimming of three linked ellipses. The authors are grateful for the use of the computing resources provided by the National Center for High Performance Computing (UYBHM) under Grant Number 10752009 and the computing facilities at TUBITAK-ULAKBIM, High Performance and Grid Computing Center.
Wolinski, Christophe Czeslaw [Los Alamos, NM; Gokhale, Maya B [Los Alamos, NM; McCabe, Kevin Peter [Los Alamos, NM
2011-01-18
Fabric-based computing systems and methods are disclosed. A fabric-based computing system can include a polymorphous computing fabric that can be customized on a per application basis and a host processor in communication with said polymorphous computing fabric. The polymorphous computing fabric includes a cellular architecture that can be highly parameterized to enable a customized synthesis of fabric instances for a variety of enhanced application performances thereof. A global memory concept can also be included that provides the host processor random access to all variables and instructions associated with the polymorphous computing fabric.
Aeroelasticity of wing and wing-body configurations on parallel computers
NASA Technical Reports Server (NTRS)
Byun, Chansup
1995-01-01
The objective of this research is to develop computationally efficient methods for solving aeroelasticity problems on parallel computers. Both uncoupled and coupled methods are studied in this research. For the uncoupled approach, the conventional U-g method is used to determine the flutter boundary. The generalized aerodynamic forces required are obtained by the pulse transfer-function analysis method. For the coupled approach, the fluid-structure interaction is obtained by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures. This capability will significantly impact many aerospace projects of national importance such as the Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical in the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
High-Performance Computing User Facility | Computational Science | NREL
The High-Performance Computing (HPC) User Facility provides access to NREL's computing systems, including the Peregrine supercomputer and the Gyrfalcon Mass Storage System, for advancing energy technologies.
NASA Astrophysics Data System (ADS)
Evans, B. J. K.; Foster, C.; Minchin, S. A.; Pugh, T.; Lewis, A.; Wyborn, L. A.; Evans, B. J.; Uhlherr, A.
2014-12-01
The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high performance computing and data-intensive science across a wide spectrum of national environmental data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that support the modelling and data processing pipelines, 2) the analysis environments and methods that support data analysis, and 3) the progress in addressing harmonisation of the underlying data collections for future transdisciplinary research that enables accurate climate projections. NCI makes available 10+ PB of major data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. The data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack cloud system and several highly connected, large-scale, high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated reusable software and workflows for earth system and ecosystem modelling, weather research, and satellite and other observational data processing and analysis. To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of data available and not be constrained to work within artificial disciplinary boundaries. Future challenges will involve the further integration and analysis of this data across the social sciences to assess impacts across the societal domain, including timely analysis to more accurately predict and forecast future climate and environmental states.
Multiscale Modeling of UHTC: Thermal Conductivity
NASA Technical Reports Server (NTRS)
Lawson, John W.; Daw, Murray S.; Squire, Thomas H.; Bauschlicher, Charles W.
2012-01-01
We are developing a multiscale framework in computational modeling for the ultra high temperature ceramics (UHTC) ZrB2 and HfB2. These materials are characterized by high melting point, good strength, and reasonable oxidation resistance. They are candidate materials for a number of applications in extreme environments including sharp leading edges of hypersonic aircraft. In particular, we used a combination of ab initio methods, atomistic simulations and continuum computations to obtain insights into fundamental properties of these materials. Ab initio methods were used to compute basic structural, mechanical and thermal properties. From these results, a database was constructed to fit a Tersoff style interatomic potential suitable for atomistic simulations. These potentials were used to evaluate the lattice thermal conductivity of single crystals and the thermal resistance of simple grain boundaries. Finite element method (FEM) computations using atomistic results as inputs were performed with meshes constructed on SEM images thereby modeling the realistic microstructure. These continuum computations showed the reduction in thermal conductivity due to the grain boundary network.
Coupled Aerodynamic and Structural Sensitivity Analysis of a High-Speed Civil Transport
NASA Technical Reports Server (NTRS)
Mason, B. H.; Walsh, J. L.
2001-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite-element structural analysis and computational fluid dynamics aerodynamic analysis. In a previous study, a multidisciplinary analysis system for a high-speed civil transport was formulated to integrate a set of existing discipline analysis codes, some of them computationally intensive. This paper is an extension of the previous study, in which the sensitivity analysis for the coupled aerodynamic and structural analysis problem is formulated and implemented. Uncoupled stress sensitivities computed with a constant load vector in a commercial finite element analysis code are compared to coupled aeroelastic sensitivities computed by finite differences. The computational expense of these sensitivity calculation methods is discussed.
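A hedged sketch of the finite-difference approach to coupled sensitivities mentioned above: the coupled analysis is simply re-run at perturbed design values, so the fluid-structure consistency is baked into every derivative. The toy fixed-point model below is illustrative, not the HSCT analysis system.

```python
# Central-difference sensitivity of a (toy) coupled aeroelastic response
# with respect to a design variable.
def coupled_analysis(thickness):
    """Toy fixed-point coupling: the aerodynamic load depends on the
    structural deflection, which depends on the load."""
    deflection = 0.0
    for _ in range(50):                      # iterate to consistency
        load = 1.0 + 0.1 * deflection        # "aerodynamic" load
        deflection = load / thickness        # "structural" response
    return deflection

def fd_sensitivity(t, h=1e-6):
    return (coupled_analysis(t + h) - coupled_analysis(t - h)) / (2 * h)

print(fd_sensitivity(2.0))   # compare against an uncoupled estimate
```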
Polymer waveguides for electro-optical integration in data centers and high-performance computers.
Dangel, Roger; Hofrichter, Jens; Horst, Folkert; Jubin, Daniel; La Porta, Antonio; Meier, Norbert; Soganci, Ibrahim Murat; Weiss, Jonas; Offrein, Bert Jan
2015-02-23
To satisfy the intra- and inter-system bandwidth requirements of future data centers and high-performance computers, low-cost low-power high-throughput optical interconnects will become a key enabling technology. To tightly integrate optics with the computing hardware, particularly in the context of CMOS-compatible silicon photonics, optical printed circuit boards using polymer waveguides are considered as a formidable platform. IBM Research has already demonstrated the essential silicon photonics and interconnection building blocks. A remaining challenge is electro-optical packaging, i.e., the connection of the silicon photonics chips with the system. In this paper, we present a new single-mode polymer waveguide technology and a scalable method for building the optical interface between silicon photonics chips and single-mode polymer waveguides.
High Performance Computing Meets Energy Efficiency - Continuum Magazine | NREL
The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data systems. (Figure: wind turbine simulation by Patrick J. Moriarty and Matthew J. Churchfield, NREL.)
Computational methods in metabolic engineering for strain design.
Long, Matthew R; Ong, Wai Kit; Reed, Jennifer L
2015-08-01
Metabolic engineering uses genetic approaches to control microbial metabolism to produce desired compounds. Computational tools can identify new biological routes to chemicals and the changes needed in host metabolism to improve chemical production. Recent computational efforts have focused on exploring what compounds can be made biologically using native, heterologous, and/or enzymes with broad specificity. Additionally, computational methods have been developed to suggest different types of genetic modifications (e.g. gene deletion/addition or up/down regulation), as well as suggest strategies meeting different criteria (e.g. high yield, high productivity, or substrate co-utilization). Strategies to improve the runtime performances have also been developed, which allow for more complex metabolic engineering strategies to be identified. Future incorporation of kinetic considerations will further improve strain design algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG
NASA Astrophysics Data System (ADS)
Rohe, Daniel
2016-10-01
The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question of how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researchers wishing to likewise improve the efficiency of their codes.
Parallel, adaptive finite element methods for conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.
1994-01-01
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
Integrating computational methods to retrofit enzymes to synthetic pathways.
Brunk, Elizabeth; Neri, Marilisa; Tavernelli, Ivano; Hatzimanikatis, Vassily; Rothlisberger, Ursula
2012-02-01
Microbial production of desired compounds provides an efficient framework for the development of renewable energy resources. To be competitive with traditional chemistry, one requirement is to utilize the full capacity of the microorganism to produce target compounds with high yields and turnover rates. We use integrated computational methods to generate and quantify the performance of novel biosynthetic routes that contain highly optimized catalysts. Engineering a novel reaction pathway entails addressing feasibility on multiple levels, which involves handling the complexity of large-scale biochemical networks while respecting the critical chemical phenomena at the atomistic scale. To pursue this multi-layer challenge, our strategy merges knowledge-based metabolic engineering methods with computational chemistry methods. By bridging multiple disciplines, we provide an integral computational framework that could accelerate the discovery and implementation of novel biosynthetic production routes. Using this approach, we have identified and optimized a novel biosynthetic route for the production of 3HP from pyruvate. Copyright © 2011 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Araki, Samuel J.
2016-11-01
In the plumes of Hall thrusters and ion thrusters, high energy ions experience elastic collisions with slow neutral atoms. These collisions involve a process of momentum exchange, altering the initial velocity vectors of the collision pair. In addition to the momentum exchange process, ions and atoms can exchange electrons, resulting in slow charge-exchange ions and fast atoms. In these simulations, it is particularly important to accurately perform computations of ion-atom elastic collisions in determining the plume current profile and assessing the integration of spacecraft components. The existing models are currently capable of accurate calculation but are not fast enough, so the collision calculation can become a bottleneck of plume simulations. This study investigates methods to accelerate an ion-atom elastic collision calculation that includes both momentum- and charge-exchange processes. The scattering angles are pre-computed through a classical approach with an ab initio spin-orbit free potential and are stored in a two-dimensional array as functions of impact parameter and energy. When performing a collision calculation for an ion-atom pair, the scattering angle is computed by a table lookup and multiple linear interpolations, given the relative energy and a randomly determined impact parameter. In order to further accelerate the calculations, the number of collision calculations is reduced by properly defining two cut-off cross-sections for the elastic scattering. In the MCC method, the target atom needs to be sampled; however, it is confirmed that the initial target atom velocity does not play a significant role in typical electric propulsion plume simulations, so the sampling process is unnecessary. With these implementations, the computational run-time to perform a collision calculation is reduced significantly compared to previous methods, while retaining the accuracy of the high fidelity models.
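As a concrete illustration of the table-lookup scheme described above, the following sketch pre-computes scattering angles on a regular (impact parameter, energy) grid and retrieves them by bilinear interpolation. The grid ranges, the random table contents, and the cut-off impact parameter are illustrative stand-ins rather than values from the study.

import numpy as np

# Hypothetical pre-computed table: scattering angle chi(b, E) on a regular
# grid of impact parameter b and relative energy E (contents illustrative;
# in practice they come from the classical ab initio calculation).
b_grid = np.linspace(0.0, 20.0, 256)        # impact parameter
E_grid = np.logspace(-1, 4, 256)            # relative collision energy [eV]
chi_table = np.random.rand(256, 256)        # stand-in for computed angles

def scattering_angle(b, E):
    """Table lookup plus bilinear interpolation, as described above."""
    i = np.clip(np.searchsorted(b_grid, b) - 1, 0, len(b_grid) - 2)
    j = np.clip(np.searchsorted(E_grid, E) - 1, 0, len(E_grid) - 2)
    tb = (b - b_grid[i]) / (b_grid[i + 1] - b_grid[i])
    tE = (E - E_grid[j]) / (E_grid[j + 1] - E_grid[j])
    return ((1 - tb) * (1 - tE) * chi_table[i, j]
            + tb * (1 - tE) * chi_table[i + 1, j]
            + (1 - tb) * tE * chi_table[i, j + 1]
            + tb * tE * chi_table[i + 1, j + 1])

rng = np.random.default_rng(0)
b_max = 15.0                         # set by the cut-off cross-section
b = b_max * np.sqrt(rng.random())    # impact parameter uniform over the disk
print(scattering_angle(b, E=100.0))

The per-collision cost then reduces to a handful of array accesses and multiplications, which is what removes the bottleneck.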
Variable-Complexity Multidisciplinary Optimization on Parallel Computers
NASA Technical Reports Server (NTRS)
Grossman, Bernard; Mason, William H.; Watson, Layne T.; Haftka, Raphael T.
1998-01-01
This report covers work conducted under grant NAG1-1562 for the NASA High Performance Computing and Communications Program (HPCCP) from December 7, 1993, to December 31, 1997. The objective of the research was to develop new multidisciplinary design optimization (MDO) techniques which exploit parallel computing to reduce the computational burden of aircraft MDO. The design of the High-Speed Civil Transport (HSCT) aircraft was selected as a test case to demonstrate the utility of our MDO methods. The three major tasks of this research grant were: (1) development of parallel multipoint approximation methods for the aerodynamic design of the HSCT; (2) use of parallel multipoint approximation methods for structural optimization of the HSCT; and (3) mathematical and algorithmic development, including support in the integration of parallel computation for items (1) and (2). These tasks have been accomplished with the development of a response surface methodology that incorporates multi-fidelity models. For the aerodynamic design we were able to optimize with up to 20 design variables using hundreds of expensive Euler analyses together with thousands of inexpensive linear theory simulations. We have thereby demonstrated the application of CFD to a large aerodynamic design problem. For predicting structural weight we were able to combine hundreds of structural optimizations of refined finite element models with thousands of optimizations based on coarse models. Computations have been carried out on the Intel Paragon with up to 128 nodes. The parallel computation allowed us to perform combined aerodynamic-structural optimization using state-of-the-art models of a complex aircraft configuration.
NASA Astrophysics Data System (ADS)
Dinges, David F.; Venkataraman, Sundara; McGlinchey, Eleanor L.; Metaxas, Dimitris N.
2007-02-01
Astronauts are required to perform mission-critical tasks at a high level of functional capability throughout spaceflight. Stressors can compromise their ability to do so, making early objective detection of neurobehavioral problems in spaceflight a priority. Computer optical approaches offer a completely unobtrusive way to detect distress during critical operations in space flight. A methodology was developed and a study completed to determine whether optical computer recognition algorithms could be used to discriminate facial expressions during stress induced by performance demands. Stress recognition from a facial image sequence is a subject that has not received much attention, although it is an important problem for many applications beyond space flight (security, human-computer interaction, etc.). This paper proposes a comprehensive method to detect stress from facial image sequences by using a model-based tracker. The image sequences were captured as subjects underwent a battery of psychological tests under high- and low-stress conditions. A cue integration-based tracking system accurately captured the rigid and non-rigid parameters of different parts of the face (eyebrows, lips). The labeled sequences were used to train the recognition system, which consisted of generative (hidden Markov model) and discriminative (support vector machine) parts that yield results superior to using either approach individually. The current optical recognition algorithms performed at a 68% accuracy rate in an experimental study of 60 healthy adults undergoing periods of high-stress versus low-stress performance demands. The accuracy and practical feasibility of the technique are being improved further with automatic multi-resolution selection for the discretization of the mask, and automated face detection and mask initialization algorithms.
Development and numerical analysis of low specific speed mixed-flow pump
NASA Astrophysics Data System (ADS)
Li, H. F.; Huo, Y. W.; Pan, Z. B.; Zhou, W. C.; He, M. H.
2012-11-01
With the development of cities, the market for mixed-flow pumps with large flow rate and high head is promising. KSB Shanghai Pump Co., Ltd. decided to develop a low-specific-speed mixed-flow pump to meet market requirements. Based on centrifugal pump and axial-flow pump models, and aiming at the characteristics of large flow rate and high head, a new type of guide-vane mixed-flow pump was designed. The computational fluid dynamics method was adopted to analyze the internal flow of the new model and predict its performance. The time-averaged Navier-Stokes equations were closed by the SST k-ω turbulence model to handle the internal flow of guide vanes with large curvatures. The multi-reference frame (MRF) method was used to deal with the coupling of the rotating impeller and static guide vane, and the SIMPLEC method was adopted to achieve the coupled solution of velocity and pressure. The computational results show strong flow impact at the leading edges of the vanes and strong flow separation at the trailing edges of the guide vanes at different working conditions, both of which degrade the performance of the pump. Based on the computational results, optimizations were carried out to decrease the impact at the vane leading edges and the flow separation at the trailing edges of the guide vanes. The optimized model was simulated and its performance was predicted. The computational results show that the impact at the vane leading edges and the separation at the trailing edges of the guide vanes disappeared. The high-efficiency region of the optimized pump is wide, and it meets the original design targets. The newly designed mixed-flow pump is now being manufactured as a model, and its experimental performance will be available soon.
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the parallel programming model best suited to a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on the time-demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, infrastructure complexity, and platform limitations to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
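The kriging step that all three parallel approaches accelerate reduces, for each grid cell, to a small dense linear solve. A minimal ordinary-kriging sketch (the Gaussian variogram and its range are assumed, illustrative choices) shows why the workload parallelizes naturally over target cells:

import numpy as np

def ordinary_kriging(xy, z, targets,
                     variogram=lambda h: 1.0 - np.exp(-(h / 50.0) ** 2)):
    """Interpolate scattered samples (xy, z) onto target locations."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0                      # Lagrange-multiplier block
    out = np.empty(len(targets))
    for k, t in enumerate(targets):    # each cell is independent work
        b = np.ones(n + 1)
        b[:n] = variogram(np.linalg.norm(xy - t, axis=1))
        w = np.linalg.solve(A, b)      # kriging weights (+ multiplier)
        out[k] = w[:n] @ z
    return out

Because each loop iteration is independent, the targets can be distributed across MPI ranks, map tasks, or GPU threads, which is exactly the structure the three approaches exploit.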
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk
We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basis approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.
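A generic sketch of the multilevel variance reduction idea (not the authors' HDG/reduced-basis code): the bulk of the samples are drawn with the cheapest model, and each higher-fidelity level estimates only a low-variance correction term, so few expensive solves are needed.

import numpy as np

def multilevel_estimate(solvers, n_samples, sample_params,
                        rng=np.random.default_rng(1)):
    """Telescoping multilevel Monte Carlo over a model hierarchy.
    solvers[0] is the cheapest model (coarsest reduced basis), solvers[-1]
    the high-fidelity one; n_samples[l] should decrease with level l."""
    total = 0.0
    for l, (solver, n) in enumerate(zip(solvers, n_samples)):
        params = [sample_params(rng) for _ in range(n)]
        if l == 0:
            values = [solver(p) for p in params]
        else:
            # Correction w.r.t. the previous level; its variance is small
            # when the two models are strongly correlated.
            values = [solver(p) - solvers[l - 1](p) for p in params]
        total += np.mean(values)
    return total

Choosing n_samples[l] to balance the per-level variances against the per-level costs is precisely the optimization that the a posteriori error estimates mentioned above enable.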
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although computer science technologies can be used to reduce the costs of pharmaceutical research, the computation time of structure-based protein-ligand docking prediction remains unsatisfactory. Hence, in this paper, a novel docking prediction algorithm, named the fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate docking prediction. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both computation time and the quality of the end result. PMID:23762864
NASA Technical Reports Server (NTRS)
Demerdash, N. A.; Wang, R.; Secunde, R.
1992-01-01
A 3D finite element (FE) approach was developed and implemented for computation of global magnetic fields in a 14.3 kVA modified Lundell alternator. The essence of the new method is the combined use of magnetic vector and scalar potential formulations in 3D FEs. This approach makes it practical, using state of the art supercomputer resources, to globally analyze magnetic fields and operating performances of rotating machines which have truly 3D magnetic flux patterns. The 3D FE-computed fields and machine inductances as well as various machine performance simulations of the 14.3 kVA machine are presented in this paper and its two companion papers.
Sensitivity Analysis for Coupled Aero-structural Systems
NASA Technical Reports Server (NTRS)
Giunta, Anthony A.
1999-01-01
A novel method has been developed for calculating gradients of aerodynamic force and moment coefficients for an aeroelastic aircraft model. This method uses the Global Sensitivity Equations (GSE) to account for the aero-structural coupling, and a reduced-order modal analysis approach to condense the coupling bandwidth between the aerodynamic and structural models. Parallel computing is applied to reduce the computational expense of the numerous high fidelity aerodynamic analyses needed for the coupled aero-structural system. Good agreement is obtained between aerodynamic force and moment gradients computed with the GSE/modal analysis approach and the same quantities computed using brute-force, computationally expensive, finite difference approximations. A comparison between the computational expense of the GSE/modal analysis method and a pure finite difference approach is presented. These results show that the GSE/modal analysis approach is the more computationally efficient technique if sensitivity analysis is to be performed for two or more aircraft design parameters.
NASA Astrophysics Data System (ADS)
Fei, Cheng-Wei; Bai, Guang-Chen
2014-12-01
To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assemblies such as the blade-tip radial running clearance (BTRRC) of gas turbines, a distribution collaborative probabilistic design method based on support vector machine regression (called DCSRM) is proposed by integrating the distribution collaborative response surface method and the support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of aeroengine high-pressure turbine (HPT) BTRRC is accomplished to verify the proposed DCSRM. The analysis results reveal that the optimal static blade-tip clearance of the HPT is obtained for designing the BTRRC, improving the performance and reliability of the aeroengine. The comparison of methods shows that DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assemblies and enriches mechanical reliability theory and methods.
2015-07-01
The three-dimensional, compressible, Reynolds-averaged Navier-Stokes (RANS) equations are solved using a finite volume method with point-implicit time integration. High performance computing time was provided by the US Department of Defense (DOD) High Performance Computing Modernization Program at the US Army Research Laboratory.
A frozen Gaussian approximation-based multi-level particle swarm optimization for seismic inversion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jinglai, E-mail: jinglaili@sjtu.edu.cn; Lin, Guang, E-mail: lin491@purdue.edu; Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, WA 99352
2015-09-01
In this paper, we propose a frozen Gaussian approximation (FGA)-based multi-level particle swarm optimization (MLPSO) method for seismic inversion of high-frequency wave data. The method addresses two challenges: First, the optimization problem is highly non-convex, which makes it hard for gradient-based methods to reach global minima. This is tackled by MLPSO, which can escape from undesired local minima. Second, the high-frequency character of seismic waves requires a large number of grid points in direct computational methods, and thus renders an extremely high computational demand on the simulation of each sample in MLPSO. We overcome this difficulty in three steps: First, we use FGA to compute high-frequency wave propagation based on asymptotic analysis on the phase plane; then we design a constrained full waveform inversion problem to prevent the optimization search getting into regions of velocity where FGA is not accurate; last, we solve the constrained optimization problem by MLPSO employing FGA solvers of different fidelity. The performance of the proposed method is demonstrated by a two-dimensional full-waveform inversion example of the smoothed Marmousi model.
Silicon photonics for high-performance interconnection networks
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr
2011-12-01
We assert in the course of this work that silicon photonics has the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems, and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. This work showcases that chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, enable unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of this work, we demonstrate such feasibility of waveguides, modulators, switches, and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. Furthermore, we leverage the unique properties of available silicon photonic materials to create novel silicon photonic devices, subsystems, network topologies, and architectures to enable unprecedented performance of these photonic interconnection networks and computing systems. We show that the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. Furthermore, we explore the immense potential of all-optical functionalities implemented using parametric processing in the silicon platform, demonstrating unique methods that have the ability to revolutionize computation and communication. Silicon photonics enables new sets of opportunities that we can leverage for performance gains, as well as new sets of challenges that we must solve. Leveraging its inherent compatibility with standard fabrication techniques of the semiconductor industry, combined with its capability of dense integration with advanced microelectronics, silicon photonics also offers a clear path toward commercialization through low-cost mass-volume production. Combining empirical validations of feasibility, demonstrations of massive performance gains in large-scale systems, and the potential for commercial penetration of silicon photonics, the impact of this work will become evident in the many decades that follow.
An interactive program for pharmacokinetic modeling.
Lu, D R; Mao, F
1993-05-01
A computer program, PharmK, was developed for pharmacokinetic modeling of experimental data. The program was written in the C computer language based on the high-level user-interface Macintosh operating system. The intention was to provide a user-friendly tool for users of Macintosh computers. An interactive algorithm based on the exponential stripping method is used for the initial parameter estimation. Nonlinear pharmacokinetic model fitting is based on the maximum likelihood estimation method and is performed by the Levenberg-Marquardt method based on the χ² criterion. Several methods are available to aid the evaluation of the fitting results. Pharmacokinetic data sets have been examined with the PharmK program, and the results are comparable with those obtained with other programs that are currently available for IBM PC-compatible and other types of computers.
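The core fitting loop of such a program can be reproduced in a few lines. The sketch below fits a bi-exponential model by the Levenberg-Marquardt method; the concentration data and the crude initial estimates (standing in for the exponential-stripping step) are invented for illustration, not taken from PharmK.

import numpy as np
from scipy.optimize import least_squares

t = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0])   # time [h]
c = np.array([9.2, 7.8, 6.1, 4.0, 2.2, 0.9, 0.45, 0.1])     # concentration

def model(p, t):
    A, alpha, B, beta = p
    return A * np.exp(-alpha * t) + B * np.exp(-beta * t)

p0 = np.array([5.0, 1.5, 5.0, 0.2])     # rough "stripping"-style guesses
fit = least_squares(lambda p: model(p, t) - c, p0, method="lm")
print(fit.x)                            # fitted A, alpha, B, beta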
Brun, E; Grandl, S; Sztrókay-Gaul, A; Barbone, G; Mittone, A; Gasilov, S; Bravin, A; Coan, P
2014-11-01
Phase contrast computed tomography has emerged as an imaging method able to outperform present-day clinical mammography in breast tumor visualization while maintaining an equivalent average dose. To this day, no segmentation technique takes into account the specificity of the phase contrast signal. In this study, the authors propose a new mathematical framework for human-guided breast tumor segmentation. This method has been applied to high-resolution images of excised human organs, each of several gigabytes. The authors present a segmentation procedure based on the viscous watershed transform and demonstrate the efficacy of this method on analyzer-based phase contrast images. The segmentation of tumors inside two full human breasts is then shown as an example of this procedure's possible applications. A correct and precise identification of the tumor boundaries was obtained and confirmed by manual contouring performed independently by four experienced radiologists. The authors demonstrate that applying the viscous watershed transform allows them to perform the segmentation of tumors in high-resolution x-ray analyzer-based phase contrast breast computed tomography images. Combining the additional information provided by the segmentation procedure with the already high definition of morphological details and tissue boundaries offered by phase contrast imaging techniques will represent a valuable multistep procedure to be used in future medical diagnostic applications.
Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations
NASA Astrophysics Data System (ADS)
Detrixhe, Miles; Gibou, Frédéric
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
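For reference, a serial 2D fast sweeping iteration for the eikonal equation |∇T| = 1/c, the simplest static Hamilton-Jacobi equation, can be sketched as below; the parallel variants discussed above reorganize exactly these four directional sweeps across threads and devices.

import numpy as np

def fast_sweep_eikonal(speed, h, n_sweeps=4):
    """Godunov upwind fast sweeping; T = 0 at a central point source."""
    ny, nx = speed.shape
    T = np.full((ny, nx), np.inf)
    T[ny // 2, nx // 2] = 0.0
    for _ in range(n_sweeps):
        for sy, sx in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
            for i in range(ny)[::sy]:
                for j in range(nx)[::sx]:
                    a = min(T[max(i - 1, 0), j], T[min(i + 1, ny - 1), j])
                    b = min(T[i, max(j - 1, 0)], T[i, min(j + 1, nx - 1)])
                    if not np.isfinite(min(a, b)):
                        continue
                    f = h / speed[i, j]
                    if abs(a - b) >= f:      # one-sided update
                        t_new = min(a, b) + f
                    else:                    # two-sided quadratic update
                        t_new = 0.5 * (a + b + np.sqrt(2 * f * f - (a - b) ** 2))
                    T[i, j] = min(T[i, j], t_new)
    return T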
A Biosequence-based Approach to Software Characterization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oehmen, Christopher S.; Peterson, Elena S.; Phillips, Aaron R.
For many applications, it is desirable to have some process for recognizing when software binaries are closely related without relying on them to be identical or have identical segments. Some examples include monitoring utilization of high performance computing centers or service clouds, detecting freeware in licensed code, and enforcing application whitelists. But doing so in a dynamic environment is a nontrivial task because most approaches to software similarity require extensive and time-consuming analysis of a binary, or they fail to recognize executables that are similar but nonidentical. Presented herein is a novel biosequence-based method for quantifying similarity of executable binaries. Using this method, it is shown in an example application on large-scale multi-author codes that 1) the biosequence-based method has a statistical performance in recognizing and distinguishing between a collection of real-world high performance computing applications better than 90% of ideal; and 2) an example of using family tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
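A schematic of the master/worker distribution pattern described above, written with mpi4py; the EGS5 binaries, the handshaking protocol, and the actual physics are not reproduced — only the splitting of independent histories across nodes and the final aggregation, with a toy per-history tally.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_TOTAL = 1_000_000                      # histories, split across workers
n_local = N_TOTAL // size
rng = np.random.default_rng(seed=rank)   # independent stream per node

# Stand-in for transporting one batch of histories and scoring "dose":
local_tally = np.exp(-rng.random(n_local)).sum()

total = comm.reduce(local_tally, op=MPI.SUM, root=0)
if rank == 0:
    print("aggregated tally per history:", total / N_TOTAL)

Because the histories are statistically independent, run time scales inversely with the number of nodes, consistent with the reported 47x speedup on 100 nodes (2.58 h is about 155 min, and 155/3.3 is roughly 47).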
GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.
Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart
2011-06-01
The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework, producing substantially boosted computing efficiency (a two-order-of-magnitude speedup) at a PC-level cost. Specific details of implementing the FDTD method on a GPU architecture have been presented and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity, and 20% reduction in loop coupling compared with conventional array structures of the same size) for small animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied for both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
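The per-cell independence that makes FDTD map so well onto GPU threads is visible even in a minimal 1D Yee-leapfrog kernel (a generic sketch, unrelated to the 9.4 T coil model):

import numpy as np

nx, nt = 2048, 1000
ez = np.zeros(nx)            # electric field
hy = np.zeros(nx - 1)        # magnetic field, staggered half a cell
c = 0.5                      # Courant number (stable for c <= 1 in 1D)

for n in range(nt):
    hy += c * (ez[1:] - ez[:-1])                     # H half-step
    ez[1:-1] += c * (hy[1:] - hy[:-1])               # E half-step
    ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)   # soft Gaussian source

Within each half-step every cell update touches only its immediate neighbors, so on a GPU each thread owns one (or a few) cells and the two update lines become two kernels.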
High performance computing and communications program
NASA Technical Reports Server (NTRS)
Holcomb, Lee
1992-01-01
A review of the High Performance Computing and Communications (HPCC) program is provided in vugraph format. The goals and objectives of this federal program are as follows: extend U.S. leadership in high performance computing and computer communications; disseminate the technologies to speed innovation and to serve national goals; and spur gains in industrial competitiveness by making high performance computing integral to design and production.
NASA Technical Reports Server (NTRS)
Carlson, Harry W.; Darden, Christine M.; Mann, Michael J.
1990-01-01
Extensive correlations of computer code results with experimental data are employed to illustrate the use of a linearized theory, attached flow method for the estimation and optimization of the longitudinal aerodynamic performance of wing-canard and wing-horizontal tail configurations which may employ simple hinged flap systems. Use of an attached flow method is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. The results indicate that linearized theory, attached flow, computer code methods (modified to include estimated attainable leading-edge thrust and an approximate representation of vortex forces) provide a rational basis for the estimation and optimization of aerodynamic performance at subsonic speeds below the drag rise Mach number. Generally, good prediction of aerodynamic performance, as measured by the suction parameter, can be expected for near optimum combinations of canard or horizontal tail incidence and leading- and trailing-edge flap deflections at a given lift coefficient (conditions which tend to produce a predominantly attached flow).
Experiences with Probabilistic Analysis Applied to Controlled Systems
NASA Technical Reports Server (NTRS)
Kenny, Sean P.; Giesy, Daniel P.
2004-01-01
This paper presents a semi-analytic method for computing frequency dependent means, variances, and failure probabilities for arbitrarily large-order closed-loop dynamical systems possessing a single uncertain parameter or with multiple highly correlated uncertain parameters. The approach will be shown to not suffer from the same computational challenges associated with computing failure probabilities using conventional FORM/SORM techniques. The approach is demonstrated by computing the probabilistic frequency domain performance of an optimal feed-forward disturbance rejection scheme.
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling
NASA Astrophysics Data System (ADS)
Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.
2017-06-01
Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high-resolution numerical schemes. CUDA technology is used for the programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of the solution on GPUs with respect to the solution on a central processing unit (CPU) is compared. Performance measurements show that the numerical schemes developed achieve a 20-50x speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, and experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Re-Tooling the Agency's Engineering Predictive Practices for Durability and Damage Tolerance
NASA Technical Reports Server (NTRS)
Piascik, Robert S.; Knight, Norman F., Jr.
2017-01-01
Over the past decade, the Agency has placed less emphasis on testing and has increasingly relied on computational methods to assess durability and damage tolerance (D&DT) behavior when evaluating design margins for fracture-critical components. With increased emphasis on computational D&DT methods as the standard practice, it is paramount that capabilities of these methods are understood, the methods are used within their technical limits, and validation by well-designed tests confirms understanding. The D&DT performance of a component is highly dependent on parameters in the neighborhood of the damage. This report discusses D&DT method vulnerabilities.
Systematic exploration of unsupervised methods for mapping behavior
NASA Astrophysics Data System (ADS)
Todd, Jeremy G.; Kain, Jamey S.; de Bivort, Benjamin L.
2017-02-01
To fully understand the mechanisms giving rise to behavior, we need to be able to precisely measure it. When coupled with large behavioral data sets, unsupervised clustering methods offer the potential of unbiased mapping of behavioral spaces. However, unsupervised techniques to map behavioral spaces are in their infancy, and there have been few systematic considerations of all the methodological options. We compared the performance of seven distinct mapping methods in clustering a wavelet-transformed data set consisting of the x- and y-positions of the six legs of individual flies. Legs were automatically tracked by small pieces of fluorescent dye, while the fly was tethered and walking on an air-suspended ball. We find that there is considerable variation in the performance of these mapping methods, and that better performance is attained when clustering is done in higher dimensional spaces (which are otherwise less preferable because they are hard to visualize). High dimensionality means that some algorithms, including the non-parametric watershed cluster assignment algorithm, cannot be used. We developed an alternative watershed algorithm which can be used in high-dimensional spaces when a probability density estimate can be computed directly. With these tools in hand, we examined the behavioral space of fly leg postural dynamics and locomotion. We find a striking division of behavior into modes involving the fore legs and modes involving the hind legs, with few direct transitions between them. By computing behavioral clusters using the data from all flies simultaneously, we show that this division appears to be common to all flies. We also identify individual-to-individual differences in behavior and behavioral transitions. Lastly, we suggest a computational pipeline that can achieve satisfactory levels of performance without the taxing computational demands of a systematic combinatorial approach.
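The alternative watershed-style assignment can be sketched as steepest ascent on a directly computed density estimate: each sample climbs to its densest higher-density neighbor until it reaches a local mode, and samples sharing a mode share a cluster. This is a generic sketch of the idea with a caller-supplied density array (e.g., from a kernel density estimate), not the authors' exact algorithm.

import numpy as np

def density_ascent_clusters(X, density, k=10):
    """Watershed-like clustering usable in high-dimensional spaces."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]    # k nearest neighbours
    parent = np.arange(n)
    for i in range(n):
        higher = nbrs[i][density[nbrs[i]] > density[i]]
        if len(higher):                          # climb toward the mode
            parent[i] = higher[np.argmax(density[higher])]
    labels = parent.copy()
    for i in range(n):                           # follow chains to the modes
        while labels[i] != labels[labels[i]]:
            labels[i] = labels[labels[i]]
    return labels                                # one label per local mode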
Uncertainties in obtaining high reliability from stress-strength models
NASA Technical Reports Server (NTRS)
Neal, Donald M.; Matthews, William T.; Vangel, Mark G.
1992-01-01
There has been a recent interest in determining high statistical reliability in risk assessment of aircraft components. The potential consequences of incorrectly assuming a particular statistical distribution for stress or strength data used in obtaining high reliability values are identified. The reliability is defined as the probability of the strength being greater than the stress over the range of stress values. This method is often referred to as the stress-strength model. A sensitivity analysis was performed involving a comparison of reliability results in order to evaluate the effects of assuming specific statistical distributions. Both known population distributions, and those that differed slightly from the known, were considered. Results showed substantial differences in reliability estimates even for almost nondetectable differences in the assumed distributions. These differences represent a potential problem in using the stress-strength model for high reliability computations, since in practice it is impossible to ever know the exact (population) distribution. An alternative reliability computation procedure is examined involving determination of a lower bound on the reliability values using extreme value distributions. This procedure reduces the possibility of obtaining nonconservative reliability estimates. Results indicated the method can provide conservative bounds when computing high reliability.
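A small numerical experiment makes this sensitivity concrete. For two normal distributions the stress-strength reliability has the closed form R = Φ((μ_strength − μ_stress) / √(σ_strength² + σ_stress²)); replacing the strength distribution with a mean-matched Weibull (all parameter values below are illustrative) changes the tail behavior and therefore the computed high-reliability value.

import numpy as np
from scipy import stats

mu_s, sd_s = 100.0, 10.0      # stress (illustrative)
mu_S, sd_S = 160.0, 12.0      # strength (illustrative)

# Normal/normal closed form: R = P(strength > stress)
R_normal = stats.norm.cdf((mu_S - mu_s) / np.hypot(sd_S, sd_s))

# Strength assumed Weibull with the same mean (shape picked arbitrarily):
shape = 16.0
scale = mu_S / stats.weibull_min.mean(shape)
x = np.linspace(50.0, 250.0, 20001)
dx = x[1] - x[0]
R_weibull = np.sum(stats.norm.pdf(x, mu_s, sd_s)
                   * stats.weibull_min.sf(x, shape, scale=scale)) * dx

print(R_normal, R_weibull)    # the two assumptions disagree in the tail

The disagreement lives almost entirely in the extreme tail, which is exactly the region that determines high-reliability estimates.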
Desktop supercomputer: what can it do?
NASA Astrophysics Data System (ADS)
Bogdanov, A.; Degtyarev, A.; Korkhov, V.
2017-12-01
The paper addresses the issues of solving complex problems that require the use of supercomputers or multiprocessor clusters, which are available to most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources has been a key user requirement. In this paper we discuss approaches to building a virtual private supercomputer available at the user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on lightweight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Gui, Zhipeng; Yu, Manzhu; Yang, Chaowei; Jiang, Yunfeng; Chen, Songqing; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Hassan, Mohammed Anowarul; Jin, Baoxuan
2016-01-01
Dust storms have serious disastrous impacts on the environment, human health, and assets. The development and application of dust storm models have contributed significantly to better understanding and predicting the distribution, intensity and structure of dust storms. However, dust storm simulation is a data- and computing-intensive process. To improve the computing performance, high performance computing has been widely adopted by dividing the entire study area into multiple subdomains and allocating each subdomain to different computing nodes in a parallel fashion. Inappropriate allocation may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, allocation is a key factor that may impact the efficiency of the parallel process. An allocation algorithm is expected to consider the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire simulation. This research introduces three algorithms to optimize the allocation by considering the spatial and communicational constraints: 1) an Integer Linear Programming (ILP) based algorithm from a combinatorial optimization perspective; 2) a K-Means and Kernighan-Lin combined heuristic algorithm (K&K) integrating geometric and coordinate-free methods by merging local and global partitioning; 3) an automatic seeded region growing based geometric and local partitioning algorithm (ASRG). The performance and effectiveness of the three algorithms are compared based on different factors. Further, we adopt the K&K algorithm as the demonstration algorithm for the experiment of dust model simulation with the non-hydrostatic mesoscale model (NMM-dust) and compare the performance with the MPI default sequential allocation. The results demonstrate that the K&K method significantly improves the simulation performance with better subdomain allocation. This method can also be adopted for other relevant atmospheric and numerical modeling. PMID:27044039
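To make the geometric side of such allocation concrete, the sketch below groups cells by K-Means on their coordinates, so that each node owns a compact (low-communication) subdomain, and reports the resulting load imbalance; a local refinement in the spirit of Kernighan-Lin would then move boundary cells to rebalance the load. All inputs and names are generic, not the NMM-dust code.

import numpy as np

def kmeans_allocate(cells, load, n_nodes, iters=50,
                    rng=np.random.default_rng(2)):
    """cells: (n, 2) float coordinates; load: (n,) estimated cost per cell."""
    centers = cells[rng.choice(len(cells), n_nodes, replace=False)]
    for _ in range(iters):
        owner = np.argmin(((cells[:, None] - centers[None]) ** 2).sum(-1),
                          axis=1)
        for k in range(n_nodes):
            if np.any(owner == k):
                centers[k] = cells[owner == k].mean(axis=0)
    per_node = np.array([load[owner == k].sum() for k in range(n_nodes)])
    return owner, per_node.max() / per_node.mean()   # allocation, imbalance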
AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams.
Chen, Qiuwen; Luley, Ryan; Wu, Qing; Bishop, Morgan; Linderman, Richard W; Qiu, Qinru
2018-05-01
The evolution of high performance computing technologies has enabled the large-scale implementation of neuromorphic models and pushed the research in computational intelligence into a new era. Among machine learning applications, unsupervised detection of anomalous streams is especially challenging due to the requirements of detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of multicore systems while maintaining high sensitivity and specificity to the anomalies is an urgent research topic. In this paper, we propose anomaly recognition and detection (AnRAD), a bioinspired detection framework that performs probabilistic inferences. We analyze the feature dependency and develop a self-structuring method that learns an efficient confabulation network using unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base using streaming data. Compared with several existing anomaly detection approaches, our method provides competitive detection quality. Furthermore, we exploit the massive parallel structure of the AnRAD framework. Our implementations of the detection algorithm on the graphics processing unit and the Xeon Phi coprocessor both obtain substantial speedups over the sequential implementation on a general-purpose microprocessor. The framework provides real-time service to concurrent data streams within diversified knowledge contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle behavior detection, the framework is able to monitor up to 16000 vehicles (data streams) and their interactions in real time with a single commodity coprocessor, and uses less than 0.2 ms for one testing subject. Finally, the detection network is ported to our spiking neural network simulator to show the potential of adapting to the emerging neuromorphic architectures.
NASA Astrophysics Data System (ADS)
Bonnet, M.; Collino, F.; Demaldent, E.; Imperiale, A.; Pesudo, L.
2018-05-01
Ultrasonic Non-Destructive Testing (US NDT) has become widely used in various fields of application to probe media. By exploiting surface measurements of the echoes of ultrasonic incident waves after their propagation through the medium, it allows the detection of potential defects (cracks and inhomogeneities) and the characterization of the medium. The understanding and interpretation of these experimental measurements are performed with the help of numerical modeling and simulations. However, classical numerical methods can become computationally very expensive for the simulation of wave propagation in the high frequency regime. On the other hand, asymptotic techniques are better suited to model high frequency scattering over large distances but nevertheless do not allow accurate simulation of complex diffraction phenomena. Thus, neither numerical nor asymptotic methods can individually solve high frequency diffraction problems in large media, such as those involved in US NDT controls, both quickly and accurately, but their advantages and limitations are complementary. Here we propose a hybrid strategy coupling the surface integral equation method and the ray tracing method to simulate high frequency diffraction under speed and accuracy constraints. This strategy is general and applicable to simulate diffraction phenomena in acoustic or elastodynamic media. We provide its implementation and investigate its performances for the 2D acoustic diffraction problem. The main features of this hybrid method are described and results of 2D computational experiments discussed.
Multi-GPU hybrid programming accelerated three-dimensional phase-field model in binary alloy
NASA Astrophysics Data System (ADS)
Zhu, Changsheng; Liu, Jieqiong; Zhu, Mingfang; Feng, Li
2018-03-01
In the process of dendritic growth simulation, computational efficiency and problem scale have an extremely important influence on the simulation efficiency of a three-dimensional phase-field model. Thus, seeking a high performance calculation method to improve the computational efficiency and to expand the problem scale is of great significance to the research of the microstructure of materials. A high performance calculation method based on an MPI+CUDA hybrid programming model is introduced. Multiple GPUs are used to implement quantitative numerical simulations of a three-dimensional phase-field model in binary alloy under the condition of multi-physical process coupling. The acceleration effect of different GPU node counts on different calculation scales is explored. On the foundation of the multi-GPU calculation model that has been introduced, two optimization schemes, non-blocking communication optimization and overlap of MPI and GPU computing optimization, are proposed. The results of the two optimization schemes and the basic multi-GPU model are compared. The calculation results show that the multi-GPU calculation model improves the computational efficiency of the three-dimensional phase-field simulation markedly, running 13 times faster than a single GPU, and the problem scale has been expanded to 8193. The feasibility of the two optimization schemes is shown, and the overlap of MPI and GPU computing optimization has better performance, running 1.7 times faster than the basic multi-GPU model when 21 GPUs are used.
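The overlap optimization follows a standard pattern: post non-blocking halo exchanges first, update the interior while the messages are in flight, then finish the boundary once the halos have arrived. A minimal mpi4py sketch of the pattern on a 1D slab (generic update kernels, not the phase-field code itself):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

phi = np.random.rand(1024)                  # local slab of the field
halo_l, halo_r = np.empty(1), np.empty(1)

reqs = [comm.Isend(phi[:1], dest=left),  comm.Isend(phi[-1:], dest=right),
        comm.Irecv(halo_l, source=left), comm.Irecv(halo_r, source=right)]

interior = 0.5 * (phi[2:] + phi[:-2])       # stand-in for the GPU kernel,
                                            # overlapped with communication
MPI.Request.Waitall(reqs)                   # halos have now arrived
phi[1:-1] = interior
phi[0] = 0.5 * (halo_l[0] + phi[1])         # boundary updates use the halos
phi[-1] = 0.5 * (phi[-2] + halo_r[0])

In the multi-GPU setting the interior kernel runs on the device while MPI progresses the halo messages, which is where the reported gain over the basic model comes from.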
Improving Data Transfer Throughput with Direct Search Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaprakash, Prasanna; Morozov, Vitali; Kettimuthu, Rajkumar
2016-01-01
Improving data transfer throughput over high-speed long-distance networks has become increasingly difficult. Numerous factors such as nondeterministic congestion, dynamics of the transfer protocol, and multiuser and multitask source and destination endpoints, as well as interactions among these factors, contribute to this difficulty. A promising approach to improving throughput consists in using parallel streams at the application layer. We formulate and solve the problem of choosing the number of such streams from a mathematical optimization perspective. We propose the use of direct search methods, a class of easy-to-implement and lightweight mathematical optimization algorithms, to improve the performance of data transfers by dynamically adapting the number of parallel streams in a manner that does not require domain expertise, instrumentation, analytical models, or historical data. We apply our method to transfers performed with the GridFTP protocol, and illustrate the effectiveness of the proposed algorithm when used within Globus, a state-of-the-art data transfer tool, on production WAN links and servers. We show that, compared to user default settings, our direct search methods can achieve up to 10x performance improvement under certain conditions. We also show that our method can overcome performance degradation due to external compute and network load on source endpoints, a common scenario at high performance computing facilities.
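A minimal compass-style direct search over the stream count might look as follows; measure_throughput is a hypothetical callback that times a short probe transfer with a given number of streams, and the starting point, bounds, and step schedule are illustrative.

def tune_streams(measure_throughput, s0=4, s_min=1, s_max=64):
    """Derivative-free search for a good parallel-stream count."""
    s, best = s0, measure_throughput(s0)
    step = 8
    while step >= 1:
        for cand in (s - step, s + step):       # probe both directions
            if s_min <= cand <= s_max:
                t = measure_throughput(cand)
                if t > best:
                    s, best = cand, t           # accept the improving move
                    break
        else:
            step //= 2                          # no improvement: refine
    return s

No model, instrumentation, or history is needed: the search adapts purely from the throughput probes, which is the property the approach above exploits.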
Information processing using a single dynamical node as complex system
Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.
2011-01-01
Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
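A heavily simplified numerical sketch of the architecture: a single tanh nonlinearity is time-multiplexed over n_virtual "virtual nodes" along the delay line, each receiving a randomly masked copy of the input plus its own delayed state (all parameter values illustrative; the hardware additionally couples adjacent virtual nodes through the node's finite response time). Computation is completed by training a linear readout, e.g. ridge regression, on the collected states.

import numpy as np

def delay_reservoir(u, n_virtual=50, eta=0.5, gamma=0.05,
                    rng=np.random.default_rng(3)):
    """States of a single-node reservoir with delayed feedback."""
    mask = rng.uniform(-1, 1, n_virtual)     # fixed random input mask
    x = np.zeros(n_virtual)                  # states along the delay line
    states = np.empty((len(u), n_virtual))
    for t, ut in enumerate(u):
        for i in range(n_virtual):
            # nonlinearity applied to delayed state + masked input
            x[i] = np.tanh(eta * x[i] + gamma * mask[i] * ut)
        states[t] = x
    return states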
NASA Astrophysics Data System (ADS)
O'Malley, D.; Le, E. B.; Vesselinov, V. V.
2015-12-01
We present a fast, scalable, and highly implementable stochastic inverse method for characterization of aquifer heterogeneity. The method utilizes recent advances in randomized matrix algebra and exploits the structure of the Quasi-Linear Geostatistical Approach (QLGA), without requiring a structured grid like Fast-Fourier Transform (FFT) methods. The QLGA framework is a more stable version of Gauss-Newton iterates for a large number of unknown model parameters, but provides unbiased estimates. The methods are matrix-free and do not require derivatives or adjoints, and are thus ideal for complex models and black-box implementation. We also incorporate randomized least-squares solvers and data-reduction methods, which speed up computation and simulate missing data points. The new inverse methodology is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Inversion results based on a series of synthetic problems with steady-state and transient calibration data are presented.
NASA Astrophysics Data System (ADS)
Sarangapani, R.; Jose, M. T.; Srinivasan, T. K.; Venkatraman, B.
2017-07-01
Methods for the determination of the efficiency of an aged high purity germanium (HPGe) detector for gaseous sources have been presented in the paper. X-ray radiography of the detector has been performed to get detector dimensions for computational purposes. The dead layer thickness of the HPGe detector has been ascertained from experiments and Monte Carlo computations. Experimental work with standard point and liquid sources in several cylindrical geometries has been undertaken for obtaining the energy-dependent efficiency. Monte Carlo simulations have been performed for computing efficiencies for point, liquid and gaseous sources. Self-absorption correction factors have been obtained using mathematical equations for volume sources and MCNP simulations. Self-absorption correction and point source methods have been used to estimate the efficiency for gaseous sources. The efficiencies determined from the present work have been used to estimate the activity of a cover gas sample of a fast reactor.
NASA Astrophysics Data System (ADS)
Hussein, I.; Wilkins, M.; Roscoe, C.; Faber, W.; Chakravorty, S.; Schumacher, P.
2016-09-01
Finite Set Statistics (FISST) is a rigorous Bayesian multi-hypothesis management tool for the joint detection, classification and tracking of multi-sensor, multi-object systems. Implicit within the approach are solutions to the data association and target label-tracking problems. The full FISST filtering equations, however, are intractable. While FISST-based methods such as the PHD and CPHD filters are tractable, they require heavy moment approximations to the full FISST equations that result in a significant loss of information contained in the collected data. In this paper, we review Smart Sampling Markov Chain Monte Carlo (SSMCMC) that enables FISST to be tractable while avoiding moment approximations. We study the effect of tuning key SSMCMC parameters on tracking quality and computation time. The study is performed on a representative space object catalog with varying numbers of RSOs. The solution is implemented in the Scala computing language at the Maui High Performance Computing Center (MHPCC) facility.
Hirayama, Denise; Saron, Clodoaldo
2015-06-01
Polymeric materials constitute a considerable fraction of waste computer equipment, and the polymers acrylonitrile-butadiene-styrene and high-impact polystyrene are the main thermoplastic polymeric components found in waste computer equipment. Identification, separation and characterisation of additives present in acrylonitrile-butadiene-styrene and high-impact polystyrene are fundamental procedures for the mechanical recycling of these polymers. The aim of this study was to evaluate methods for identification of acrylonitrile-butadiene-styrene and high-impact polystyrene from waste computer equipment in Brazil, as well as their potential for mechanical recycling. The imprecise use of symbols for identification of the polymers and the presence of additives containing toxic elements in certain computer devices are some of the difficulties found in recycling acrylonitrile-butadiene-styrene and high-impact polystyrene from waste computer equipment. However, the considerable mechanical performance of the recycled acrylonitrile-butadiene-styrene and high-impact polystyrene when compared with the virgin materials confirms the potential for mechanical recycling of these polymers. © The Author(s) 2015.
Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.
Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo
2017-01-01
The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics.
A real-time spike sorting method based on the embedded GPU.
Zelan Yang; Kedi Xu; Xiang Tian; Shaomin Zhang; Xiaoxiang Zheng
2017-07-01
Microelectrode arrays with hundreds of channels have been widely used to acquire neuron population signals in neuroscience studies. Online spike sorting is becoming one of the most important challenges for high-throughput neural signal acquisition systems. The graphics processing unit (GPU), with its high parallel computing capability, might provide an alternative solution to the increasing real-time computational demands of spike sorting. This study reports a method for real-time spike sorting through the compute unified device architecture (CUDA), implemented on an embedded GPU (NVIDIA JETSON Tegra K1, TK1). The sorting approach is based on principal component analysis (PCA) and K-means. By analyzing the parallelism of each process, the method was further optimized in the thread memory model of the GPU. Our results showed that the GPU-based classifier on the TK1 is 37.92 times faster than the MATLAB-based classifier on a PC while achieving the same accuracy. The high-performance computing features of the embedded GPU demonstrated in our studies suggest that the embedded GPU provides a promising platform for real-time neural signal processing.
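A minimal CPU sketch of the PCA-plus-K-means sorting stage, using scikit-learn instead of the paper's CUDA implementation; array shapes and cluster counts are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def sort_spikes(waveforms, n_components=3, n_units=3, seed=0):
    """Cluster detected spike waveforms into putative units.

    waveforms: array of shape (n_spikes, n_samples), one aligned
    snippet per detected spike. Returns an integer unit label per spike.
    """
    features = PCA(n_components=n_components).fit_transform(waveforms)
    labels = KMeans(n_clusters=n_units, n_init=10,
                    random_state=seed).fit_predict(features)
    return labels
```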
Point and path performance of light aircraft: A review and analysis
NASA Technical Reports Server (NTRS)
Smetana, F. O.; Summey, D. C.; Johnson, W. D.
1973-01-01
The literature on methods for predicting the performance of light aircraft is reviewed. The methods discussed in the review extend from the classical instantaneous maximum or minimum technique to techniques for generating mathematically optimum flight paths. Classical point performance techniques are shown to be adequate in many cases but their accuracies are compromised by the need to use simple lift, drag, and thrust relations in order to get closed form solutions. Also the investigation of the effect of changes in weight, altitude, configuration, etc. involves many essentially repetitive calculations. Accordingly, computer programs are provided which can fit arbitrary drag polars and power curves with very high precision and which can then use the resulting fits to compute the performance under the assumption that the aircraft is not accelerating.
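As an illustration of the classical point-performance approach, the sketch below fits the simple parabolic drag polar CD = CD0 + k*CL^2 and evaluates power required in level flight; the NASA programs described above fit arbitrary polars and power curves, so this is only the textbook special case.

```python
import numpy as np

def fit_parabolic_polar(cl, cd):
    """Least-squares fit of CD = CD0 + k*CL**2 to measured polar points."""
    k, cd0 = np.polyfit(cl**2, cd, 1)   # slope k, intercept CD0
    return cd0, k

def power_required(weight, rho, S, cd0, k, v):
    """Power required for steady, unaccelerated level flight at speed v
    (a classical point-performance calculation; SI units assumed)."""
    cl = weight / (0.5 * rho * v**2 * S)
    cd = cd0 + k * cl**2
    drag = 0.5 * rho * v**2 * S * cd
    return drag * v
```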
NASA Technical Reports Server (NTRS)
Oliger, Joseph
1997-01-01
Topics considered include: high-performance computing; cognitive and perceptual prostheses (computational aids designed to leverage human abilities); autonomous systems. Also included: development of a 3D unstructured grid code based on a finite volume formulation and applied to the Navier-Stokes equations; Cartesian grid methods for complex geometry; multigrid methods for solving elliptic problems on unstructured grids; algebraic non-overlapping domain decomposition methods for compressible fluid flow problems on unstructured meshes; numerical methods for the compressible Navier-Stokes equations with application to aerodynamic flows; research in aerodynamic shape optimization; S-HARP: a parallel dynamic spectral partitioner; numerical schemes for the Hamilton-Jacobi and level set equations on triangulated domains; application of high-order shock capturing schemes to direct simulation of turbulence; multicast technology; network testbeds; supercomputer consolidation project.
NASA Astrophysics Data System (ADS)
Gorthi, Sai Siva; Rajshekhar, Gannavarpu; Rastogi, Pramod
2010-06-01
Recently, a high-order instantaneous moments (HIM)-operator-based method was proposed for accurate phase estimation in digital holographic interferometry. The method relies on piece-wise polynomial approximation of phase and subsequent evaluation of the polynomial coefficients from the HIM operator using single-tone frequency estimation. The work presents a comparative analysis of the performance of different single-tone frequency estimation techniques, like Fourier transform followed by optimization, estimation of signal parameters by rotational invariance technique (ESPRIT), multiple signal classification (MUSIC), and iterative frequency estimation by interpolation on Fourier coefficients (IFEIF) in HIM-operator-based methods for phase estimation. Simulation and experimental results demonstrate the potential of the IFEIF technique with respect to computational efficiency and estimation accuracy.
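For context, a minimal example of single-tone frequency estimation by interpolation on Fourier coefficients: this uses the non-iterative Jacobsen estimator rather than the IFEIF method itself, which iteratively refines a similar fractional-bin correction.

```python
import numpy as np

def tone_freq_estimate(x, fs):
    """Single-tone frequency estimate: coarse FFT peak plus Jacobsen's
    complex interpolation on the neighbouring Fourier coefficients
    (assumes a rectangular window)."""
    N = len(x)
    X = np.fft.rfft(x)
    k = int(np.argmax(np.abs(X[1:-1]))) + 1            # coarse interior peak bin
    delta = np.real((X[k - 1] - X[k + 1]) /
                    (2 * X[k] - X[k - 1] - X[k + 1]))  # fractional-bin offset
    return (k + delta) * fs / N
```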
High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions
2016-08-30
Final Report: a dedicated high-performance computer cluster for theoretical studies of roaming in chemical reactions. Sponsoring/monitoring agency: U.S. Army Research Office, Research Triangle Park, NC.
Symplectic molecular dynamics simulations on specially designed parallel computers.
Borstnik, Urban; Janezic, Dusanka
2005-01-01
We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that fewer, cheaper time steps are needed, enabling fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
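The SISM itself splits the Hamiltonian and treats the fast vibrations analytically; as a baseline for what symplectic integration means here, a plain velocity-Verlet step in Python, with the force law left abstract:

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Standard velocity-Verlet, the symplectic baseline that splitting
    schemes such as SISM generalize. force(x) returns the force array."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt**2
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v
```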
Capability of GPGPU for Faster Thermal Analysis Used in Data Assimilation
NASA Astrophysics Data System (ADS)
Takaki, Ryoji; Akita, Takeshi; Shima, Eiji
A thermal mathematical model plays an important role in operations on orbit as well as in spacecraft thermal design. The thermal mathematical model has some uncertain thermal characteristic parameters, such as thermal contact resistances between components and effective emittances of multilayer insulation (MLI) blankets, which degrade the efficiency and accuracy of the model. A particle filter, which is one of the sequential data assimilation methods, has been applied to construct spacecraft thermal mathematical models. This method conducts a large number of ensemble computations, which require large computational power. Recently, General Purpose computing on Graphics Processing Units (GPGPU) has attracted attention in high performance computing. Therefore, GPGPU is applied here to increase the computational speed of the thermal analysis used in the particle filter. This paper shows the speed-up achieved by using GPGPU as well as the method of applying it.
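A generic bootstrap particle filter for an uncertain model parameter, sketched in Python; the per-particle model run (`simulate`, a hypothetical stand-in for the thermal solver) is the embarrassingly parallel step that GPGPU accelerates.

```python
import numpy as np

def bootstrap_particle_filter(y_obs, simulate, prior_sampler,
                              n_particles=1000, obs_sigma=1.0, seed=0):
    """Generic bootstrap particle filter over an uncertain parameter.

    simulate(theta) -> predicted observation; this per-particle model run
    is the embarrassingly parallel step that maps naturally onto a GPU.
    prior_sampler(rng, n) -> initial particle values (e.g. an uncertain
    thermal contact resistance; names here are illustrative).
    """
    rng = np.random.default_rng(seed)
    theta = prior_sampler(rng, n_particles)
    for y in y_obs:
        pred = np.array([simulate(t) for t in theta])
        w = np.exp(-0.5 * ((y - pred) / obs_sigma) ** 2) + 1e-300
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)   # resample
        theta = theta[idx] + 0.01 * rng.standard_normal(n_particles)  # jitter
    return theta
```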
Neighbour lists for smoothed particle hydrodynamics on GPUs
NASA Astrophysics Data System (ADS)
Winkler, Daniel; Rezavand, Massoud; Rauch, Wolfgang
2018-04-01
The efficient iteration of neighbouring particles is a performance critical aspect of any high performance smoothed particle hydrodynamics (SPH) solver. SPH solvers that implement a constant smoothing length generally divide the simulation domain into a uniform grid to reduce the computational complexity of the neighbour search. Based on this method, particle neighbours are either stored per grid cell or for each individual particle, denoted as Verlet list. While the latter approach has significantly higher memory requirements, it has the potential for a significant computational speedup. A theoretical comparison is performed to estimate the potential improvements of the method based on unknown hardware dependent factors. Subsequently, the computational performance of both approaches is empirically evaluated on graphics processing units. It is shown that the speedup differs significantly for different hardware, dimensionality and floating point precision. The Verlet list algorithm is implemented as an alternative to the cell linked list approach in the open-source SPH solver DualSPHysics and provided as a standalone software package.
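A minimal 2-D illustration of building per-particle Verlet lists from a uniform grid of cell size h; a production SPH solver such as DualSPHysics does this on the GPU with flat arrays rather than Python dictionaries.

```python
import numpy as np
from collections import defaultdict

def build_verlet_lists(pos, h):
    """Build per-particle neighbour (Verlet) lists in 2-D via a uniform
    grid of cell size h (the smoothing length)."""
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // h).astype(int))].append(i)
    neighbours = [[] for _ in pos]
    for i, p in enumerate(pos):
        cx, cy = (p // h).astype(int)
        # Only the 3x3 block of cells around a particle can hold neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    if j != i and np.sum((pos[j] - p) ** 2) < h * h:
                        neighbours[i].append(j)
    return neighbours
```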
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at Ben-Gurion University. This elective course for 3rd-year undergraduates and MSc students is taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach to teaching Computational Physics emphasizes ``Correctness'' and then ``Accuracy,'' and we add also ``Performance.'' Along with topics in Mathematical Methods and case studies in Physics, the course devotes a significant amount of time to ``Mini-Courses'' on topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization, and Grid and Cloud Computing. The course is not intended to teach new physics or new mathematics; it is focused on an integrated approach to solving problems, starting from the physics problem, the corresponding mathematical solution, the numerical scheme, writing an efficient computer code, and finally analysis and visualization.
Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images
NASA Astrophysics Data System (ADS)
Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang
2016-10-01
Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.
High resolution flow field prediction for tail rotor aeroacoustics
NASA Technical Reports Server (NTRS)
Quackenbush, Todd R.; Bliss, Donald B.
1989-01-01
The prediction of tail rotor noise due to the impingement of the main rotor wake poses a significant challenge to current analysis methods in rotorcraft aeroacoustics. This paper describes the development of a new treatment of the tail rotor aerodynamic environment that permits highly accurate resolution of the incident flow field with modest computational effort relative to alternative models. The new approach incorporates an advanced full-span free wake model of the main rotor in a scheme which reconstructs high-resolution flow solutions from preliminary, computationally inexpensive simulations with coarse resolution. The heart of the approach is a novel method for using local velocity correction terms to capture the steep velocity gradients characteristic of the vortex-dominated incident flow. Sample calculations have been undertaken to examine the principal types of interactions between the tail rotor and the main rotor wake and to examine the performance of the new method. The results of these sample problems confirm the success of this approach in capturing the high-resolution flows necessary for analysis of rotor-wake/rotor interactions with dramatically reduced computational cost. Computations of radiated sound are also carried out that explore the role of various portions of the main rotor wake in generating tail rotor noise.
Parallel aeroelastic computations for wing and wing-body configurations
NASA Technical Reports Server (NTRS)
Byun, Chansup
1994-01-01
The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
NASA Astrophysics Data System (ADS)
Elantkowska, Magdalena; Ruczkowski, Jarosław; Sikorski, Andrzej; Dembczyński, Jerzy
2017-11-01
A parametric analysis of the hyperfine structure (hfs) for the even parity configurations of atomic terbium (Tb I) is presented in this work. We introduce the complete set of 4fN-core states in our high-performance computing (HPC) calculations. For calculations of the huge hyperfine structure matrix, requiring approximately 5000 hours when run on a single CPU, we propose methods utilizing a personal computer cluster or, alternatively, a cluster of Microsoft Azure virtual machines (VMs). These methods give a factor-of-12 performance boost, enabling the calculations to complete in an acceptable time.
Mesoscopic modelling and simulation of soft matter.
Schiller, Ulf D; Krüger, Timm; Henrich, Oliver
2017-12-20
The deformability of soft condensed matter often requires modelling of hydrodynamical aspects to gain quantitative understanding. This, however, requires specialised methods that can resolve the multiscale nature of soft matter systems. We review a number of the most popular simulation methods that have emerged, such as Langevin dynamics, dissipative particle dynamics, multi-particle collision dynamics, sometimes also referred to as stochastic rotation dynamics, and the lattice-Boltzmann method. We conclude this review with a short glance at current compute architectures for high-performance computing and community codes for soft matter simulation.
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
A Computationally Efficient Method for Polyphonic Pitch Estimation
NASA Astrophysics Data System (ADS)
Zhou, Ruohua; Reiss, Joshua D.; Mattavelli, Marco; Zoia, Giorgio
2009-12-01
This paper presents a computationally efficient method for polyphonic pitch estimation. The method employs the Fast Resonator Time-Frequency Image (RTFI) as the basic time-frequency analysis tool. The approach is composed of two main stages. First, a preliminary pitch estimation is obtained by means of a simple peak-picking procedure in the pitch energy spectrum. This spectrum is calculated from the original RTFI energy spectrum according to harmonic grouping principles. Then the incorrect estimations are removed according to spectral irregularity and knowledge of the harmonic structures of notes played on commonly used musical instruments. The new approach is compared with a variety of other frame-based polyphonic pitch estimation methods, and results demonstrate the high performance and computational efficiency of the approach.
NASA Astrophysics Data System (ADS)
Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.
2010-03-01
Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized algorithm of imaging processing that is used in CAD development, MS CAD integration and evaluation in clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of using a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
Bai, Ou; Lin, Peter; Vorbach, Sherry; Li, Jiang; Furlani, Steve; Hallett, Mark
2007-12-01
To explore effective combinations of computational methods for the prediction of movement intention preceding the production of self-paced right and left hand movements from single trial scalp electroencephalogram (EEG). Twelve naïve subjects performed self-paced movements consisting of three key strokes with either hand. EEG was recorded from 128 channels. The exploration was performed offline on single trial EEG data. We proposed that a successful computational procedure for classification would consist of spatial filtering, temporal filtering, feature selection, and pattern classification. A systematic investigation was performed with combinations of spatial filtering using principal component analysis (PCA), independent component analysis (ICA), common spatial patterns analysis (CSP), and surface Laplacian derivation (SLD); temporal filtering using power spectral density estimation (PSD) and discrete wavelet transform (DWT); pattern classification using linear Mahalanobis distance classifier (LMD), quadratic Mahalanobis distance classifier (QMD), Bayesian classifier (BSC), multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), and support vector machine (SVM). A robust multivariate feature selection strategy using a genetic algorithm was employed. The combinations of spatial filtering using ICA and SLD, temporal filtering using PSD and DWT, and classification methods using LMD, QMD, BSC and SVM provided higher performance than those of other combinations. Utilizing one of the better combinations of ICA, PSD and SVM, the discrimination accuracy was as high as 75%. Further feature analysis showed that beta band EEG activity of the channels over right sensorimotor cortex was most appropriate for discrimination of right and left hand movement intention. Effective combinations of computational methods provide possible classification of human movement intention from single trial EEG. Such a method could be the basis for a potential brain-computer interface based on human natural movement, which might reduce the requirement of long-term training. Effective combinations of computational methods can classify human movement intention from single trial EEG with reasonable accuracy.
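One of the better-performing combinations reported (ICA spatial filtering, PSD features, SVM classification) can be sketched with SciPy and scikit-learn as below; epoch shapes, the beta-band limits, and the ICA dimensionality are illustrative choices rather than the study's exact settings.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import FastICA
from sklearn.svm import SVC

def ica_psd_svm(trials, labels, fs=256, n_sources=16):
    """trials: (n_trials, n_channels, n_samples) single-trial EEG epochs.

    Spatial filtering by ICA, temporal features by Welch PSD, SVM classifier.
    """
    n_trials, n_ch, n_s = trials.shape
    ica = FastICA(n_components=n_sources, random_state=0)
    # Fit ICA on concatenated trials (channels as mixtures).
    ica.fit(trials.transpose(0, 2, 1).reshape(-1, n_ch))
    feats = []
    for tr in trials:
        src = ica.transform(tr.T)                   # (n_samples, n_sources)
        f, pxx = welch(src, fs=fs, axis=0, nperseg=n_s // 2)
        band = (f >= 13) & (f <= 30)                # beta band, per the study
        feats.append(np.log(pxx[band].mean(axis=0)))
    clf = SVC(kernel="rbf").fit(np.array(feats), labels)
    return ica, clf
```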
ERIC Educational Resources Information Center
Federal Coordinating Council for Science, Engineering and Technology, Washington, DC.
This report presents a review of the High Performance Computing and Communications (HPCC) Program, which has as its goal the acceleration of the commercial availability and utilization of the next generation of high performance computers and networks in order to: (1) extend U.S. technological leadership in high performance computing and computer…
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Numerical Simulation of Transit-Time Ultrasonic Flowmeters by a Direct Approach.
Luca, Adrian; Marchiano, Regis; Chassaing, Jean-Camille
2016-06-01
This paper deals with the development of a computational code for the numerical simulation of wave propagation through domains with a complex geometry consisting of both solids and moving fluids. The emphasis is on the numerical simulation of ultrasonic flowmeters (UFMs) by modeling the wave propagation in solids with the equations of linear elasticity (ELE) and in fluids with the linearized Euler equations (LEEs). This approach requires high performance computing because of the high number of degrees of freedom and the long propagation distances. Therefore, the numerical method should be chosen with care. In order to minimize the numerical dissipation which may occur in this kind of configuration, the numerical method employed here is the nodal discontinuous Galerkin (DG) method. This method is also well suited for parallel computing. To speed up the code, almost all the computational stages have been implemented to run on graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model from NVIDIA. This approach has been validated and then used for the two-dimensional simulation of gas UFMs. The large contrast of acoustic impedance characteristic of gas UFMs makes their simulation a real challenge.
Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python
Laura, Jason R.; Rey, Sergio J.
2017-01-01
Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. As for GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfer removed, but various optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging in that the imaging rate exceeds the raw data generation rate. PMID:27070606
New technology in turbine aerodynamics.
NASA Technical Reports Server (NTRS)
Glassman, A. J.; Moffitt, T. P.
1972-01-01
Cursory review of some recent work that has been done in turbine aerodynamic research. Topics discussed include the aerodynamic effect of turbine coolant, high work-factor (ratio of stage work to square of blade speed) turbines, and computer methods for turbine design and performance prediction. Experimental cooled-turbine aerodynamics programs using two-dimensional cascades, full annular cascades, and cold rotating turbine stage tests are discussed with some typical results presented. Analytically predicted results for cooled blade performance are compared to experimental results. The problems and some of the current programs associated with the use of very high work factors for fan-drive turbines of high-bypass-ratio engines are discussed. Computer programs have been developed for turbine design-point performance, off-design performance, supersonic blade profile design, and the calculation of channel velocities for subsonic and transonic flowfields. The use of these programs for the design and analysis of axial and radial turbines is discussed.
Hoang, Tuan; Tran, Dat; Huang, Xu
2013-01-01
Common Spatial Pattern (CSP) is a state-of-the-art method for feature extraction in Brain-Computer Interface (BCI) systems. However, it is designed for 2-class BCI classification problems. Current extensions of this method to multiple classes, based on subspace union and covariance matrix similarity, do not provide high performance. This paper presents a new approach to solving multi-class BCI classification problems by forming a subspace resembled from the original subspaces; the proposed method for this approach is called Approximation-based Common Principal Component (ACPC). We perform experiments on Dataset 2a of BCI Competition IV to evaluate the proposed method. This dataset was designed for motor imagery classification with 4 classes. Preliminary experiments show that the proposed ACPC feature extraction method, when combined with Support Vector Machines, outperforms CSP-based feature extraction methods on the experimental dataset.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
High Fidelity Simulations of Unsteady Flow through Turbopumps and Flowliners
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William; Housman, Jeff
2006-01-01
High fidelity computations were carried out to analyze the orbiter LH2 feedline flowliner. Computations were performed on the Columbia platform, which is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each. Various computational models were used to characterize the unsteady flow features in the turbopump, including the orbiter Low-Pressure-Fuel-Turbopump (LPFTP) inducer, the orbiter manifold and a test article used to represent the manifold. Unsteady flow originating from the orbiter LPFTP inducer is one of the major contributors to the high frequency cyclic loading that results in high cycle fatigue damage to the gimbal flowliners just upstream of the LPFTP. The flow fields for the orbiter manifold and representative test article are computed and analyzed for similarities and differences. The incompressible Navier-Stokes flow solver INS3D, based on the artificial compressibility method, was used to compute the flow of liquid hydrogen in each test article.
NASA Astrophysics Data System (ADS)
McCray, Wilmon Wil L., Jr.
The research was prompted by the need to assess the process improvement, quality management, and analytical techniques taught to students in U.S. undergraduate and graduate degree programs in systems engineering and the computing sciences (e.g., software engineering, computer science, and information technology) that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods, and how they relate to process performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI) models as process improvement frameworks to improve business process performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, and quantitative management techniques, together with process performance modeling, to identify and eliminate sources of variation, continually improve process performance, reduce cost, and predict future outcomes. The study identifies and discusses in detail the gaps between the process improvement and quantitative analysis techniques taught in U.S. university systems engineering and computing science degree programs, the gaps that exist in the literature, and a comparison analysis identifying the gaps between the SEI's "healthy ingredients" of a process performance model and the courses taught in U.S. university degree programs. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models. The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, and quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.
Head-target tracking control of well drilling
NASA Astrophysics Data System (ADS)
Agzamov, Z. V.
2018-05-01
The method of directional drilling trajectory control for oil and gas wells using predictive models is considered in the paper. The developed method does not rely on optimization, so there is no need for high-performance computing. Nevertheless, it allows following the well-plan with high precision while taking into account process input saturation. The controller output is calculated both from the current target reference point of the well-plan and from a prediction of the well trajectory using the analytical model. The method allows following a well-plan not only in angular but also in Cartesian coordinates. Simulation of the control system has confirmed high precision and operational performance under a wide range of random disturbances.
Rapid sampling of stochastic displacements in Brownian dynamics simulations
NASA Astrophysics Data System (ADS)
Fiore, Andrew M.; Balboa Usabiaga, Florencio; Donev, Aleksandar; Swan, James W.
2017-03-01
We present a new method for sampling stochastic displacements in Brownian Dynamics (BD) simulations of colloidal scale particles. The method relies on a new formulation for Ewald summation of the Rotne-Prager-Yamakawa (RPY) tensor, which guarantees that the real-space and wave-space contributions to the tensor are independently symmetric and positive-definite for all possible particle configurations. Brownian displacements are drawn from a superposition of two independent samples: a wave-space (far-field or long-ranged) contribution, computed using techniques from fluctuating hydrodynamics and non-uniform fast Fourier transforms; and a real-space (near-field or short-ranged) correction, computed using a Krylov subspace method. The combined computational complexity of drawing these two independent samples scales linearly with the number of particles. The proposed method circumvents the super-linear scaling exhibited by all known iterative sampling methods applied directly to the RPY tensor that results from the power law growth of the condition number of tensor with the number of particles. For geometrically dense microstructures (fractal dimension equal three), the performance is independent of volume fraction, while for tenuous microstructures (fractal dimension less than three), such as gels and polymer solutions, the performance improves with decreasing volume fraction. This is in stark contrast with other related linear-scaling methods such as the force coupling method and the fluctuating immersed boundary method, for which performance degrades with decreasing volume fraction. Calculations for hard sphere dispersions and colloidal gels are illustrated and used to explore the role of microstructure on performance of the algorithm. In practice, the logarithmic part of the predicted scaling is not observed and the algorithm scales linearly for up to 4 ×106 particles, obtaining speed ups of over an order of magnitude over existing iterative methods, and making the cost of computing Brownian displacements comparable to the cost of computing deterministic displacements in BD simulations. A high-performance implementation employing non-uniform fast Fourier transforms implemented on graphics processing units and integrated with the software package HOOMD-blue is used for benchmarking.
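The correctness of the two-sample superposition rests on a standard fact about independent Gaussian vectors, stated compactly below; the notation $M_w$, $M_r$ for the wave-space and real-space parts of the mobility is ours, not the paper's.

```latex
\xi_w \sim \mathcal{N}(0, M_w), \quad
\xi_r \sim \mathcal{N}(0, M_r), \quad
\xi_w \perp \xi_r
\;\Longrightarrow\;
\operatorname{Cov}(\xi_w + \xi_r) = M_w + M_r = M_{\mathrm{RPY}},
```

so drawing the wave-space and real-space contributions independently yields a displacement with the full RPY covariance, provided each part is positive-definite, which is what the new Ewald formulation guarantees.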
Biological modelling of a computational spiking neural network with neuronal avalanches.
Li, Xiumin; Chen, Qing; Xue, Fangzheng
2017-06-28
In recent years, an increasing number of studies have demonstrated that networks in the brain can self-organize into a critical state where dynamics exhibit a mixture of ordered and disordered patterns. This critical branching phenomenon is termed neuronal avalanches. It has been hypothesized that the homeostatic level balanced between stability and plasticity of this critical state may be the optimal state for performing diverse neural computational tasks. However, the critical region for high performance is narrow and sensitive for spiking neural networks (SNNs). In this paper, we investigated the role of the critical state in neural computations based on liquid-state machines, a biologically plausible computational neural network model for real-time computing. The computational performance of an SNN when operating at the critical state and, in particular, with spike-timing-dependent plasticity for updating synaptic weights is investigated. The network is found to show the best computational performance when it is subjected to critical dynamic states. Moreover, the active-neuron-dominant structure refined from synaptic learning can remarkably enhance the robustness of the critical state and further improve computational accuracy. These results may have important implications in the modelling of spiking neural networks with optimal computational performance. This article is part of the themed issue 'Mathematical methods in medicine: neuroscience, cardiology and pathology'. © 2017 The Author(s).
Several advances in the analytic element method have been made to enhance its performance and facilitate three-dimensional ground-water flow modeling in a regional aquifer setting. First, a new public domain modular code (ModAEM) has been developed for modeling ground-water flow ...
NASA Astrophysics Data System (ADS)
Lin, Y.; O'Malley, D.; Vesselinov, V. V.
2015-12-01
Inverse modeling seeks model parameters given a set of observed state variables. However, because the observed data sets are often large and the model parameters numerous, conventional inverse modeling methods can be computationally expensive for many practical problems. We have developed a new, computationally efficient Levenberg-Marquardt method for large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations, which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by these computational techniques. We apply the new method to invert for a random transmissivity field. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) at each computational node in the model domain. The inversion is also aided by the use of regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Compared with a Levenberg-Marquardt method using standard linear inversion techniques, our method yields a speed-up ratio of 15 in a multi-core computational environment and 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
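The recycling idea can be illustrated with a small dense analogue: factor the Jacobian once and reuse that factorization for every damping parameter, so only a cheap filtered solve is repeated per damping value. This is a sketch of the principle in Python, not the authors' MADS/Julia implementation.

```python
import numpy as np

def lm_step_all_dampings(J, r, dampings):
    """Solve (J^T J + lam*I) dx = -J^T r for many damping values lam,
    reusing a single SVD of J -- a dense analogue of recycling the
    Krylov subspace across Levenberg-Marquardt damping parameters."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)   # factor once
    Utr = U.T @ r
    steps = {}
    for lam in dampings:
        # Filter factors s/(s^2 + lam) solve the damped normal equations.
        steps[lam] = -Vt.T @ (s / (s**2 + lam) * Utr)
    return steps
```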
NASA Astrophysics Data System (ADS)
Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry
2003-08-01
Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on large numbers of processors is non-trivial on the latest generation of parallel computers, whose nodes are shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Efficient scatter model for simulation of ultrasound images from computed tomography data
NASA Astrophysics Data System (ADS)
D'Amato, J. P.; Lo Vercio, L.; Rubi, P.; Fernandez Vera, E.; Barbuzza, R.; Del Fresno, M.; Larrabide, I.
2015-12-01
Background and motivation: Real-time ultrasound simulation refers to the process of computationally creating fully synthetic ultrasound images instantly. Due to the high value of specialized low cost training for healthcare professionals, there is a growing interest in the use of this technology and the development of high fidelity systems that simulate the acquisition of echographic images. The objective is to create an efficient and reproducible simulator that can run either on notebooks or desktops using low cost devices. Materials and methods: We present an interactive ultrasound simulator based on CT data. This simulator is based on ray-casting and provides real-time interaction capabilities. The simulation of scattering that is coherent with the transducer position in real time is also introduced. Such noise is produced using a simplified model of multiplicative noise and convolution with point spread functions (PSF) tailored for this purpose. Results: The computational efficiency of scattering-map generation was revised, with improved performance. This allowed a more efficient simulation of coherent scattering in the synthetic echographic images while providing highly realistic results. We describe some quality and performance metrics to validate these results, where a performance of up to 55 fps was achieved. Conclusion: The proposed technique for real-time scattering modeling provides realistic yet computationally efficient scatter distributions. The error between the original image and the simulated scattering image was compared for the proposed method and the state-of-the-art, showing negligible differences in their distributions.
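A compact sketch of the scatter model as described, multiplicative noise followed by convolution with a point spread function; the Rayleigh noise model and the Gaussian PSF widths are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def speckle(echo, psf_sigma_ax=1.5, psf_sigma_lat=3.0, seed=0):
    """Simplified ultrasound scattering: multiplicative noise convolved
    with a separable Gaussian point spread function (axial x lateral)."""
    rng = np.random.default_rng(seed)
    noisy = echo * rng.rayleigh(scale=1.0, size=echo.shape)  # multiplicative scatter
    ax = np.arange(-4, 5)
    psf = (np.exp(-ax[:, None]**2 / (2 * psf_sigma_ax**2)) *
           np.exp(-ax[None, :]**2 / (2 * psf_sigma_lat**2)))
    return fftconvolve(noisy, psf / psf.sum(), mode="same")
```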
Validity of a Simple Method for Measuring Force-Velocity-Power Profile in Countermovement Jump.
Jiménez-Reyes, Pedro; Samozino, Pierre; Pareja-Blanco, Fernando; Conceição, Filipe; Cuadrado-Peñafiel, Víctor; González-Badillo, Juan José; Morin, Jean-Benoît
2017-01-01
To analyze the reliability and validity of a simple computation method to evaluate force (F), velocity (v), and power (P) output during a countermovement jump (CMJ) suitable for use in field conditions and to verify the validity of this computation method to compute the CMJ force-velocity (F-v) profile (including unloaded and loaded jumps) in trained athletes. Sixteen high-level male sprinters and jumpers performed maximal CMJs under 6 different load conditions (0-87 kg). A force plate sampling at 1000 Hz was used to record vertical ground-reaction force and derive vertical-displacement data during CMJ trials. For each condition, mean F, v, and P of the push-off phase were determined from both force-plate data (reference method) and simple computation measures based on body mass, jump height (from flight time), and push-off distance and used to establish the linear F-v relationship for each individual. Mean absolute bias values were 0.9% (± 1.6%), 4.7% (± 6.2%), 3.7% (± 4.8%), and 5% (± 6.8%) for F, v, P, and slope of the F-v relationship (SFv), respectively. Both methods showed high correlations for F-v-profile-related variables (r = .985-.991). Finally, all variables computed from the simple method showed high reliability, with ICC >.980 and CV <1.0%. These results suggest that the simple method presented here is valid and reliable for computing CMJ force, velocity, power, and F-v profiles in athletes and could be used in practice under field conditions when body mass, push-off distance, and jump height are known.
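The simple method's inputs and outputs can be summarized as follows; the equations are the commonly cited Samozino-type formulations (a sketch to be checked against the paper, not its verbatim procedure):

```python
# Simple-method estimates of mean force, velocity, and power during a
# countermovement jump from body mass, flight time, and push-off distance.
G = 9.81  # gravitational acceleration, m/s^2

def simple_cmj(mass, flight_time, push_off_dist, added_load=0.0):
    m = mass + added_load
    h = G * flight_time**2 / 8.0            # jump height from flight time
    v = (2.0 * G * h) ** 0.5 / 2.0          # mean push-off velocity
    F = m * G * (h / push_off_dist + 1.0)   # mean force during push-off
    return F, v, F * v                      # mean force, velocity, power

# e.g. a 75 kg athlete, 0.55 s flight time, 0.35 m push-off distance:
F, v, P = simple_cmj(75.0, 0.55, 0.35)
```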
Comparison of Implicit Collocation Methods for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)
2001-01-01
We combine a high-order compact finite difference scheme to approximate spatial derivatives and collocation techniques for the time component to numerically solve the two-dimensional heat equation. We use two approaches to implement the collocation methods. The first is based on an explicit computation of the coefficients of polynomials and the second relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.
GPUs benchmarking in subpixel image registration algorithm
NASA Astrophysics Data System (ADS)
Sanz-Sabater, Martin; Picazo-Bueno, Jose Angel; Micó, Vicente; Ferrerira, Carlos; Granero, Luis; Garcia, Javier
2015-05-01
Image registration techniques are used across different scientific fields, such as medical imaging and optical metrology. The most straightforward way to calculate the shift between two images is to use cross correlation, taking the highest value of the correlation image. The shift resolution is then given in whole pixels, which may not be enough for certain applications. Better results can be achieved by interpolating both images up to the desired resolution and applying the same technique, but the memory needed by the system is significantly higher. To avoid this memory consumption, we implement a subpixel shifting method based on the FFT. With the original images, subpixel shifting can be achieved by multiplying the discrete Fourier transform by a linear phase with different slopes. This method is time consuming because each candidate shift requires new calculations. The algorithm, being highly parallelizable, is very suitable for high performance computing systems. GPU (Graphics Processing Unit) accelerated computing became very popular more than ten years ago because GPUs provide hundreds of computational cores on a reasonably cheap card. In our case, we register the shift between two images, obtaining a first approach by FFT-based correlation and then refining it at subpixel level using the technique described before. We consider this a 'brute force' method. We will present a benchmark of the algorithm consisting of a first approach (pixel resolution) followed by subpixel refinement, decreasing the shifting step in every loop to achieve high resolution in few steps. This program is executed on three different computers. Finally, we present the results of the computation with different kinds of CPUs and GPUs, checking the accuracy of the method and the time consumed on each computer, and discussing the advantages and disadvantages of the use of GPUs.
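A compact sketch of the 'brute force' scheme described: coarse registration from the FFT correlation peak, then subpixel refinement by scanning linear phase ramps (function names and step sizes are our illustrative choices):

```python
# FFT-based registration: pixel-level peak first, then brute-force
# sub-pixel refinement via linear phase ramps in the Fourier domain.
import numpy as np

def correlate(A, B):
    return np.fft.ifft2(A * np.conj(B))   # cross-correlation via FFT

def register(img1, img2, step=0.1, span=1.0):
    F1, F2 = np.fft.fft2(img1), np.fft.fft2(img2)
    c = np.abs(correlate(F1, F2))
    iy, ix = np.unravel_index(np.argmax(c), c.shape)   # coarse (pixel) shift
    ky = np.fft.fftfreq(img1.shape[0])[:, None]
    kx = np.fft.fftfreq(img1.shape[1])[None, :]
    best = (c[iy, ix], 0.0, 0.0)
    for dy in np.arange(-span, span + step, step):     # brute-force scan
        for dx in np.arange(-span, span + step, step):
            ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))  # sub-pixel shift
            score = np.abs(correlate(F1 * ramp, F2)).max()
            if score > best[0]:
                best = (score, dy, dx)
    return iy + best[1], ix + best[2]
```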
NASA Astrophysics Data System (ADS)
Min, M.
2017-10-01
Context. Opacities of molecules in exoplanet atmospheres rely on increasingly detailed line lists for these molecules. The line lists available today contain, for many species, up to several billion lines. Computation of the spectral line profile created by pressure and temperature broadening, the Voigt profile, for all of these lines is becoming a computational challenge. Aims: We aim to create a method to compute the Voigt profile in a way that automatically focusses the computation time on the strongest lines, while still maintaining the continuum contribution of the large number of weaker lines. Methods: Here, we outline a statistical line sampling technique that samples the Voigt profile quickly and with high accuracy. The number of samples is adjusted to the strength of the line and the local spectral line density. This automatically provides high-accuracy line shapes for strong lines or lines that are spectrally isolated. The line sampling technique automatically preserves the integrated line opacity for all lines, thereby also providing the continuum opacity created by the large number of weak lines at very low computational cost. Results: The line sampling technique is tested for accuracy when computing line spectra and correlated-k tables. Extremely fast computations (~3.5 × 10^5 lines per second per core on a standard current-day desktop computer) with high accuracy (≤1% almost everywhere) are obtained. A detailed recipe on how to perform the computations is given.
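For context, the per-line quantity being sampled is the Voigt profile, which can be evaluated via the Faddeeva function; the snippet shows the reference line shape only, not the paper's sampling scheme:

```python
# Voigt profile: convolution of a Gaussian (width sigma) with a
# Lorentzian (width gamma), via the Faddeeva function wofz.
import numpy as np
from scipy.special import wofz

def voigt(x, sigma, gamma):
    z = (x + 1j * gamma) / (sigma * np.sqrt(2.0))
    return np.real(wofz(z)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-5, 5, 1001)
profile = voigt(x, sigma=0.5, gamma=0.1)   # integrates to ~1 over x
```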
Lu, Dan; Zhang, Guannan; Webster, Clayton G.; ...
2016-12-30
In this paper, we develop an improved multilevel Monte Carlo (MLMC) method for estimating cumulative distribution functions (CDFs) of a quantity of interest coming from numerical approximation of large-scale stochastic subsurface simulations. Compared with Monte Carlo (MC) methods, which require a significantly large number of high-fidelity model executions to achieve a prescribed accuracy when computing statistical expectations, MLMC methods were originally proposed to significantly reduce the computational cost through the use of multifidelity approximations. The improved performance of MLMC methods depends strongly on the decay of the variance of the integrand as the level increases. However, the main challenge in estimating CDFs is that the integrand is a discontinuous indicator function whose variance decays slowly. To address this difficult task, we approximate the integrand using a smoothing function that accelerates the decay of the variance. In addition, we design a novel a posteriori optimization strategy to calibrate the smoothing function, so as to balance the computational gain and the approximation error. The combined proposed techniques are integrated into a very general and practical algorithm that can be applied to a wide range of subsurface problems for high-dimensional uncertainty quantification, such as the fine-grid oil reservoir model considered in this effort. The numerical results reveal that with the use of the calibrated smoothing function, the improved MLMC technique significantly reduces the computational complexity compared to the standard MC approach. Finally, we discuss several factors that affect the performance of the MLMC method and provide guidance for effective and efficient usage in practice.
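The central idea, replacing the discontinuous indicator with a smooth surrogate, can be sketched in a toy two-level setting (the logistic width delta stands in for the paper's calibrated smoothing function; the models below are illustrative):

```python
# Smoothed-indicator CDF estimation with a two-level MLMC structure.
import numpy as np

def smooth_indicator(x, q, delta):
    # -> 1{x <= q} as delta -> 0; Lipschitz for delta > 0, so the
    # level-to-level correction has small variance.
    return 1.0 / (1.0 + np.exp((x - q) / delta))

rng = np.random.default_rng(1)
q, delta = 0.0, 0.05

Xc0 = rng.normal(0.0, 1.0, 1_000_000)          # many cheap coarse samples
Xc1 = rng.normal(0.0, 1.0, 10_000)             # few coupled coarse/fine pairs
Xf1 = Xc1 + rng.normal(0.0, 0.01, Xc1.size)    # "fine model" correction

cdf_est = smooth_indicator(Xc0, q, delta).mean() + \
          (smooth_indicator(Xf1, q, delta) - smooth_indicator(Xc1, q, delta)).mean()
```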
Making Cloud Computing Available For Researchers and Innovators (Invited)
NASA Astrophysics Data System (ADS)
Winsor, R.
2010-12-01
High Performance Computing (HPC) facilities exist in most academic institutions but are almost invariably over-subscribed. Access is allocated based on academic merit, the only practical method of assigning valuable finite compute resources. Cloud computing, on the other hand, and particularly commercial clouds, draws flexibly on an almost limitless resource as long as the user has sufficient funds to pay the bill. How can the commercial cloud model be applied to scientific computing? Is there a case to be made for a publicly available research cloud, and how would it be structured? This talk will explore these themes and describe how Cybera, a not-for-profit non-governmental organization in Alberta, Canada, aims to leverage its high speed research and education network to provide cloud computing facilities for a much wider user base.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators in the LAMMPS molecular dynamics software for distributed-memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.
NASA Technical Reports Server (NTRS)
Carlson, Harry W.; Darden, Christine M.
1988-01-01
Extensive correlations of computer code results with experimental data are employed to illustrate the use of linearized theory attached flow methods for the estimation and optimization of the aerodynamic performance of simple hinged flap systems. Use of attached flow methods is based on the premise that high levels of aerodynamic efficiency require a flow that is as nearly attached as circumstances permit. A variety of swept wing configurations are considered ranging from fighters to supersonic transports, all with leading- and trailing-edge flaps for enhancement of subsonic aerodynamic efficiency. The results indicate that linearized theory attached flow computer code methods provide a rational basis for the estimation and optimization of flap system aerodynamic performance at subsonic speeds. The analysis also indicates that vortex flap design is not an opposing approach but is closely related to attached flow design concepts. The successful vortex flap design actually suppresses the formation of detached vortices to produce a small vortex which is restricted almost entirely to the leading edge flap itself.
Experimental magic state distillation for fault-tolerant quantum computing.
Souza, Alexandre M; Zhang, Jingfu; Ryan, Colm A; Laflamme, Raymond
2011-01-25
Any physical quantum device for quantum information processing (QIP) is subject to errors in implementation. In order to be reliable and efficient, quantum computers will need error-correcting or error-avoiding methods. Fault-tolerance achieved through quantum error correction will be an integral part of quantum computers. Of the many methods that have been discovered to implement it, a highly successful approach has been to use transversal gates and specific initial states. A critical element for its implementation is the availability of high-fidelity initial states, such as |0〉 and the 'magic state'. Here, we report an experiment, performed in a nuclear magnetic resonance (NMR) quantum processor, showing sufficient quantum control to improve the fidelity of imperfect initial magic states by distilling five of them into one with higher fidelity.
Zheng, Xiujuan; Wei, Wentao; Huang, Qiu; Song, Shaoli; Wan, Jieqing; Huang, Gang
2017-01-01
The objective and quantitative analysis of longitudinal single photon emission computed tomography (SPECT) images is significant for the treatment monitoring of brain disorders. Therefore, a computer aided analysis (CAA) method is introduced to extract a change-rate map (CRM) as a parametric image for quantifying the changes of regional cerebral blood flow (rCBF) in longitudinal SPECT brain images. The performance of the CAA-CRM approach in treatment monitoring is evaluated by computer simulations and clinical applications. The results of computer simulations show that the derived CRMs have high similarity with their ground truths when the lesion size is larger than the system spatial resolution and the change rate is higher than 20%. In clinical applications, the CAA-CRM approach is used to assess the treatment of 50 patients with brain ischemia. The results demonstrate that the CAA-CRM approach localizes recovered regions with 93.4% accuracy. Moreover, the quantitative indexes of recovered regions derived from the CRM are all significantly different among the groups and highly correlated with the experienced clinical diagnosis. In conclusion, the proposed CAA-CRM approach provides a convenient solution to generate a parametric image and derive quantitative indexes from longitudinal SPECT brain images for treatment monitoring.
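The change-rate map itself reduces to a voxel-wise relative difference between registered scans (a sketch only; the CAA pipeline's registration and region analysis are not reproduced):

```python
# Change-rate map between two registered SPECT volumes.
import numpy as np

def change_rate_map(baseline, followup, mask, eps=1e-6):
    """Voxel-wise rate of change within a brain mask."""
    crm = np.zeros_like(baseline, dtype=float)
    crm[mask] = (followup[mask] - baseline[mask]) / (baseline[mask] + eps)
    return crm   # e.g. crm > 0.2 marks regions with > 20% rCBF increase
```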
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
A Subspace Pursuit–based Iterative Greedy Hierarchical Solution to the Neuromagnetic Inverse Problem
Babadi, Behtash; Obregon-Henao, Gabriel; Lamus, Camilo; Hämäläinen, Matti S.; Brown, Emery N.; Purdon, Patrick L.
2013-01-01
Magnetoencephalography (MEG) is an important non-invasive method for studying activity within the human brain. Source localization methods can be used to estimate spatiotemporal activity from MEG measurements with high temporal resolution, but the spatial resolution of these estimates is poor due to the ill-posed nature of the MEG inverse problem. Recent developments in source localization methodology have emphasized temporal as well as spatial constraints to improve source localization accuracy, but these methods can be computationally intense. Solutions emphasizing spatial sparsity hold tremendous promise, since the underlying neurophysiological processes generating MEG signals are often sparse in nature, whether in the form of focal sources, or distributed sources representing large-scale functional networks. Recent developments in the theory of compressed sensing (CS) provide a rigorous framework to estimate signals with sparse structure. In particular, a class of CS algorithms referred to as greedy pursuit algorithms can provide both high recovery accuracy and low computational complexity. Greedy pursuit algorithms are difficult to apply directly to the MEG inverse problem because of the high-dimensional structure of the MEG source space and the high spatial correlation in MEG measurements. In this paper, we develop a novel greedy pursuit algorithm for sparse MEG source localization that overcomes these fundamental problems. This algorithm, which we refer to as the Subspace Pursuit-based Iterative Greedy Hierarchical (SPIGH) inverse solution, exhibits very low computational complexity while achieving very high localization accuracy. We evaluate the performance of the proposed algorithm using comprehensive simulations, as well as the analysis of human MEG data during spontaneous brain activity and somatosensory stimuli. These studies reveal substantial performance gains provided by the SPIGH algorithm in terms of computational complexity, localization accuracy, and robustness. PMID:24055554
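For orientation, generic subspace pursuit (Dai and Milenkovic) is sketched below; the MEG-specific hierarchy and correlation handling of SPIGH are not reproduced:

```python
# Subspace pursuit for y ~= A x with K-sparse x (columns of A normalized).
import numpy as np

def subspace_pursuit(A, y, K, max_iter=20):
    support = np.argsort(np.abs(A.T @ y))[-K:]          # initial support
    x = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    resid = y - A[:, support] @ x
    for _ in range(max_iter):
        # expand support with the K best matches to the residual
        cand = np.union1d(support, np.argsort(np.abs(A.T @ resid))[-K:])
        xc = np.linalg.lstsq(A[:, cand], y, rcond=None)[0]
        new_support = cand[np.argsort(np.abs(xc))[-K:]]  # prune back to K
        xn = np.linalg.lstsq(A[:, new_support], y, rcond=None)[0]
        new_resid = y - A[:, new_support] @ xn
        if np.linalg.norm(new_resid) >= np.linalg.norm(resid):
            break                                        # no improvement
        support, x, resid = new_support, xn, new_resid
    out = np.zeros(A.shape[1])
    out[support] = x
    return out
```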
Washko, George R; Criner, Gerald J; Mohsenifar, Zab; Sciurba, Frank C; Sharafkhaneh, Amir; Make, Barry J; Hoffman, Eric A; Reilly, John J
2008-06-01
Computed tomographic based indices of emphysematous lung destruction may highlight differences in disease pathogenesis and further enable the classification of subjects with Chronic Obstructive Pulmonary Disease. While there are multiple techniques that can be utilized for such radiographic analysis, there is very little published information comparing the performance of these methods in a clinical case series. Our objective was to examine several quantitative and semi-quantitative methods for the assessment of the burden of emphysema apparent on computed tomographic scans and compare their ability to predict lung mechanics and function. Automated densitometric analysis was performed on 1094 computed tomographic scans collected upon enrollment into the National Emphysema Treatment Trial. Trained radiologists performed an additional visual grading of emphysema on high resolution CT scans. Full pulmonary function test results were available for correlation, with a subset of subjects having additional measurements of lung static recoil. There was a wide range of emphysematous lung destruction apparent on the CT scans and univariate correlations to measures of lung function were of modest strength. No single method of CT scan analysis clearly outperformed the rest of the group. Quantification of the burden of emphysematous lung destruction apparent on CT scan is a weak predictor of lung function and mechanics in severe COPD with no uniformly superior method found to perform this analysis. The CT based quantification of emphysema may augment pulmonary function testing in the characterization of COPD by providing complementary phenotypic information.
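A typical densitometric index of the kind compared in such studies is the percentage of low-attenuation area; the -950 HU threshold below is the common literature choice, assumed here rather than taken from the NETT protocol:

```python
# %LAA densitometry: fraction of lung voxels below a HU threshold.
import numpy as np

def percent_emphysema(ct_hu, lung_mask, threshold=-950):
    """ct_hu: CT volume in Hounsfield units; lung_mask: boolean lung mask."""
    voxels = ct_hu[lung_mask]
    return 100.0 * np.mean(voxels < threshold)
```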
Finite-difference computations of rotor loads
NASA Technical Reports Server (NTRS)
Caradonna, F. X.; Tung, C.
1985-01-01
This paper demonstrates the current and future potential of finite-difference methods for solving real rotor problems which now rely largely on empiricism. The demonstration consists of a simple means of combining existing finite-difference, integral, and comprehensive loads codes to predict real transonic rotor flows. These computations are performed for hover and high-advance-ratio flight. Comparisons are made with experimental pressure data.
High-Performance Computing Data Center Warm-Water Liquid Cooling
NREL's High-Performance Computing Data Center (HPC Data Center) is cooled with warm water. Liquid cooling technologies offer a more energy-efficient solution than traditional air cooling.
NASA Astrophysics Data System (ADS)
Jizhi, Liu; Xingbi, Chen
2009-12-01
A new quasi-three-dimensional (quasi-3D) numeric simulation method for a high-voltage level-shifting circuit structure is proposed. The performance of the 3D structure is analyzed by combining several 2D device structures; the 2D devices lie in two planes perpendicular to each other and to the surface of the semiconductor. In comparison with Davinci, the full 3D device simulation tool, the quasi-3D simulation method gives results for the potential and current distribution of the 3D high-voltage level-shifting circuit structure with appropriate accuracy, and the total CPU time for simulation is significantly reduced. The quasi-3D simulation technique can be used in many cases, with advantages such as saving computing time, placing no demands on high-end computing hardware, and being easy to operate.
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes
NASA Technical Reports Server (NTRS)
Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak
2004-01-01
High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.
NASA Astrophysics Data System (ADS)
Ahn, Sul-Ah; Jung, Youngim
2016-10-01
The research activities of computational physicists utilizing high-performance computing are analyzed by bibliometric approaches. This study aims at providing computational physicists utilizing high-performance computing and policy planners with useful bibliometric results for an assessment of research activities. To achieve this purpose, we carried out a co-authorship network analysis of journal articles to assess the research activities of researchers in high-performance computational physics as a case study. For this study, we used journal articles from the Scopus database from Elsevier covering the time period 2004-2013. We extracted the author rank in the physics field utilizing high-performance computing by the number of papers published during the ten years from 2004. Finally, we drew the co-authorship network for the 45 top authors and their coauthors, and described some features of the co-authorship network in relation to the author rank. Suggestions for further studies are discussed.
Zhao, Anbang; Ma, Lin; Ma, Xuefei; Hui, Juan
2017-02-20
In this paper, an improved azimuth angle estimation method with a single acoustic vector sensor (AVS) is proposed based on matched filtering theory. The proposed method is mainly applied in an active sonar detection system. Building on the conventional passive method based on complex acoustic intensity measurement, the mathematical and physical model of the proposed method is described in detail. The computer simulation and lake experiment results indicate that this method can realize azimuth angle estimation with high precision using only a single AVS. Compared with the conventional method, the proposed method achieves better estimation performance. Moreover, the proposed method does not require complex operations in the frequency domain and reduces computational complexity.
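A sketch of the conventional complex-acoustic-intensity bearing estimate that serves as the baseline here (single sensor, illustrative; sign and normalization conventions vary between formulations):

```python
# Azimuth from the active (real) part of the complex acoustic intensity
# measured by one acoustic vector sensor.
import numpy as np

def azimuth_from_avs(p, vx, vy):
    """p: pressure samples; vx, vy: collocated particle-velocity samples."""
    P = np.fft.rfft(p)
    Ix = np.real(np.sum(P * np.conj(np.fft.rfft(vx))))   # active intensity, x
    Iy = np.real(np.sum(P * np.conj(np.fft.rfft(vy))))   # active intensity, y
    return np.degrees(np.arctan2(Iy, Ix))                # bearing in degrees
```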
A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations
NASA Technical Reports Server (NTRS)
Dydson, Roger W.; Goodrich, John W.
2000-01-01
Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many computational aeroacoustics methods are limited to 4th-order accurate Runge-Kutta methods in time, which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented, which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128-bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program, and being more computationally efficient when compared to previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th-order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high-performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. Finally, the sources of round-off error which affect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented here to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results are aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by the single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed-up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
Formal Solutions for Polarized Radiative Transfer. II. High-order Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janett, Gioele; Steiner, Oskar; Belluzzi, Luca, E-mail: gioele.janett@irsol.ch
When integrating the radiative transfer equation for polarized light, the necessity of high-order numerical methods is well known. In fact, well-performing high-order formal solvers enable higher accuracy and the use of coarser spatial grids. Aiming to provide a clear comparison between formal solvers, this work presents different high-order numerical schemes and applies the systematic analysis proposed by Janett et al., emphasizing their advantages and drawbacks in terms of order of accuracy, stability, and computational cost.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
2013-01-01
Background Gene expression data can be of great help in developing efficient cancer diagnosis and classification platforms. Lately, many researchers have analyzed gene expression data using diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high dimensionality), irrelevant genes, and noisy genes. Methods We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes, which is significant for cancer classification. Particle speed, a rule, and a modified sigmoid function are introduced in the proposed method to increase the probability of the bits in a particle's position being zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The performance of the proposed method proved superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO. PMID:23617960
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
NASA Astrophysics Data System (ADS)
Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley
2015-04-01
The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.
A Comparative Study of High and Low Fidelity Fan Models for Turbofan Engine System Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Afjeh, Abdollah A.
1991-01-01
In this paper, a heterogeneous propulsion system simulation method is presented. The method is based on the formulation of a cycle model of a gas turbine engine. The model includes the nonlinear characteristics of the engine components via use of empirical data. The potential to simulate the entire engine operation on a computer without the aid of data is demonstrated by numerically generating "performance maps" for a fan component using two flow models of varying fidelity. The suitability of the fan models was evaluated by comparing the computed performance with experimental data. A discussion of the potential benefits and/or difficulties in connecting simulation solutions of differing fidelity is given.
Aerodynamics of High-Lift Configuration Civil Aircraft Model in JAXA
NASA Astrophysics Data System (ADS)
Yokokawa, Yuzuru; Murayama, Mitsuhiro; Ito, Takeshi; Yamamoto, Kazuomi
This paper presents basic aerodynamics and stall characteristics of the high-lift configuration aircraft model JSM (JAXA Standard Model). In the course of developing a high-lift system design method, wind tunnel testing in the JAXA 6.5 m by 5.5 m low-speed wind tunnel and Navier-Stokes computations on unstructured hybrid meshes were performed for a realistic aircraft model equipped with high-lift devices, fuselage, nacelle-pylon, slat tracks, and Flap Track Fairings (FTF), representing a modern 100-passenger-class commercial transport aircraft. The testing and the computation aimed to understand the flow physics and then to obtain guidelines for designing a high performance high-lift system. As a result of the testing, Reynolds number effects within the linear and stall regions were observed. Analysis of static pressure distributions and flow visualization gave the knowledge needed to understand the aerodynamic performance. CFD captured the overall basic aerodynamic characteristics and clarified the flow mechanism that governs stall, even for the complicated geometry and its flow field. This collaboration between wind tunnel testing and CFD has proven advantageous for improving aerodynamic performance.
Overview of the NASA Glenn Flux Reconstruction Based High-Order Unstructured Grid Code
NASA Technical Reports Server (NTRS)
Spiegel, Seth C.; DeBonis, James R.; Huynh, H. T.
2016-01-01
A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large-eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h-refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
Implementing Molecular Dynamics on Hybrid High Performance Computers - Three-Body Potentials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Yamada, Masako
The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to efficiently use accelerators to improve the performance of molecular dynamics (MD) employing pairwise potential energy models, little is reported in the literature for models that include many-body effects. 3-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger-Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algorithms usable on hybrid high performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed memory MD codes and show results for the simulation of water with Stillinger-Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.
Near-Field Source Localization by Using Focusing Technique
NASA Astrophysics Data System (ADS)
He, Hongyang; Wang, Yide; Saillard, Joseph
2008-12-01
We discuss two fast algorithms to localize multiple sources in the near field. The symmetry-based method proposed by Zhi and Chia (2007) is first improved by implementing a search-free procedure to reduce the computation cost. We then present a focusing-based method which does not require a symmetric array configuration. By using a focusing technique, the near-field signal model is transformed into a model possessing the same structure as in the far-field situation, which allows bearing estimation with well-studied far-field methods. With the estimated bearing, the range estimate of each source is then obtained by using the 1D MUSIC method without parameter pairing. The performance of the improved symmetry-based method and the proposed focusing-based method is compared by Monte Carlo simulations and against the Cramér-Rao bound. Unlike other near-field algorithms, these two approaches require neither high computational cost nor high-order statistics.
High-resolution subgrid models: background, grid generation, and implementation
NASA Astrophysics Data System (ADS)
Sehili, Aissa; Lang, Günther; Lippert, Christoph
2014-04-01
The basic idea of subgrid models is the use of available high-resolution bathymetric data at subgrid level in computations that are performed on relatively coarse grids allowing large time steps. For that purpose, an algorithm that correctly represents the precise mass balance in regions where wetting and drying occur was derived by Casulli (Int J Numer Method Fluids 60:391-408, 2009) and Casulli and Stelling (Int J Numer Method Fluids 67:441-449, 2010). Computational grid cells are permitted to be wet, partially wet, or dry, and no drying threshold is needed. Based on the subgrid technique, practical applications involving various scenarios were implemented including an operational forecast model for water level, salinity, and temperature of the Elbe Estuary in Germany. The grid generation procedure allows a detailed boundary fitting at subgrid level. The computational grid is made of flow-aligned quadrilaterals including few triangles where necessary. User-defined grid subdivision at subgrid level allows a correct representation of the volume up to measurement accuracy. Bottom friction requires a particular treatment. Based on the conveyance approach, an appropriate empirical correction was worked out. The aforementioned features make the subgrid technique very efficient, robust, and accurate. Comparison of predicted water levels with the comparatively highly resolved classical unstructured grid model shows very good agreement. The speedup in computational performance due to the use of the subgrid technique is about a factor of 20. A typical daily forecast can be carried out in less than 10 min on standard PC-like hardware. The subgrid technique is therefore a promising framework to perform accurate temporal and spatial large-scale simulations of coastal and estuarine flow and transport processes at low computational cost.
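The core subgrid idea can be stated in a few lines: a coarse cell's wet volume is integrated from the high-resolution bathymetry it contains, so cells may be wet, partially wet, or dry with no drying threshold (an illustrative fragment, not the cited solver):

```python
# Wet volume of one coarse cell from its subgrid bathymetry.
import numpy as np

def wet_volume(eta, subgrid_bed, dA):
    """eta: water level in the coarse cell; subgrid_bed: bed elevations
    of its subpixels; dA: subpixel area."""
    depth = np.maximum(eta - subgrid_bed, 0.0)   # zero where a subpixel is dry
    return depth.sum() * dA                      # exact up to data resolution
```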
Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun
2008-05-28
Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to run non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute nodes, FBAT jobs performed about 14.4-15.9 times faster, while Unphased jobs performed 1.1-18.6 times faster compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.
Fang, Ruogu; Chen, Tsuhan; Sanelli, Pina C
2013-05-01
Computed tomography perfusion (CTP) is an important functional imaging modality in the evaluation of cerebrovascular diseases, particularly in acute stroke and vasospasm. However, the post-processed parametric maps of blood flow tend to be noisy, especially in low-dose CTP, due to the noisy contrast enhancement profile and the oscillatory nature of the results generated by current computational methods. In this paper, we propose a robust sparse perfusion deconvolution method (SPD) to estimate cerebral blood flow in CTP performed at low radiation dose. We first build a dictionary from high-dose perfusion maps using online dictionary learning and then perform deconvolution-based hemodynamic parameter estimation on the low-dose CTP data. Our method is validated on clinical data of patients with normal and pathological CBF maps. The results show that we achieve performance superior to existing methods, and potentially improve the differentiation between normal and ischemic tissue in the brain. Copyright © 2013 Elsevier B.V. All rights reserved.
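For orientation, the standard truncated-SVD deconvolution baseline that sparse methods like SPD aim to improve on can be sketched as follows (illustrative; not the authors' SPD algorithm):

```python
# Truncated-SVD deconvolution of a tissue curve with the arterial input
# function (AIF); the truncation suppresses the oscillations noted above.
import numpy as np

def cbf_tsvd(aif, tissue, dt, rel_thresh=0.1):
    """Estimate the flow-scaled residue function k(t) = CBF * R(t)."""
    n = len(aif)
    # lower-triangular Toeplitz convolution matrix built from the AIF
    A = dt * np.array([[aif[i - j] if i >= j else 0.0
                        for j in range(n)] for i in range(n)])
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.where(s > rel_thresh * s[0], 1.0 / s, 0.0)  # truncate small s
    k = Vt.T @ (s_inv * (U.T @ tissue))
    return k.max()   # CBF estimate (up to unit scaling)
```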
Dongarra, Jack; Heroux, Michael A.; Luszczek, Piotr
2015-08-17
Here, we describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a better correlation to existing codes from the computational science domain and to be representative of their performance. Furthermore, HPCG is meant to help drive the computer system design and implementation in directions that will better impact future performance improvement.
Lin, Paul T.; Heroux, Michael A.; Barrett, Richard F.; ...
2015-07-30
The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used to both improve the performance of the app as well as provide a tool for co-design for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Finally, single compute node performance will be compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.
The role of atomic lines in radiation heating of the experimental space vehicle Fire-II
NASA Astrophysics Data System (ADS)
Surzhikov, S. T.
2015-10-01
The results of calculating the convective and radiation heating of the Fire-II experimental space vehicle, taking into account the atomic lines of atoms and ions, using the NERAT-ASTEROID computer platform are presented. This computer platform is intended to solve the complete set of equations of radiation gas dynamics for a viscous, heat-conductive, and physically and chemically nonequilibrium gas, as well as radiation transfer. The spectral optical properties of high-temperature gases are calculated using ab initio quasi-classical and quantum-mechanical methods. The transfer of selective thermal radiation is calculated with a line-by-line method on specially generated computational grids over the radiation wavelengths, which makes it possible to achieve noticeable savings in computational resources.
ERIC Educational Resources Information Center
Abuzaghleh, Omar; Goldschmidt, Kathleen; Elleithy, Yasser; Lee, Jeongkyu
2013-01-01
With the advances in computing power, high-performance computing (HPC) platforms have had an impact on not only scientific research in advanced organizations but also computer science curriculum in the educational community. For example, multicore programming and parallel systems are highly desired courses in the computer science major. However,…
Butchosa, C; Simon, S; Blancafort, L; Voityuk, A
2012-07-12
Because hole transfer from nucleobases to amino acid residues in DNA-protein complexes can prevent oxidative damage of DNA in living cells, computational modeling of the process is of high interest. We performed MS-CASPT2 calculations of several model structures of π-stacked guanine and indole and derived electron-transfer (ET) parameters for these systems using the generalized Mulliken-Hush (GMH) method. We show that the two-state model commonly applied to treat thermal ET between adjacent donor and acceptor is of limited use for the considered systems because of the small gap between the ground and first excited states in the indole radical cation. The ET parameters obtained within the two-state GMH scheme can deviate significantly from the corresponding matrix elements of the two-state effective Hamiltonian based on the GMH treatment of three adiabatic states. The computed values of diabatic energies and electronic couplings provide benchmarks to assess the performance of less sophisticated computational methods.
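For context, the two-state GMH coupling referred to here is obtained from three adiabatic quantities: the energy gap, the transition dipole moment, and the difference of the state dipole moments. A minimal sketch of the standard two-state GMH expression (the numerical inputs are placeholders, not values from the paper):

    import numpy as np

    def gmh_coupling_two_state(delta_e, mu_tr, dmu):
        """Two-state generalized Mulliken-Hush electronic coupling.

        delta_e : adiabatic energy gap between the two states
        mu_tr   : adiabatic transition dipole moment (charge-transfer direction)
        dmu     : difference of the adiabatic state dipole moments
        """
        return abs(mu_tr) * delta_e / np.sqrt(dmu**2 + 4.0 * mu_tr**2)

    # placeholder inputs for illustration only (eV and atomic units)
    print(gmh_coupling_two_state(delta_e=0.45, mu_tr=1.2, dmu=10.5))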
InPRO: Automated Indoor Construction Progress Monitoring Using Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Hamledari, Hesam
In this research, an envisioned intelligent robotic solution for automated indoor data collection and inspection that employs a series of unmanned aerial vehicles (UAVs), entitled "InPRO", is presented. InPRO consists of four stages, namely: 1) automated path planning; 2) autonomous UAV-based indoor inspection; 3) automated computer vision-based assessment of progress; and 4) automated updating of 4D building information models (BIM). The work presented in this thesis addresses the third stage of InPRO. A series of computer vision-based methods that automate the assessment of construction progress using images captured at indoor sites are introduced. The proposed methods employ computer vision and machine learning techniques to detect the components of under-construction indoor partitions. In particular, framing (studs), insulation, electrical outlets, and different states of drywall sheets (installing, plastering, and painting) are automatically detected in digital images. High accuracy rates, real-time performance, and operation without a priori information indicate the methods' promising performance.
Tools for 3D scientific visualization in computational aerodynamics at NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
Hardware, software, and techniques used by the Fluid Dynamics Division (NASA) for visualizing computational aerodynamics, which can be applied to the visualization of flow fields from computer simulations of fluid dynamics about the Space Shuttle, are discussed. Three visualization techniques, post-processing, tracking, and steering, are described, as well as the post-processing software packages used: PLOT3D, SURF (Surface Modeller), GAS (Graphical Animation System), and FAST (Flow Analysis Software Toolkit). Using post-processing methods, a flow simulation was executed on a supercomputer and, after the simulation was complete, the results were processed for viewing. It is shown that a high-resolution, high-performance three-dimensional workstation combined with specially developed display and animation software provides a good tool for analyzing flow field solutions obtained from supercomputers.
NASA Astrophysics Data System (ADS)
Liu, Fei; Xu, Guanghua; Zhang, Qing; Liang, Lin; Liu, Dan
2015-11-01
As one of the Geometrical Product Specifications widely applied in industrial manufacturing and measurement, sphericity error synthetically characterizes a 3D structure and reflects the machining quality of a spherical workpiece. With increasing demands on the motion performance of spherical parts, sphericity error is becoming an indispensable component in the evaluation of form error. However, the evaluation of sphericity error is still considered a complex mathematical issue, and research on the development of available models is lacking. In this paper, an intersecting chord method is first proposed to solve the minimum circumscribed sphere and maximum inscribed sphere evaluations of sphericity error. This new modelling method leverages chord relationships to replace the characteristic points, thereby significantly reducing the computational complexity and improving the computational efficiency. Using the intersecting chords to generate a virtual centre, the reference sphere of the two concentric spheres is simplified as a space intersecting structure. The position of the virtual centre on the space intersecting structure is determined by characteristic chords, which may reduce the deviation between the virtual centre and the centre of the reference sphere. In addition, two experiments are used to verify the effectiveness of the proposed method with real datasets in Cartesian coordinates. The results indicate that the estimated errors are in perfect agreement with those of published methods, while the computational efficiency is improved. For the evaluation of sphericity error, this makes the use of high performance computing a practical prospect.
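The intersecting chord construction itself is specific to the paper, but the quantity being evaluated can be illustrated with a simple baseline: fit a reference sphere to the measured points and take the spread of radial distances. A least-squares sketch (a baseline for comparison, not the authors' intersecting chord method):

    import numpy as np

    def sphericity_error(points):
        """Least-squares reference sphere; error = max radius - min radius."""
        # linearized fit: 2*p.c + (r^2 - |c|^2) = |p|^2 for each point p
        A = np.hstack([2.0 * points, np.ones((len(points), 1))])
        b = (points**2).sum(axis=1)
        sol, *_ = np.linalg.lstsq(A, b, rcond=None)
        centre = sol[:3]
        radii = np.linalg.norm(points - centre, axis=1)
        return radii.max() - radii.min(), centre

    # synthetic measurement: unit sphere plus a small form deviation
    rng = np.random.default_rng(0)
    u = rng.normal(size=(2000, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    pts = u * (1.0 + 1e-4 * rng.standard_normal(2000))[:, None]
    err, centre = sphericity_error(pts)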
NASA Technical Reports Server (NTRS)
Garai, Anirban; Diosady, Laslo T.; Murman, Scott M.; Madavan, Nateri K.
2016-01-01
Recent progress towards developing a new computational capability for accurate and efficient high-fidelity direct numerical simulation (DNS) and large-eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy, and is implemented in a computationally efficient manner on a modern high performance computer architecture. An inflow turbulence generation procedure based on a linear forcing approach has been incorporated in this framework, and DNS has been conducted to study the effect of inflow turbulence on the suction-side separation bubble in low-pressure turbine (LPT) cascades. The T106 series of airfoil cascades in both lightly (T106A) and highly loaded (T106C) configurations at exit isentropic Reynolds numbers of 60,000 and 80,000, respectively, are considered. The numerical simulations are performed using 8th-order accurate spatial and 4th-order accurate temporal discretization. The changes in separation bubble topology due to elevated inflow turbulence are captured by the present method, and the physical mechanisms leading to the changes are explained. The present results are in good agreement with prior numerical simulations, but some expected discrepancies with the experimental data for the T106C case are noted and discussed.
Li, Yongbao; Tian, Zhen; Song, Ting; Wu, Zhaoxia; Liu, Yaqiang; Jiang, Steve; Jia, Xun
2017-01-07
Monte Carlo (MC)-based spot dose calculation is highly desired for inverse treatment planning in proton therapy because of its accuracy. Recent studies on biological optimization have also indicated the use of MC methods to compute relevant quantities of interest, e.g. linear energy transfer. Although GPU-based MC engines have been developed to address inverse optimization problems, their efficiency still needs to be improved. Also, the use of a large number of GPUs in MC calculation is not favorable for clinical applications. The previously proposed adaptive particle sampling (APS) method can improve the efficiency of MC-based inverse optimization by using the computationally expensive MC simulation more effectively. This method is more efficient than the conventional approach that performs spot dose calculation and optimization in two sequential steps. In this paper, we propose a computational library to perform MC-based spot dose calculation on GPU with the APS scheme. The implemented APS method performs a non-uniform sampling of the particles from pencil beam spots during the optimization process, favoring those from the high intensity spots. The library also conducts two computationally intensive matrix-vector operations frequently used when solving an optimization problem. This library design allows a streamlined integration of the MC-based spot dose calculation into an existing proton therapy inverse planning process. We tested the developed library in a typical inverse optimization system with four patient cases. The library achieved the targeted functions by supporting inverse planning in various proton therapy schemes, e.g. single field uniform dose, 3D intensity modulated proton therapy, and distal edge tracking. The efficiency was 41.6 ± 15.3% higher than the use of a GPU-based MC package in a conventional calculation scheme. The total computation time ranged between 2 and 50 min on a single GPU card depending on the problem size.
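The central idea of the APS scheme, concentrating Monte Carlo histories on high-intensity spots, can be illustrated with a proportional allocation of histories across pencil-beam spots (a simplified sketch, not the authors' GPU library):

    import numpy as np

    def allocate_histories(spot_intensities, total_histories, rng):
        """Distribute MC histories across spots in proportion to intensity."""
        w = np.asarray(spot_intensities, dtype=float)
        return rng.multinomial(total_histories, w / w.sum())

    # toy intensity map from an optimization iterate: a few spots dominate
    rng = np.random.default_rng(0)
    intensities = [0.1, 0.2, 5.0, 0.05, 8.0, 0.3]
    print(allocate_histories(intensities, 10**6, rng))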
Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S
2018-06-01
Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
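The operation that all three schemas parallelize is a k-way merge of position-sorted records. For reference, the traditional single-machine multiway merge that the paper benchmarks against looks roughly like this (file names are hypothetical; headers and chromosome ordering are simplified):

    import heapq

    def position_key(vcf_line):
        """Sort key: (chromosome, position) parsed from a VCF data line.

        Lexicographic chromosome order is used here for simplicity; a real
        pipeline would impose a proper contig ordering.
        """
        chrom, pos = vcf_line.split("\t")[:2]
        return (chrom, int(pos))

    def merge_vcf_bodies(paths, out_path):
        """k-way merge of position-sorted VCF bodies (header lines omitted)."""
        files = [open(p) for p in paths]
        try:
            with open(out_path, "w") as out:
                for line in heapq.merge(*files, key=position_key):
                    out.write(line)
        finally:
            for f in files:
                f.close()

    # hypothetical inputs; each file must already be sorted by (chrom, pos)
    merge_vcf_bodies(["s1.body.vcf", "s2.body.vcf"], "merged.body.vcf")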
Numerical Simulations Of High-Altitude Aerothermodynamics Of A Prospective Spacecraft Model
NASA Astrophysics Data System (ADS)
Vashchenkov, P. V.; Kaskovsky, A. V.; Krylov, A. N.; Ivanov, M. S.
2011-05-01
The paper describes the computations of aerothermodynamic characteristics of a promising spacecraft (Prospective Piloted Transport System) along its descent trajectory at altitudes from 120 to 60 km. The computations are performed by the DSMC method with the use of the SMILE software system and by an engineering technique (local bridging method) with the use of the RuSat software system. The influence of real gas effects (excitation of rotational and vibrational energy modes and chemical reactions) on the aerothermodynamic characteristics of the vehicle is studied. A comparison of results obtained by the approximate engineering method and the DSMC method allows the accuracy of prediction of aerodynamic characteristics by the local bridging method to be estimated.
Yang, Yiqun; Urban, Matthew W; McGough, Robert J
2018-05-15
Shear wave calculations induced by an acoustic radiation force are very time-consuming on desktop computers, and high-performance graphics processing units (GPUs) achieve dramatic reductions in the computation time for these simulations. The acoustic radiation force is calculated using the fast near field method and the angular spectrum approach, and then the shear waves are calculated in parallel with Green's functions on a GPU. This combination enables rapid evaluation of shear waves for push beams with different spatial samplings and for apertures with different f/#. Relative to shear wave simulations that evaluate the same algorithm on an Intel i7 desktop computer, a high performance nVidia GPU reduces the time required for these calculations by a factor of 45 and 700 when applied to elastic and viscoelastic shear wave simulation models, respectively. These GPU-accelerated simulations were also compared to measurements in different viscoelastic phantoms, and the results are similar. For parametric evaluations and for comparisons with measured shear wave data, shear wave simulations with the Green's function approach are ideally suited to high-performance GPUs.
We used chromatography modeling software to assist in HPLC method development, with the goal of enhancing separations through the exclusive use of gradient time and column temperature. We surveyed nine stationary phases for their utility in pigment purification and natur...
Novel method for measuring a dense 3D strain map of robotic flapping wings
NASA Astrophysics Data System (ADS)
Li, Beiwen; Zhang, Song
2018-04-01
Measuring dense 3D strain maps of the inextensible membranous flapping wings of robots is of vital importance to the field of bio-inspired engineering. Conventional high-speed 3D videography methods typically reconstruct the wing geometries through measuring sparse points with fiducial markers, and thus cannot obtain the full-field mechanics of the wings in detail. In this research, we propose a novel system to measure a dense strain map of inextensible membranous flapping wings by developing a superfast 3D imaging system and a computational framework for strain analysis. Specifically, first we developed a 5000 Hz 3D imaging system based on the digital fringe projection technique using the defocused binary patterns to precisely measure the dynamic 3D geometries of rapidly flapping wings. Then, we developed a geometry-based algorithm to perform point tracking on the precisely measured 3D surface data. Finally, we developed a dense strain computational method using the Kirchhoff-Love shell theory. Experiments demonstrate that our method can effectively perform point tracking and measure a highly dense strain map of the wings without many fiducial markers.
Computational fluid mechanics utilizing the variational principle of modeling damping seals
NASA Technical Reports Server (NTRS)
Abernathy, J. M.
1986-01-01
A computational fluid dynamics code for application to traditional incompressible flow problems has been developed. The method is actually a slight compressibility approach which takes advantage of the bulk modulus and finite sound speed of all real fluids. The finite element numerical analog uses a dynamic differencing scheme based, in part, on a variational principle for computational fluid dynamics. The code was developed in order to study the feasibility of damping seals for high speed turbomachinery. Preliminary seal analyses have been performed.
NASA Astrophysics Data System (ADS)
Park, Jonghee; Yoon, Kuk-Jin
2015-02-01
We propose a real-time line matching method for stereo systems. To achieve real-time performance while retaining a high level of matching precision, we first propose a nonparametric transform to represent the spatial relations between neighboring lines and nearby textures as a binary stream. Since the length of a line can vary across images, the matching costs between lines are computed within an overlap area (OA) based on the binary stream. The OA is determined for each line pair by employing the properties of a rectified image pair. Finally, the line correspondence is determined using a winner-takes-all method with a left-right consistency check. To reduce the computational time requirements further, we filter out unreliable matching candidates in advance based on their rectification properties. The performance of the proposed method was compared with state-of-the-art methods in terms of the computational time, matching precision, and recall. The proposed method required 47 ms to match lines from an image pair in the KITTI dataset with an average precision of 95%. We also verified the proposed method under image blur, illumination variation, and viewpoint changes.
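A rough sketch of the matching step as described: with each line's neighborhood encoded as a binary stream, the cost over the overlap area reduces to a Hamming distance, and correspondences come from winner-takes-all with a left-right consistency check (an illustrative reconstruction, not the authors' implementation):

    import numpy as np

    def hamming_cost(desc_left, desc_right):
        """Bitwise Hamming distance between packed binary descriptors."""
        return np.unpackbits(desc_left ^ desc_right).sum()

    def match_lines(descs_left, descs_right):
        """Winner-takes-all matching with a left-right consistency check."""
        cost = np.array([[hamming_cost(a, b) for b in descs_right]
                         for a in descs_left])
        l2r = cost.argmin(axis=1)        # best right line for each left line
        r2l = cost.argmin(axis=0)        # best left line for each right line
        return [(i, j) for i, j in enumerate(l2r) if r2l[j] == i]

    rng = np.random.default_rng(1)
    left = [rng.integers(0, 256, 32, dtype=np.uint8) for _ in range(5)]
    right = [d.copy() for d in left]     # perfect matches for illustration
    print(match_lines(left, right))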
NASA Astrophysics Data System (ADS)
Fan, Xiao-Ning; Zhi, Bo
2017-07-01
Uncertainties in parameters such as materials, loading, and geometry are inevitable in designing metallic structures for cranes. When considering these uncertainty factors, reliability-based design optimization (RBDO) offers a more reasonable design approach. However, existing RBDO methods for crane metallic structures are prone to low convergence speed and high computational cost. A unilevel RBDO method, combining a discrete imperialist competitive algorithm with an inverse reliability strategy based on the performance measure approach, is developed. Application of the imperialist competitive algorithm at the optimization level significantly improves the convergence speed of this RBDO method. At the reliability analysis level, the inverse reliability strategy is used to determine the feasibility of each probabilistic constraint at each design point by calculating its α-percentile performance, thereby avoiding the convergence failure, calculation error, and disproportionate computational effort encountered with conventional moment and simulation methods. Application of the RBDO method to an actual crane structure shows that the developed RBDO realizes a design with the best tradeoff between economy and safety at about one-third of the convergence time and computational cost of the existing method. This paper provides a scientific and effective approach for the design of metallic structures of cranes.
Evaluation of the long-term performance of six alternative disposal methods for LLRW
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kossik, R.; Sharp, G.; Chau, T.
1995-12-31
The State of New York has carried out a comparison of six alternative disposal methods for low-level radioactive waste (LLRW). An important part of these evaluations involved quantitatively analyzing the long-term (10,000 yr) performance of the methods with respect to dose to humans, radionuclide concentrations in the environment, and cumulative release from the facility. Four near-surface methods (covered above-grade vault, uncovered above-grade vault, below-grade vault, augered holes) and two mine methods (vertical shaft mine and drift mine) were evaluated. Each method was analyzed for several generic site conditions applicable to the state. The evaluations were carried out using RIP (Repository Integration Program), an integrated, total-system performance assessment computer code which has been applied to radioactive waste disposal facilities both in the U.S. (Yucca Mountain, WIPP) and worldwide. The evaluations indicate that mines in intact low-permeability rock and near-surface facilities with engineered covers generally have a high potential to perform well (within regulatory limits). Uncovered above-grade vaults and mines in highly fractured crystalline rock, however, have a high potential to perform poorly, exceeding regulatory limits.
A world-wide databridge supported by a commercial cloud provider
NASA Astrophysics Data System (ADS)
Tat Cheung, Kwong; Field, Laurence; Furano, Fabrizio
2017-10-01
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges with exploiting volunteer computing is supporting a global community of volunteers that provides heterogeneous resources. However, high energy physics applications require more data input and output than the CPU-intensive applications typically run by other volunteer computing projects. While the so-called databridge has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respectively, globally transferring data between potentially poor-performing residential networks and CERN can be unreliable, leading to wasted resource usage. The expectation is that by placing a storage endpoint that is part of a wider, flexible geographical databridge deployment closer to the volunteers, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented upon a commercial cloud provider.
Schwenke, Michael; Georgii, Joachim; Preusser, Tobias
2017-07-01
Focused ultrasound (FUS) is rapidly gaining clinical acceptance for several target tissues in the human body. Yet, treating liver targets is not clinically applied due to the high complexity of the procedure (noninvasiveness, target motion, complex anatomy, blood cooling effects, shielding by ribs, and limited image-based monitoring). To reduce the complexity, numerical FUS simulations can be utilized for both treatment planning and execution. These use cases demand highly accurate and computationally efficient simulations. We propose a numerical method for the simulation of abdominal FUS treatments during respiratory motion of the organs and target. In particular, a novel approach is proposed to simulate the heating during motion by solving Pennes' bioheat equation in a computational reference space, i.e., the equation is mathematically transformed to the reference. The approach allows for motion discontinuities, e.g., the sliding of the liver along the abdominal wall. Implementing the solver completely on the graphics processing unit and combining it with an atlas-based ultrasound simulation approach yields a simulation performance faster than real time (less than 50 s of computing time for 100 s of treatment time) on a modern off-the-shelf laptop. The simulation method is incorporated into a treatment planning demonstration application that allows real patient cases, including respiratory motion, to be simulated. The high performance of the presented simulation method opens the door to clinical applications. The methods bear the potential to enable the application of FUS to moving organs.
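Pennes' bioheat equation, which the method solves in reference space, balances conduction, perfusion cooling toward arterial temperature, and the FUS heat source. A minimal fixed-domain 1D explicit sketch (illustrative tissue constants, no motion transform):

    import numpy as np

    # illustrative soft-tissue constants (SI units)
    rho, c, k = 1050.0, 3600.0, 0.5      # density, heat capacity, conductivity
    wb, cb, Ta = 10.0, 3800.0, 37.0      # perfusion rate, blood heat capacity, arterial T

    nx, dx, dt = 200, 1e-3, 0.05
    T = np.full(nx, 37.0)                # initial tissue temperature (degC)
    Q = np.zeros(nx)
    Q[95:105] = 2.0e6                    # focal FUS heat deposition (W/m^3)

    for _ in range(1200):                # 60 s of simulated heating
        lap = (np.roll(T, 1) - 2 * T + np.roll(T, -1)) / dx**2
        lap[0] = lap[-1] = 0.0           # insulated ends for simplicity
        # rho*c*dT/dt = k*lap - wb*cb*(T - Ta) + Q  (Pennes' bioheat equation)
        T += dt / (rho * c) * (k * lap - wb * cb * (T - Ta) + Q)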
NASA Technical Reports Server (NTRS)
Gibson, Jim; Jordan, Joe; Grant, Terry
1990-01-01
The Local Area Network Extensible Simulator (LANES) computer program provides a method for simulating the performance of high-speed local-area-network (LAN) technology. It was developed as a design and analysis software tool for networking computers on board the proposed Space Station. The load, network, link, and physical layers of a layered network architecture are all modeled. Networks are mathematically modeled according to two different lower-layer protocols: Fiber Distributed Data Interface (FDDI) and Star*Bus. Written in FORTRAN 77.
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to implement for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak Signal-to-Noise Ratio (PSNR) performance and computational complexity. However, since SI utilizes only simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: previously, linear-mapping-based conventional SR methods, including SI, used only one simple yet coarse linear mapping per patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to the local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experimental results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower computational complexity when compared with a super-resolution method based on convolutional neural nets (SRCNN15). Compared with the previous SI method, which is limited to a scale factor of 2, GLM-SI shows superior performance with an average PSNR gain of 0.79 dB, and can be used for scale factors of 3 or higher.
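The reconstruction pipeline can be sketched compactly: each LR patch produces 25 HR candidates through local linear mappings, which a global regressor fuses into the final HR patch. In the sketch below the mappings and regressor weights are random and uniform placeholders standing in for the ones learned in the off-line phase:

    import numpy as np

    rng = np.random.default_rng(0)
    lr_dim, hr_dim, n_maps = 25, 100, 25   # e.g. 5x5 LR patch -> 10x10 HR patch

    # placeholders for the cluster-wise mappings learned off-line
    local_maps = 0.1 * rng.standard_normal((n_maps, hr_dim, lr_dim))
    global_weights = np.full(n_maps, 1.0 / n_maps)   # stands in for the regressor

    def reconstruct_hr_patch(lr_patch):
        """Generate 25 HR candidates via local linear maps, then fuse globally."""
        candidates = np.einsum("khl,l->kh", local_maps, lr_patch)
        return global_weights @ candidates           # fused (hr_dim,) HR patch

    hr_patch = reconstruct_hr_patch(rng.standard_normal(lr_dim))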
NASA Technical Reports Server (NTRS)
Saleeb, A. F.; Chang, T. Y. P.; Wilt, T.; Iskovitz, I.
1989-01-01
The research work performed during the past year on finite element implementation and computational techniques pertaining to high temperature composites is outlined. In the present research, two main issues are addressed: efficient geometric modeling of composite structures and expedient numerical integration techniques dealing with constitutive rate equations. In the first issue, mixed finite elements for modeling laminated plates and shells were examined in terms of numerical accuracy, locking property and computational efficiency. Element applications include (currently available) linearly elastic analysis and future extension to material nonlinearity for damage predictions and large deformations. On the material level, various integration methods to integrate nonlinear constitutive rate equations for finite element implementation were studied. These include explicit, implicit and automatic subincrementing schemes. In all cases, examples are included to illustrate the numerical characteristics of various methods that were considered.
NASA Astrophysics Data System (ADS)
Kruis, Nathanael J. F.
Heat transfer from building foundations varies significantly in all three spatial dimensions and has important dynamic effects at all timescales, from one hour to several years. With the additional consideration of moisture transport, ground freezing, evapotranspiration, and other physical phenomena, the estimation of foundation heat transfer becomes increasingly sophisticated and computationally intensive to the point where accuracy must be compromised for reasonable computation time. The tools currently available to calculate foundation heat transfer are often either too limited in their capabilities to draw meaningful conclusions or too sophisticated to use in common practice. This work presents Kiva, a new foundation heat transfer computational framework. Kiva provides a flexible environment for testing different numerical schemes, initialization methods, spatial and temporal discretizations, and geometric approximations. Comparisons within this framework provide insight into the balance of computation speed and accuracy relative to highly detailed reference solutions. The accuracy and computational performance of six finite difference numerical schemes are verified against established IEA BESTEST test cases for slab-on-grade heat conduction. Of the schemes tested, the Alternating Direction Implicit (ADI) scheme demonstrates the best balance between accuracy, performance, and numerical stability. Kiva features four approaches to initializing soil temperatures for an annual simulation. A new accelerated initialization approach is shown to significantly reduce the required years of presimulation. Methods of approximating three-dimensional heat transfer within a representative two-dimensional context further improve computational performance. A new approximation called the boundary layer adjustment method is shown to improve accuracy over other established methods with a negligible increase in computation time. This method accounts for the reduced heat transfer from concave foundation shapes, which has not been adequately addressed to date. Within the Kiva framework, three-dimensional heat transfer that can require several days to simulate is approximated in two dimensions in a matter of seconds while maintaining a mean absolute deviation within 3%.
NASA Technical Reports Server (NTRS)
DeBonis, James R.
2013-01-01
A computational fluid dynamics code that solves the compressible Navier-Stokes equations was applied to the Taylor-Green vortex problem to examine the code's ability to accurately simulate the vortex decay and subsequent turbulence. The code, WRLES (Wave Resolving Large-Eddy Simulation), uses explicit central-differencing to compute the spatial derivatives and explicit Low Dispersion Runge-Kutta methods for the temporal discretization. The flow was first studied and characterized using Bogey & Bailley's 13-point dispersion relation preserving (DRP) scheme. The kinetic energy dissipation rate, computed both directly and from the enstrophy field, vorticity contours, and the energy spectra are examined. Results are in excellent agreement with a reference solution obtained using a spectral method and provide insight into computations of turbulent flows. In addition the following studies were performed: a comparison of 4th-, 8th-, 12th- and DRP spatial differencing schemes, the effect of the solution filtering on the results, the effect of large-eddy simulation sub-grid scale models, and the effect of high-order discretization of the viscous terms.
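As a small illustration of the explicit central-differencing building block (a generic 4th-order periodic stencil checked against the derivative of a sine wave; not WRLES code):

    import numpy as np

    def ddx_central4(u, dx):
        """4th-order central first derivative on a periodic grid."""
        return (-np.roll(u, -2) + 8 * np.roll(u, -1)
                - 8 * np.roll(u, 1) + np.roll(u, 2)) / (12 * dx)

    n = 64
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    dx = x[1] - x[0]
    err = np.abs(ddx_central4(np.sin(x), dx) - np.cos(x)).max()
    print(f"max error: {err:.2e}")       # converges as O(dx^4)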
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Senate Committee on Commerce, Science, and Transportation.
This report discusses Senate Bill no. 272, which provides for a coordinated federal research and development program to ensure continued U.S. leadership in high-performance computing. High performance computing is defined as representing the leading edge of technological advancement in computing, i.e., the most sophisticated computer chips, the…
Oulas, Anastasis; Karathanasis, Nestoras; Louloupi, Annita; Pavlopoulos, Georgios A; Poirazi, Panayiota; Kalantidis, Kriton; Iliopoulos, Ioannis
2015-01-01
Computational methods for miRNA target prediction are currently undergoing extensive review and evaluation. There is still a great need for improvement of these tools and bioinformatics approaches are looking towards high-throughput experiments in order to validate predictions. The combination of large-scale techniques with computational tools will not only provide greater credence to computational predictions but also lead to the better understanding of specific biological questions. Current miRNA target prediction tools utilize probabilistic learning algorithms, machine learning methods and even empirical biologically defined rules in order to build models based on experimentally verified miRNA targets. Large-scale protein downregulation assays and next-generation sequencing (NGS) are now being used to validate methodologies and compare the performance of existing tools. Tools that exhibit greater correlation between computational predictions and protein downregulation or RNA downregulation are considered the state of the art. Moreover, efficiency in prediction of miRNA targets that are concurrently verified experimentally provides additional validity to computational predictions and further highlights the competitive advantage of specific tools and their efficacy in extracting biologically significant results. In this review paper, we discuss the computational methods for miRNA target prediction and provide a detailed comparison of methodologies and features utilized by each specific tool. Moreover, we provide an overview of current state-of-the-art high-throughput methods used in miRNA target prediction.
Computational Design of Functional Ca-S-H and Oxide Doped Alloy Systems
NASA Astrophysics Data System (ADS)
Yang, Shizhong; Chilla, Lokeshwar; Yang, Yan; Li, Kuo; Wicker, Scott; Zhao, Guang-Lin; Khosravi, Ebrahim; Bai, Shuju; Zhang, Boliang; Guo, Shengmin
Computer-aided functional materials design accelerates the discovery of novel materials. This presentation will cover our recent research advances in predicting the properties of the Ca-S-H system and in simulating and experimentally validating the properties of oxide-doped high entropy alloys. Several recently developed computational materials design methods were applied to predict the physical and chemical properties of the two systems. A comparison of simulation results with the corresponding experimental data will be presented. This research is partially supported by NSF CIMM project (OIA-15410795 and the Louisiana BoR), NSF HBCU Supplement climate change and ecosystem sustainability subproject 3, and LONI high performance computing time allocation loni mat bio7.
Modeling a Wireless Network for International Space Station
NASA Technical Reports Server (NTRS)
Alena, Richard; Yaprak, Ece; Lamouri, Saad
2000-01-01
This paper describes the application of wireless local area network (LAN) simulation modeling methods to the hybrid LAN architecture designed for supporting crew-computing tools aboard the International Space Station (ISS). These crew-computing tools, such as wearable computers and portable advisory systems, will provide crew members with real-time vehicle and payload status information and access to digital technical and scientific libraries, significantly enhancing human capabilities in space. A wireless network, therefore, will provide wearable computers and remote instruments with the high-performance computational power needed by next-generation 'intelligent' software applications. Wireless network performance in such simulated environments is characterized by the sustainable throughput of data under different traffic conditions. This data will be used to help plan the addition of more access points supporting new modules and more nodes for increased network capacity as the ISS grows.
Tempest - Efficient Computation of Atmospheric Flows Using High-Order Local Discretization Methods
NASA Astrophysics Data System (ADS)
Ullrich, P. A.; Guerra, J. E.
2014-12-01
The Tempest Framework composes several compact numerical methods to easily facilitate intercomparison of atmospheric flow calculations on the sphere and in rectangular domains. This framework includes the implementations of Spectral Elements, Discontinuous Galerkin, Flux Reconstruction, and Hybrid Finite Element methods with the goal of achieving optimal accuracy in the solution of atmospheric problems. Several advantages of this approach are discussed such as: improved pressure gradient calculation, numerical stability by vertical/horizontal splitting, arbitrary order of accuracy, etc. The local numerical discretization allows for high performance parallel computation and efficient inclusion of parameterizations. These techniques are used in conjunction with a non-conformal, locally refined, cubed-sphere grid for global simulations and standard Cartesian grids for simulations at the mesoscale. A complete implementation of the methods described is demonstrated in a non-hydrostatic setting.
New technology in turbine aerodynamics
NASA Technical Reports Server (NTRS)
Glassman, A. J.; Moffitt, T. P.
1972-01-01
A cursory review is presented of some of the recent work that has been done in turbine aerodynamic research at NASA-Lewis Research Center. Topics discussed include the aerodynamic effect of turbine coolant, high work-factor (ratio of stage work to square of blade speed) turbines, and computer methods for turbine design and performance prediction. An extensive bibliography is included. Experimental cooled-turbine aerodynamics programs using two-dimensional cascades, full annular cascades, and cold rotating turbine stage tests are discussed with some typical results presented. Analytically predicted results for cooled blade performance are compared to experimental results. The problems and some of the current programs associated with the use of very high work factors for fan-drive turbines of high-bypass-ratio engines are discussed. Turbines currently being investigated make use of advanced blading concepts designed to maintain high efficiency under conditions of high aerodynamic loading. Computer programs have been developed for turbine design-point performance, off-design performance, supersonic blade profile design, and the calculation of channel velocities for subsonic and transonic flow fields. The use of these programs for the design and analysis of axial and radial turbines is discussed.
NASA Astrophysics Data System (ADS)
Chen, Xiaogang; Wang, Yijun; Gao, Shangkai; Jung, Tzyy-Ping; Gao, Xiaorong
2015-08-01
Objective. Recently, canonical correlation analysis (CCA) has been widely used in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) due to its high efficiency, robustness, and simple implementation. However, a method with which to make use of harmonic SSVEP components to enhance the CCA-based frequency detection has not been well established. Approach. This study proposed a filter bank canonical correlation analysis (FBCCA) method to incorporate fundamental and harmonic frequency components to improve the detection of SSVEPs. A 40-target BCI speller based on frequency coding (frequency range: 8-15.8 Hz, frequency interval: 0.2 Hz) was used for performance evaluation. To optimize the filter bank design, three methods (M1: sub-bands with equally spaced bandwidths; M2: sub-bands corresponding to individual harmonic frequency bands; M3: sub-bands covering multiple harmonic frequency bands) were proposed for comparison. Classification accuracy and information transfer rate (ITR) of the three FBCCA methods and the standard CCA method were estimated using an offline dataset from 12 subjects. Furthermore, an online BCI speller adopting the optimal FBCCA method was tested with a group of 10 subjects. Main results. The FBCCA methods significantly outperformed the standard CCA method. The method M3 achieved the highest classification performance. At a spelling rate of ~33.3 characters/min, the online BCI speller obtained an average ITR of 151.18 ± 20.34 bits/min. Significance. By incorporating the fundamental and harmonic SSVEP components in target identification, the proposed FBCCA method significantly improves the performance of the SSVEP-based BCI, and thereby facilitates its practical applications such as high-speed spelling.
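The FBCCA scoring step can be sketched as: band-pass the EEG into sub-bands, compute the first canonical correlation of each sub-band with sinusoidal references at the candidate frequency and its harmonics, and combine the squared correlations with weights that decay across sub-bands. A rough Python sketch (sampling rate, band edges, and the w(n) = n^-a + b weight constants are illustrative assumptions, not the paper's tuned values):

    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.cross_decomposition import CCA

    fs, n_harm = 250, 5                      # assumed sampling rate, harmonics

    def references(freq, n_samples):
        """Sine/cosine reference signals at freq and its harmonics."""
        t = np.arange(n_samples) / fs
        return np.column_stack(
            [f(2 * np.pi * (h + 1) * freq * t)
             for h in range(n_harm) for f in (np.sin, np.cos)])

    def cca_corr(X, Y):
        """First canonical correlation between EEG (samples x channels) and refs."""
        u, v = CCA(n_components=1).fit_transform(X, Y)
        return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

    def fbcca_score(eeg, freq, bands=((8, 88), (16, 88), (24, 88)),
                    a=1.25, b=0.25):
        Y = references(freq, eeg.shape[0])
        score = 0.0
        for n, (lo, hi) in enumerate(bands, start=1):
            bb, ab = butter(4, [lo, hi], btype="band", fs=fs)
            sub = filtfilt(bb, ab, eeg, axis=0)
            score += (n ** -a + b) * cca_corr(sub, Y) ** 2
        return score    # classify by argmax over the candidate frequencies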
Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.
Guo, Xuan; Meng, Yu; Yu, Ning; Pan, Yi
2014-04-10
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect multiple-locus interactions, which broadly exist in complex traits. In addition, statistical tests for high order epistatic interactions with more than 2 SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of SNP combinations gets larger. In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare power performance against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, the Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors that do not show up in detections of 2-loci epistatic interactions. Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines for detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS.
Accelerating Time Integration for the Shallow Water Equations on the Sphere Using GPUs
Archibald, R.; Evans, K. J.; Salinger, A.
2015-06-01
The push towards larger and larger computational platforms has made it possible for climate simulations to resolve climate dynamics across multiple spatial and temporal scales. This direction in climate simulation has created a strong need to develop scalable timestepping methods capable of accelerating throughput on high performance computing. This study details recent advances in the implementation of implicit time stepping of the spectral element dynamical core within the United States Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) on graphical processing unit (GPU) based machines. We demonstrate how solvers in the Trilinos project are interfaced with ACME and GPU kernels to increase the computational speed of the residual calculations in the implicit time stepping method for the atmosphere dynamics. We demonstrate the optimization gains and data structure reorganization that facilitate the performance improvements.
Kelly, James F.; Giraldo, Francis X. (Department of Applied Mathematics, Naval Postgraduate School, Monterey, CA, United States)
2012-01-01
Coronary artery segmentation in X-ray angiograms using gabor filters and differential evolution.
Cervantes-Sanchez, Fernando; Cruz-Aceves, Ivan; Hernandez-Aguirre, Arturo; Solorio-Meza, Sergio; Cordova-Fraga, Teodoro; Aviña-Cervantes, Juan Gabriel
2018-08-01
Segmentation of coronary arteries in X-ray angiograms represents an essential task for computer-aided diagnosis, since it can help cardiologists in diagnosing and monitoring vascular abnormalities. Because the main disadvantages of X-ray angiograms are nonuniform illumination and weak contrast between blood vessels and the image background, different vessel enhancement methods have been introduced. In this paper, a novel method for blood vessel enhancement based on Gabor filters tuned using the optimization strategy of differential evolution (DE) is proposed. Because the Gabor filters are governed by three different parameters, the optimal selection of those parameters is highly desirable in order to maximize the vessel detection rate while reducing the computational cost of the training stage. To obtain the optimal set of parameters for the Gabor filters, the area (Az) under the receiver operating characteristic curve is used as the objective function. In the experimental results, the proposed method achieves Az = 0.9388 on a training set of 40 images, and on a test set of 40 images it obtains the highest performance, with Az = 0.9538, compared with six state-of-the-art vessel detection methods. Finally, the proposed method achieves an accuracy of 0.9423 for vessel segmentation using the test set. In addition, the experimental results have also shown that the proposed method can be highly suitable for clinical decision support in terms of computational time and vessel segmentation performance. Copyright © 2017 Elsevier Ltd. All rights reserved.
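A rough sketch of the tuning loop: differential evolution searches the Gabor parameter space, scoring each candidate filter bank by the area under the ROC curve of its vessel response against ground truth (the image, labels, and the particular three parameters below are illustrative placeholders, not the paper's parameterization):

    import numpy as np
    from scipy.optimize import differential_evolution
    from skimage.filters import gabor
    from sklearn.metrics import roc_auc_score

    # hypothetical training pair: angiogram and binary vessel ground truth
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    truth = (rng.random((64, 64)) > 0.9).astype(int)

    def vessel_response(img, frequency, sigma, n_orientations):
        """Max magnitude response over a bank of oriented Gabor filters."""
        resp = [np.hypot(*gabor(img, frequency=frequency, theta=t,
                                sigma_x=sigma, sigma_y=sigma))
                for t in np.linspace(0, np.pi, int(n_orientations),
                                     endpoint=False)]
        return np.max(resp, axis=0)

    def negative_auc(params):
        freq, sigma, n_orient = params
        scores = vessel_response(image, freq, sigma, n_orient).ravel()
        return -roc_auc_score(truth.ravel(), scores)   # DE minimizes

    result = differential_evolution(
        negative_auc, bounds=[(0.05, 0.5), (1.0, 6.0), (4, 16)],
        maxiter=10, seed=0)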
An efficient implementation of a high-order filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-03-01
A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yields a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of the GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and then only the central cell is used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filters was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
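A minimal 1D periodic analogue of such a high-order elliptic filter, assembled and solved as a sparse linear system (illustrative only; the paper's operator lives on the cubed sphere and is discretized with spectral elements):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    n = 256
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    dx = x[1] - x[0]

    # periodic second-derivative operator
    D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="lil")
    D2[0, -1] = D2[-1, 0] = 1.0
    D2 = (D2 / dx**2).tocsc()

    p, alpha = 4, 1e-4                   # filter order and strength
    # filter equation: (I + alpha * (-Laplacian)^p) u_filtered = u
    L = (sp.identity(n) + alpha * (-D2) ** p).tocsc()

    u = np.sin(x) + 0.3 * np.sin(40 * x) # smooth field plus grid-scale noise
    u_filtered = spsolve(L, u)           # damps high wavenumbers only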
Xu, Gang; Liang, Xifeng; Yao, Shuanbao; Chen, Dawei; Li, Zhiwei
2017-01-01
Minimizing the aerodynamic drag and the lift of the train coach remains a key issue for high-speed trains. With the development of computing technology and computational fluid dynamics (CFD) in the engineering field, CFD has been successfully applied to the design process of high-speed trains. However, developing a new streamlined shape for high-speed trains with excellent aerodynamic performance requires huge computational costs. Furthermore, relationships between multiple design variables and the aerodynamic loads are seldom obtained. In the present study, the Kriging surrogate model is used to perform a multi-objective optimization of the streamlined shape of high-speed trains, where the drag and the lift of the train coach are the optimization objectives. To improve the prediction accuracy of the Kriging model, the cross-validation method is used to construct the optimal Kriging model. The optimization results show that the two objectives are efficiently optimized, indicating that the optimization strategy used in the present study can greatly improve the optimization efficiency and meet the engineering requirements.
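A brief sketch of the surrogate-based step described above: fit a Kriging (Gaussian-process) model to a modest set of CFD samples, select its hyperparameters by cross-validation, and then search the cheap surrogate instead of running CFD for every candidate shape. The 5-D design space, toy drag function, and variable names are assumptions for illustration only.

```python
# Sketch: Kriging surrogate with cross-validated kernel selection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 5))          # 5 shape design variables
drag = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(60)

best_score, best_model = -np.inf, None
for length_scale in (0.1, 0.3, 1.0, 3.0):        # candidate kernels
    gp = GaussianProcessRegressor(
        kernel=ConstantKernel() * RBF(length_scale=length_scale),
        normalize_y=True)
    score = cross_val_score(gp, X, drag, cv=5, scoring="r2").mean()
    if score > best_score:
        best_score, best_model = score, gp

best_model.fit(X, drag)
candidates = rng.uniform(0.0, 1.0, size=(10000, 5))
pred = best_model.predict(candidates)
print("best candidate:", candidates[np.argmin(pred)])
```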
TADSim: Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics
Mniszewski, Susan M.; Junghans, Christoph; Voter, Arthur F.; ...
2015-04-16
Next-generation high-performance computing will require more scalable and flexible performance prediction tools to evaluate software-hardware co-design choices relevant to scientific applications and hardware architectures. Here, we present a new class of tools called application simulators: parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation. Parameterized choices for the algorithmic method and hardware options provide a rich space for design exploration and allow us to quickly find well-performing software-hardware combinations. We demonstrate our approach with a TADSim simulator that models the temperature-accelerated dynamics (TAD) method, an algorithmically complex and parameter-rich member of the accelerated molecular dynamics (AMD) family of molecular dynamics methods. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We accomplish this by identifying the time-intensive elements, quantifying algorithm steps in terms of those elements, abstracting them out, and replacing them by the passage of time. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We extend TADSim to model algorithm extensions, such as speculative spawning of the compute-bound stages, and predict performance improvements without having to implement such a method. Validation against the actual TAD code shows close agreement for the evolution of an example physical system, a silver surface. Finally, focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights and suggested extensions.
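The core idea of an application simulator lends itself to a very small sketch: model each compute-bound stage as a discrete event whose duration is a parameter, and advance virtual time through an event queue rather than performing the real computation. The stage names and timings below are invented for illustration and do not come from TADSim.

```python
# Sketch of a discrete-event application simulator: replace compute
# stages by parameterized time durations on an event queue.
import heapq

def simulate(n_events=1000, t_md=2.0, t_check=0.1, t_spawn_overhead=0.3):
    """Advance virtual time through parameterized algorithm stages."""
    clock, queue = 0.0, []
    heapq.heappush(queue, (t_md, "md_block"))
    completed = 0
    while queue and completed < n_events:
        clock, stage = heapq.heappop(queue)
        if stage == "md_block":
            # The compute-bound stage is replaced by the passage of time.
            heapq.heappush(queue, (clock + t_check, "boost_check"))
        elif stage == "boost_check":
            completed += 1
            heapq.heappush(queue, (clock + t_md + t_spawn_overhead, "md_block"))
    return clock

print("predicted runtime:", simulate())
```

Parameter scans then amount to re-running this loop with different stage durations, which is what makes exploring far more scenarios than the real code feasible.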
Wilson-Sands, Cathy; Brahn, Pamela; Graves, Kristal
2015-01-01
Validating participants' ability to correctly perform cardiopulmonary resuscitation (CPR) skills during basic life support courses can be a challenge for nursing professional development specialists. This study compares two methods of basic life support training, instructor-led and computer-based learning with voice-activated manikins, to identify if one method is more effective for performance of CPR skills. The findings suggest that a computer-based learning course with voice-activated manikins is a more effective method of training for improved CPR performance.
Building high-performance system for processing a daily large volume of Chinese satellites imagery
NASA Astrophysics Data System (ADS)
Deng, Huawu; Huang, Shicun; Wang, Qi; Pan, Zhiqiang; Xin, Yubin
2014-10-01
The number of Earth observation satellites from China has increased dramatically in recent years, and those satellites are acquiring a large volume of imagery daily. As the main portal for image processing and distribution from those Chinese satellites, the China Centre for Resources Satellite Data and Application (CRESDA) has been working with PCI Geomatics during the last three years to solve two issues in this regard: processing the large volume of data (about 1,500 scenes or 1 TB per day) in a timely manner and generating geometrically accurate orthorectified products. After three years of research and development, a high performance system has been built and successfully delivered. The high performance system has a service oriented architecture and can be deployed to a cluster of computers that may be configured with high end computing power. The high performance is gained through, first, parallelizing the image processing algorithms using high performance graphics processing unit (GPU) cards and multiple cores from multiple CPUs, and, second, distributing processing tasks to a cluster of computing nodes. While achieving performance up to thirty times (and even more) faster than the traditional practice, a particular methodology was developed to improve the geometric accuracy of images acquired from Chinese satellites (including HJ-1 A/B, ZY-1-02C, ZY-3, GF-1, etc.). The methodology consists of fully automatic collection of dense ground control points (GCPs) from various resources and application of those points to improve the photogrammetric model of the images. The delivered system is up and running at CRESDA for pre-operational production and has generated a good return on investment by eliminating a great amount of manual labor and increasing daily data throughput more than tenfold with fewer operators. Future work, such as development of more performance-optimized algorithms, robust image matching methods and application workflows, is identified to improve the system in the coming years.
Distributed collaborative response surface method for mechanical dynamic assembly reliability design
NASA Astrophysics Data System (ADS)
Bai, Guangchen; Fei, Chengwei
2013-11-01
Because of the randomness of the many impact factors influencing the dynamic assembly relationships of complex machinery, the reliability analysis of dynamic assembly relationships needs to be accomplished from a probabilistic perspective. To improve the accuracy and efficiency of dynamic assembly relationship reliability analysis, the mechanical dynamic assembly reliability (MDAR) theory and a distributed collaborative response surface method (DCRSM) are proposed. The mathematical model of the DCRSM is established based on the quadratic response surface function and verified by the assembly relationship reliability analysis of aeroengine high pressure turbine (HPT) blade-tip radial running clearance (BTRRC). Through comparison of the DCRSM, the traditional response surface method (RSM), and the Monte Carlo method (MCM), the results show that the DCRSM is not only able to accomplish the computational task that is impossible for the other methods when the number of simulations exceeds 100,000, but its computational precision is also basically consistent with the MCM and improved by 0.40-4.63% relative to the RSM; furthermore, the computational efficiency of the DCRSM is up to about 188 times that of the MCM and 55 times that of the RSM under 10,000 simulations. The DCRSM is demonstrated to be a feasible and effective approach for markedly improving the computational efficiency and accuracy of MDAR analysis. Thus, the proposed research provides a promising theory and method for MDAR design and optimization, and opens a novel research direction of probabilistic analysis for developing high-performance, high-reliability aeroengines.
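A small sketch of the quadratic response surface building block the abstract names: fit the output of an expensive analysis as a quadratic polynomial of the random inputs by least squares, then run Monte Carlo on the cheap surrogate. The clearance function and input distributions below are made-up stand-ins, not the paper's model.

```python
# Sketch: quadratic response surface surrogate + Monte Carlo on it.
import numpy as np

rng = np.random.default_rng(1)

def expensive_model(x):            # stand-in for a dynamic FE analysis
    return 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + 0.1 * x[:, 0] * x[:, 1]

def quad_features(x):
    """[1, x1, x2, x1^2, x2^2, x1*x2] for a 2-variable quadratic surface."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x2**2, x1 * x2])

# Fit the surface from a small number of expensive samples.
X_train = rng.standard_normal((30, 2))
beta, *_ = np.linalg.lstsq(quad_features(X_train),
                           expensive_model(X_train), rcond=None)

# Monte Carlo on the surrogate: millions of evaluations are now cheap.
X_mc = rng.standard_normal((1_000_000, 2))
y_mc = quad_features(X_mc) @ beta
print("failure probability:", np.mean(y_mc < 0.0))
```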
Zhao, Anbang; Ma, Lin; Ma, Xuefei; Hui, Juan
2017-01-01
In this paper, an improved azimuth angle estimation method with a single acoustic vector sensor (AVS) is proposed based on matched filtering theory. The proposed method is mainly applied in an active sonar detection system. Building on the conventional passive method based on complex acoustic intensity measurement, the mathematical and physical model of the proposed method is described in detail. Computer simulation and lake experiment results indicate that this method can achieve azimuth angle estimation with high precision using only a single AVS. Compared with the conventional method, the proposed method achieves better estimation performance. Moreover, the proposed method does not require complex frequency-domain operations and reduces computational complexity. PMID:28230763
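A simplified sketch of the underlying idea: the time-averaged products of the pressure channel with the two orthogonal velocity channels of an AVS give an acoustic intensity vector whose direction is the arrival azimuth, and matched filtering against the known transmitted waveform suppresses noise first. The signal model and parameters are assumptions for illustration, not the paper's experimental setup.

```python
# Sketch: azimuth from a single acoustic vector sensor via matched
# filtering followed by acoustic intensity averaging.
import numpy as np

fs, f0, true_az = 48_000, 2_000.0, np.deg2rad(40.0)
t = np.arange(0, 0.1, 1.0 / fs)
s = np.cos(2 * np.pi * f0 * t)                 # known echo waveform

p = s + 0.3 * np.random.randn(t.size)          # pressure channel
vx = np.cos(true_az) * s + 0.3 * np.random.randn(t.size)
vy = np.sin(true_az) * s + 0.3 * np.random.randn(t.size)

# Matched filter: correlate each channel with the transmitted waveform.
h = s[::-1]
p_m, vx_m, vy_m = (np.convolve(x, h, mode="same") for x in (p, vx, vy))

Ix, Iy = np.mean(p_m * vx_m), np.mean(p_m * vy_m)   # intensity components
print("estimated azimuth [deg]:", np.degrees(np.arctan2(Iy, Ix)))
```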
NASA Astrophysics Data System (ADS)
Manabe, Yoshitsugu; Imura, Masataka; Tsuchiya, Masanobu; Yasumuro, Yoshihiro; Chihara, Kunihiro
2003-01-01
Wearable 3D measurement makes it possible to acquire 3D information about an object or an environment using a wearable computer. Recently in Japan, mobile phones can transmit voice and sound as well as pictures, and it is becoming easy to capture and send short movies with them. At the same time, computers have become compact and high performance, and can easily connect to the Internet via wireless LAN. In the near future, we will be able to use wearable computers anytime and anywhere, and three-dimensional data measured by a wearable computer could be transmitted as a new kind of data. This paper proposes a method and system for measuring three-dimensional data of an object using a wearable computer. The method uses slit light projection for 3D measurement and the user's motion instead of a scanning system.
Längkvist, Martin; Jendeberg, Johan; Thunberg, Per; Loutfi, Amy; Lidén, Mats
2018-06-01
Computed tomography (CT) is the method of choice for diagnosing ureteral stones - kidney stones that obstruct the ureter. The purpose of this study is to develop a computer aided detection (CAD) algorithm for identifying a ureteral stone in thin slice CT volumes. The challenge in CAD for urinary stones lies in the similarity in shape and intensity of stones with non-stone structures and in how to efficiently deal with large high-resolution CT volumes. We address these challenges by using a Convolutional Neural Network (CNN) that works directly on the high resolution CT volumes. The method is evaluated on a large database of 465 clinically acquired high-resolution CT volumes of the urinary tract, with labeling of ureteral stones performed by a radiologist. The best model, using 2.5D input data and anatomical information, achieved a sensitivity of 100% and an average of 2.68 false positives per patient on a test set of 88 scans. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
A large-scale evaluation of computational protein function prediction
Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo
2013-01-01
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650
NASA Technical Reports Server (NTRS)
Gordon, Sanford; Zeleznik, Frank J.; Huff, Vearl N.
1959-01-01
A general computer program for chemical equilibrium and rocket performance calculations was written for the IBM 650 computer with 2000 words of drum storage, 60 words of high-speed core storage, indexing registers, and floating point attachments. The program is capable of carrying out combustion and isentropic expansion calculations on a chemical system that may include as many as 10 different chemical elements, 30 reaction products, and 25 pressure ratios. In addition to the equilibrium composition, temperature, and pressure, the program calculates specific impulse, specific impulse in vacuum, characteristic velocity, thrust coefficient, area ratio, molecular weight, Mach number, specific heat, isentropic exponent, enthalpy, entropy, and several thermodynamic first derivatives.
NASA Technical Reports Server (NTRS)
Ingels, F.; Schoggen, W. O.
1981-01-01
The various methods of high bit transition density encoding are presented, and their relative performance is compared with respect to error propagation characteristics, transition properties, and system constraints. A computer simulation of the system, using the specific PN code recommended, is included.
NASA Astrophysics Data System (ADS)
Martin, Roland; Chevrot, Sébastien; Komatitsch, Dimitri; Seoane, Lucia; Spangenberg, Hannah; Wang, Yi; Dufréchou, Grégory; Bonvalot, Sylvain; Bruinsma, Sean
2017-04-01
We image the internal density structure of the Pyrenees by inverting gravity data using an a priori density model derived by scaling a Vp model obtained by full waveform inversion of teleseismic P-waves. Gravity anomalies are computed via a 3-D high-order finite-element integration in the same high-order spectral-element grid as the one used to solve the wave equation and thus to obtain the velocity model. The curvature of the Earth and surface topography are taken into account in order to obtain a density model as accurate as possible. The method is validated through comparisons with exact semi-analytical solutions. We show that the spectral-element method drastically accelerates the computations when compared to other more classical methods. Different scaling relations between compressional velocity and density are tested, and the Nafe-Drake relation is the one that leads to the best agreement between computed and observed gravity anomalies. Gravity data inversion is then performed and the results allow us to put more constraints on the density structure of the shallow crust and on the deep architecture of the mountain range.
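The velocity-to-density scaling step can be illustrated with Brocher's (2005) polynomial fit to the Nafe-Drake curve, a standard relation between compressional velocity and bulk density in the crust. Whether the authors used this exact fit is an assumption; the relation itself is well established.

```python
# Sketch: a priori density model from a Vp model via the Nafe-Drake
# relation (Brocher 2005 polynomial fit; assumed, not from the paper).
import numpy as np

def nafe_drake_density(vp_km_s):
    """Density in g/cm^3 from Vp in km/s (valid roughly for 1.5-8.5 km/s)."""
    vp = np.asarray(vp_km_s, dtype=float)
    return (1.6612 * vp - 0.4721 * vp**2 + 0.0671 * vp**3
            - 0.0043 * vp**4 + 0.000106 * vp**5)

vp_model = np.array([4.5, 5.8, 6.4, 7.1])        # Vp from waveform inversion
print(nafe_drake_density(vp_model))              # scaled density model
```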
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation for a single dust storm event may take hours or even days to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor that may impact the feasibility of parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically, 1) in order to obtain optimized solutions, a quadratic programming based modeling method is proposed; this algorithm performs well with a small number of computing tasks, but its efficiency decreases significantly as the numbers of subdomains and computing nodes increase. 2) To compensate for the performance decrease on large-scale tasks, a K-Means clustering based algorithm is introduced; instead of seeking optimal solutions, this method obtains relatively good feasible solutions within acceptable time, but it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
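A small sketch of the K-Means-based allocation idea: cluster subdomains by their geographic centroids so that each computing node receives a spatially contiguous group, keeping most halo exchanges node-local. The grid size and node count are illustrative assumptions.

```python
# Sketch: K-Means allocation of geographic subdomains to compute nodes.
import numpy as np
from sklearn.cluster import KMeans

n_nodes = 8
# Centroids of a 16x16 grid of subdomains over the study area.
xs, ys = np.meshgrid(np.arange(16), np.arange(16))
centroids = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

assignment = KMeans(n_clusters=n_nodes, n_init=10,
                    random_state=0).fit_predict(centroids)

# Rough balance check: subdomains per node. K-Means does not guarantee
# equal loads, which is exactly the weakness noted above.
print(np.bincount(assignment, minlength=n_nodes))
```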
15 CFR 743.2 - High performance computers: Post shipment verification reporting.
Code of Federal Regulations, 2012 CFR
2012-01-01
... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false High performance computers: Post... Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT ADMINISTRATION REGULATIONS SPECIAL REPORTING § 743.2 High performance computers: Post shipment verification...
15 CFR 743.2 - High performance computers: Post shipment verification reporting.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false High performance computers: Post... Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT ADMINISTRATION REGULATIONS SPECIAL REPORTING § 743.2 High performance computers: Post shipment verification...
15 CFR 743.2 - High performance computers: Post shipment verification reporting.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false High performance computers: Post... Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT ADMINISTRATION REGULATIONS SPECIAL REPORTING § 743.2 High performance computers: Post shipment verification...
15 CFR 743.2 - High performance computers: Post shipment verification reporting.
Code of Federal Regulations, 2013 CFR
2013-01-01
... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false High performance computers: Post... Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT ADMINISTRATION REGULATIONS SPECIAL REPORTING § 743.2 High performance computers: Post shipment verification...
Predicting "Hot" and "Warm" Spots for Fragment Binding.
Rathi, Prakash Chandra; Ludlow, R Frederick; Hall, Richard J; Murray, Christopher W; Mortenson, Paul N; Verdonk, Marcel L
2017-05-11
Computational fragment mapping methods aim to predict hotspots on protein surfaces where small fragments will bind. Such methods are popular for druggability assessment as well as structure-based design. However, to date, researchers developing or using such tools have had no clear way of assessing the performance of these methods. Here, we introduce the first diverse, high-quality validation set for computational fragment mapping. The set contains 52 diverse examples of fragment binding "hot" and "warm" spots from the Protein Data Bank (PDB). Additionally, we describe PLImap, a novel protocol for fragment mapping based on the Protein-Ligand Informatics force field (PLIff). We evaluate PLImap against the new fragment mapping test set, and compare its performance to that of simple shape-based algorithms and fragment docking using GOLD. PLImap is made publicly available at https://bitbucket.org/AstexUK/pli.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-11-01
The finite element method has proven to be an invaluable tool for the analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increases, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements. SAPNEW provides a stand-alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily usable by researchers at NASA Lewis Research Center.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
NASA Astrophysics Data System (ADS)
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is on the performance evaluation of several aspects, with particular emphasis on parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, with particular interest in clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed-up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
FastMag: Fast micromagnetic simulator for complex magnetic structures (invited)
NASA Astrophysics Data System (ADS)
Chang, R.; Li, S.; Lubarda, M. V.; Livshitz, B.; Lomakin, V.
2011-04-01
A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with a high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and therefore is highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules and analytical integration for overlapping elements in which the integral kernel is singular. Thus, discretized superposition integrals are computed using a nonuniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach allows handling objects of arbitrary shape, allows easy calculation of the field outside the magnetized domains, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU to central processing unit speed-ups of two orders of magnitude. Simulations are shown of a large array of magnetic dots and a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements on an inexpensive desktop computer.
Development of High-speed Visualization System of Hypocenter Data Using CUDA-based GPU computing
NASA Astrophysics Data System (ADS)
Kumagai, T.; Okubo, K.; Uchida, N.; Matsuzawa, T.; Kawada, N.; Takeuchi, N.
2014-12-01
After the Great East Japan Earthquake on March 11, 2011, intelligent visualization of seismic information has become important for understanding earthquake phenomena. At the same time, the quantity of seismic data has become enormous with the progress of high-accuracy observation networks; many parameters (e.g., positional information, origin time, magnitude, etc.) must be treated to display the seismic information efficiently. Therefore, high-speed processing of data and image information is necessary to handle enormous amounts of seismic data. Recently, the GPU (Graphics Processing Unit) has been used as an acceleration tool for data processing and calculation in various study fields; this movement is called GPGPU (General Purpose computing on GPUs). In the last few years the performance of GPUs has kept improving rapidly, and GPU computing now provides a high-performance computing environment at a lower cost than before. Moreover, the use of GPUs has an advantage for visualization of processed data, because the GPU was originally designed for graphics processing: in GPU computing, the processed data are always stored in the video memory, so drawing information can be written directly to the VRAM on the video card by combining CUDA with a graphics API. In this study, we employ CUDA and OpenGL and/or DirectX to realize a full-GPU implementation. This approach makes it possible to write drawing information to the VRAM on the video card without PCIe bus data transfer, enabling high-speed processing of seismic data. The present study examines GPU computing-based high-speed visualization and its feasibility for a high-speed visualization system of hypocenter data.
A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data.
Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo
2017-01-01
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance for each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar and each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from its ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset so as to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.
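A simplified sketch of the hybrid scheme: compress the data with an autoencoder, then score anomalies by k-NN distances in the compact representation, averaged over detectors built on random subsets. Using an MLPRegressor trained to reconstruct its input is a stand-in for the paper's deep autoencoder; the layer sizes and subset parameters are illustrative assumptions.

```python
# Sketch: autoencoder bottleneck features + ensemble k-NN anomaly scores.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 50))            # nominal high-dim sample

ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), activation="relu",
                  max_iter=500, random_state=0).fit(X, X)

def encode(model, X):
    """Manually run the first two layers to reach the 8-d bottleneck."""
    h = np.maximum(X @ model.coefs_[0] + model.intercepts_[0], 0.0)
    return np.maximum(h @ model.coefs_[1] + model.intercepts_[1], 0.0)

Z = encode(ae, X)

def anomaly_score(z_query, Z, n_detectors=5, subset=500, k=5):
    """Average k-th neighbor distance over detectors on random subsets."""
    scores = []
    for _ in range(n_detectors):
        idx = rng.choice(len(Z), size=subset, replace=False)
        nn = NearestNeighbors(n_neighbors=k).fit(Z[idx])
        dist, _ = nn.kneighbors(z_query)
        scores.append(dist[:, -1])
    return np.mean(scores, axis=0)

# Points far from the nominal sample should receive larger scores.
print(anomaly_score(encode(ae, rng.standard_normal((3, 50)) * 4), Z))
```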
Sled, Elizabeth A.; Sheehy, Lisa M.; Felson, David T.; Costigan, Patrick A.; Lam, Miu; Cooke, T. Derek V.
2010-01-01
The objective of the study was to evaluate the reliability of frontal plane lower limb alignment measures using a landmark-based method by (1) comparing inter- and intra-reader reliability between measurements of alignment obtained manually with those using a computer program, and (2) determining inter- and intra-reader reliability of computer-assisted alignment measures from full-limb radiographs. An established method for measuring alignment was used, involving selection of 10 femoral and tibial bone landmarks. 1) To compare manual and computer methods, we used digital images and matching paper copies of five alignment patterns simulating healthy and malaligned limbs drawn using AutoCAD. Seven readers were trained in each system. Paper copies were measured manually and repeat measurements were performed daily for 3 days, followed by a similar routine with the digital images using the computer. 2) To examine the reliability of computer-assisted measures from full-limb radiographs, 100 images (200 limbs) were selected as a random sample from 1,500 full-limb digital radiographs which were part of the Multicenter Osteoarthritis (MOST) Study. Three trained readers used the software program to measure alignment twice from the batch of 100 images, with two or more weeks between batch handling. Manual and computer measures of alignment showed excellent agreement (intraclass correlations [ICCs] 0.977 – 0.999 for computer analysis; 0.820 – 0.995 for manual measures). The computer program applied to full-limb radiographs produced alignment measurements with high inter- and intra-reader reliability (ICCs 0.839 – 0.998). In conclusion, alignment measures using a bone landmark-based approach and a computer program were highly reliable between multiple readers. PMID:19882339
Median Robust Extended Local Binary Pattern for Texture Classification.
Liu, Li; Lao, Songyang; Fieguth, Paul W; Guo, Yulan; Wang, Xiaogang; Pietikäinen, Matti
2016-03-01
Local binary patterns (LBP) are considered among the most computationally efficient high-performance texture features. However, the LBP method is very sensitive to image noise and is unable to capture macrostructure information. To address these disadvantages, in this paper, we introduce a novel descriptor for texture classification, the median robust extended LBP (MRELBP). Different from the traditional LBP and many LBP variants, MRELBP compares regional image medians rather than raw image intensities. A multiscale LBP-type descriptor is computed by efficiently comparing image medians over a novel sampling scheme, which can capture both microstructure and macrostructure texture information. A comprehensive evaluation on benchmark data sets reveals MRELBP's high performance (robust to gray-scale variations, rotation changes, and noise) at a low computational cost. MRELBP produces the best classification scores of 99.82%, 99.38%, and 99.77% on three popular Outex test suites. More importantly, MRELBP is shown to be highly robust to image noise, including Gaussian noise, Gaussian blur, salt-and-pepper noise, and random pixel corruption.
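A simplified sketch of the median-comparison idea behind MRELBP: build a binary pattern by comparing the median of each neighbor region against the median of the central region, instead of comparing raw pixels. This mimics only the basic operator; the radius, patch size, and sampling offsets are invented and the multiscale scheme is omitted.

```python
# Sketch: median-based local binary pattern (single scale, 8 neighbors).
import numpy as np
from scipy.ndimage import median_filter

def median_lbp(image, radius=2, patch=3):
    med = median_filter(image, size=patch)          # regional medians
    h, w = image.shape
    r = radius
    center = med[r:h - r, r:w - r]
    offsets = [(-r, 0), (-r, r), (0, r), (r, r),
               (r, 0), (r, -r), (0, -r), (-r, -r)]
    code = np.zeros_like(center, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = med[r + dy:h - r + dy, r + dx:w - r + dx]
        code |= ((neighbor >= center).astype(int) << bit)
    return code

img = np.random.rand(64, 64)
hist = np.bincount(median_lbp(img).ravel(), minlength=256)  # texture feature
```

Because each comparison uses a regional median rather than a single pixel, isolated noisy pixels rarely flip a bit of the code, which is the source of the noise robustness described above.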
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyna, David; Betty, Rita
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis, thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities. The purpose of this project was to computationally model the impact of neural population dynamics within the neurobiological memory system in order to examine how subareas in the brain enable pattern separation and completion of information in memory across time as associated experiences.
NASA Astrophysics Data System (ADS)
Saghafian, Amirreza; Pitsch, Heinz
2012-11-01
A compressible flamelet/progress variable approach (CFPV) has been devised for high-speed flows. Temperature is computed from the transported total energy and tabulated species mass fractions, and the source term of the progress variable is rescaled with pressure and temperature. The combustion is thus modeled by three additional scalar equations and a chemistry table that is computed in a pre-processing step. Three-dimensional direct numerical simulation (DNS) databases of a reacting supersonic turbulent mixing layer with detailed chemistry are analyzed to assess the underlying assumptions of CFPV. Large eddy simulations (LES) of the same configuration using the CFPV method have been performed and compared with the DNS results. The LES computations are based on presumed subgrid PDFs of mixture fraction and progress variable, a beta function and a delta function respectively, which are assessed using the DNS databases. The flamelet equation budget is also computed to verify the validity of the CFPV method for high-speed flows.
15 CFR 743.2 - High performance computers: Post shipment verification reporting.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false High performance computers: Post... Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT ADMINISTRATION REGULATIONS SPECIAL REPORTING AND NOTIFICATION § 743.2 High performance computers: Post shipment...
Support Expressed in Congress for U.S. High-Performance Computing
NASA Astrophysics Data System (ADS)
Showstack, Randy
2004-06-01
Advocates for a stronger U.S. position in high-performance computing, which could help with a number of grand challenges in the Earth sciences and other disciplines, hope that legislation recently introduced in the House of Representatives will help to revitalize U.S. efforts. The High-Performance Computing Revitalization Act of 2004 would amend the earlier High-Performance Computing Act of 1991 (Public Law 102-194), which is partially credited with helping to strengthen U.S. capabilities in this area. The bill has the support of the Bush administration.
Rapid tomographic reconstruction based on machine learning for time-resolved combustion diagnostics
NASA Astrophysics Data System (ADS)
Yu, Tao; Cai, Weiwei; Liu, Yingzheng
2018-04-01
Optical tomography has attracted a surge of research effort recently due to progress in both imaging concepts and sensor and laser technologies. The high spatial and temporal resolutions achievable by these methods provide an unprecedented opportunity for the diagnosis of complicated turbulent combustion. However, due to the high data throughput and the inefficiency of the prevailing iterative methods, the tomographic reconstructions, which are typically conducted off-line, are computationally formidable. In this work, we propose an efficient inversion method based on a machine learning algorithm, which can extract useful information from previous reconstructions and build efficient neural networks to serve as a surrogate model to rapidly predict the reconstructions. The extreme learning machine is used here for demonstration purposes due to its ease of implementation, fast learning speed, and good generalization performance. Extensive numerical studies were performed, and the results show that the new method can dramatically reduce the computational time compared with classical iterative methods. This technique is expected to be an alternative to existing methods when sufficient training data are available. Although this work is discussed in the context of tomographic absorption spectroscopy, we expect it to be useful also for other high speed tomographic modalities, such as volumetric laser-induced fluorescence and tomographic laser-induced incandescence, which have been demonstrated for combustion diagnostics.
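A minimal sketch of the extreme learning machine named above: a single hidden layer with frozen random weights, so that training reduces to one regularized least-squares solve for the output weights. Treating the projection-to-reconstruction mapping as flattened vectors, and all sizes below, are illustrative assumptions.

```python
# Sketch: extreme learning machine as a fast reconstruction surrogate.
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    def __init__(self, n_in, n_hidden, n_out, reg=1e-3):
        self.W = rng.standard_normal((n_in, n_hidden))   # frozen random weights
        self.b = rng.standard_normal(n_hidden)
        self.reg = reg
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)              # random feature map

    def fit(self, X, Y):
        H = self._hidden(X)
        # Ridge-regularized normal equations: beta = (H'H + rI)^-1 H'Y
        A = H.T @ H + self.reg * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Projections -> reconstructions learned from earlier iterative solves.
X_train = rng.standard_normal((500, 128))          # measured projections
Y_train = rng.standard_normal((500, 256))          # reference reconstructions
model = ELM(128, 400, 256).fit(X_train, Y_train)
fast_recon = model.predict(X_train[:1])            # near-instant prediction
```

Since prediction is two matrix multiplications, the surrogate sidesteps the iterative inversion entirely, which is where the dramatic speedup reported above comes from.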
NASA Technical Reports Server (NTRS)
Groves, Curtis Edward
2014-01-01
Spacecraft thermal protection systems are at risk of being damaged due to airflow produced by Environmental Control Systems. There are inherent uncertainties and errors associated with using Computational Fluid Dynamics to predict the airflow field around a spacecraft from the Environmental Control System. This paper describes an approach to quantify the uncertainty in using Computational Fluid Dynamics to predict airflow speeds around an encapsulated spacecraft without the use of test data. Quantifying the uncertainty in analytical predictions is imperative to the success of any simulation-based product. The method could provide an alternative to the traditional "validation by test only" mentality. This method could be extended to other disciplines and has potential to provide uncertainty for any numerical simulation, thus lowering the cost of performing these verifications while increasing the confidence in those predictions. Spacecraft requirements can include a maximum airflow speed to protect delicate instruments during ground processing. Computational Fluid Dynamics can be used to verify these requirements; however, the model must be validated by test data. This research includes the following three objectives and methods. Objective one is to develop, model, and perform a Computational Fluid Dynamics analysis of three (3) generic, non-proprietary, environmental control systems and spacecraft configurations. Several commercially available and open source solvers have the capability to model the turbulent, highly three-dimensional, incompressible flow regime. The proposed method uses FLUENT, STARCCM+, and OPENFOAM. Objective two is to perform an uncertainty analysis of the Computational Fluid Dynamics model using the methodology found in "Comprehensive Approach to Verification and Validation of Computational Fluid Dynamics Simulations". This method requires three separate grids and solutions, which quantify the error bars around Computational Fluid Dynamics predictions. The method accounts for all uncertainty terms from both numerical and input variables. Objective three is to compile a table of uncertainty parameters that could be used to estimate the error in a Computational Fluid Dynamics model of the Environmental Control System/spacecraft system. Previous studies have looked at the uncertainty in a Computational Fluid Dynamics model for a single output variable at a single point, for example the re-attachment length of a backward facing step. For the flow regime being analyzed (turbulent, three-dimensional, incompressible), the error at a single point can propagate into the solution both via flow physics and numerical methods. Calculating the uncertainty in using Computational Fluid Dynamics to accurately predict airflow speeds around encapsulated spacecraft is imperative to the success of future missions.
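A sketch of the three-grid uncertainty estimate that such methodologies rely on: Richardson extrapolation combined with a grid convergence index (GCI), following Roache's widely used formulation. The sample solution values and refinement ratio below are invented for illustration; whether this exact formulation matches the cited methodology is an assumption.

```python
# Sketch: grid convergence index (GCI) from three systematically
# refined CFD grids (Roache-style Richardson extrapolation).
import numpy as np

def gci_three_grids(f_fine, f_med, f_coarse, r=2.0, safety=1.25):
    """Observed order, extrapolated value, and GCI on the fine grid."""
    p = np.log(abs((f_coarse - f_med) / (f_med - f_fine))) / np.log(r)
    f_exact = f_fine + (f_fine - f_med) / (r**p - 1.0)   # Richardson
    rel_err = abs((f_med - f_fine) / f_fine)
    gci_fine = safety * rel_err / (r**p - 1.0)
    return p, f_exact, gci_fine

# Peak airflow speed (m/s) from fine, medium, and coarse CFD grids.
p, f_ex, gci = gci_three_grids(6.21, 6.35, 6.78)
print(f"observed order {p:.2f}, extrapolated {f_ex:.3f} m/s, "
      f"GCI (error band) {100 * gci:.2f}%")
```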
NASA Technical Reports Server (NTRS)
Groves, Curtis E.
2013-01-01
Spacecraft thermal protection systems are at risk of being damaged due to airflow produced by Environmental Control Systems. There are inherent uncertainties and errors associated with using Computational Fluid Dynamics to predict the airflow field around a spacecraft from the Environmental Control System. This proposal describes an approach to validate the uncertainty in using Computational Fluid Dynamics to predict airflow speeds around an encapsulated spacecraft. The research described here is absolutely cutting edge. Quantifying the uncertainty in analytical predictions is imperative to the success of any simulation-based product. The method could provide an alternative to the traditional "validation by test only" mentality. This method could be extended to other disciplines and has potential to provide uncertainty for any numerical simulation, thus lowering the cost of performing these verifications while increasing the confidence in those predictions. Spacecraft requirements can include a maximum airflow speed to protect delicate instruments during ground processing. Computational Fluid Dynamics can be used to verify these requirements; however, the model must be validated by test data. The proposed research project includes the following three objectives and methods. Objective one is to develop, model, and perform a Computational Fluid Dynamics analysis of three (3) generic, non-proprietary, environmental control systems and spacecraft configurations. Several commercially available solvers have the capability to model the turbulent, highly three-dimensional, incompressible flow regime. The proposed method uses FLUENT and OPENFOAM. Objective two is to perform an uncertainty analysis of the Computational Fluid Dynamics model using the methodology found in "Comprehensive Approach to Verification and Validation of Computational Fluid Dynamics Simulations". This method requires three separate grids and solutions, which quantify the error bars around Computational Fluid Dynamics predictions. The method accounts for all uncertainty terms from both numerical and input variables. Objective three is to compile a table of uncertainty parameters that could be used to estimate the error in a Computational Fluid Dynamics model of the Environmental Control System/spacecraft system. Previous studies have looked at the uncertainty in a Computational Fluid Dynamics model for a single output variable at a single point, for example the re-attachment length of a backward facing step. To date, the author is the only person to look at the uncertainty in the entire computational domain. For the flow regime being analyzed (turbulent, three-dimensional, incompressible), the error at a single point can propagate into the solution both via flow physics and numerical methods. Calculating the uncertainty in using Computational Fluid Dynamics to accurately predict airflow speeds around encapsulated spacecraft is imperative to the success of future missions.
Hotspot-Centric De Novo Design of Protein Binders
Fleishman, Sarel J.; Corn, Jacob E.; Strauch, Eva-Maria; Whitehead, Timothy A.; Karanicolas, John; Baker, David
2014-01-01
Protein–protein interactions play critical roles in biology, and computational design of interactions could be useful in a range of applications. We describe in detail a general approach to de novo design of protein interactions based on computed, energetically optimized interaction hotspots, which was recently used to produce high-affinity binders of influenza hemagglutinin. We present several alternative approaches to identify and build the key hotspot interactions within both core secondary structural elements and variable loop regions and evaluate the method's performance in natural-interface recapitulation. We show that the method generates binding surfaces that are more conformationally restricted than previous design methods, reducing opportunities for off-target interactions. PMID:21945116
Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.
Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard
2015-02-01
Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms remains high. The interaction sorting (IS) algorithm, designed in 2007, therefore attracted clear interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional performance gain of 9% to 45% in comparison to the original IS method.
NASA Astrophysics Data System (ADS)
Fukushige, Toshiyuki; Taiji, Makoto; Makino, Junichiro; Ebisuzaki, Toshikazu; Sugimoto, Daiichiro
1996-09-01
We have developed a parallel, pipelined special-purpose computer for N-body simulations, MD-GRAPE (for "GRAvity PipE"). In gravitational N-body simulations, almost all computing time is spent on the calculation of interactions between particles. GRAPE is specialized hardware to calculate these interactions. It is used with a general-purpose front-end computer that performs all calculations other than the force calculation. MD-GRAPE is the first parallel GRAPE that can calculate an arbitrary central force. A force different from a pure 1/r potential is necessary for N-body simulations with periodic boundary conditions using the Ewald or particle-particle/particle-mesh (P^3M) method. MD-GRAPE accelerates the calculation of the particle-particle force for these algorithms. An MD-GRAPE board has four MD chips and its peak performance is 4.2 GFLOPS. On an MD-GRAPE board, a cosmological N-body simulation takes 600 (N/10^6)^(3/2) s per step for the Ewald method, where N is the number of particles, and would take 240 (N/10^6) s per step for the P^3M method, in a uniform distribution of particles.
Toward performance portability of the Albany finite element analysis code using the Kokkos library
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...
2018-02-05
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.
Real-time data reduction capabilities at the Langley 7 by 10 foot high speed tunnel
NASA Technical Reports Server (NTRS)
Fox, C. H., Jr.
1980-01-01
The 7 by 10 foot high speed tunnel performs a wide range of tests employing a variety of model installation methods. To support the reduction of static data from this facility, a generalized wind tunnel data reduction program had been developed for use on the Langley central computer complex. The capabilities of a version of this generalized program adapted for real time use on a dedicated on-site computer are discussed. The input specifications, instructions for the console operator, and full descriptions of the algorithms are included.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e., cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
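For reference, here is the Smith-Waterman local alignment recurrence that the cluster implementation above accelerates. This plain-Python sketch computes only the optimal score and uses a linear (not affine) gap penalty for brevity; scoring values are conventional defaults, not the paper's parameters.

```python
# Sketch: Smith-Waterman local alignment score (linear gap penalty).
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
    return H.max()

print(smith_waterman("GGTTGACTA", "TGTTACGG"))   # small example
```

The anti-diagonals of H can be computed independently, which is what makes the vector-level (SIMD) and thread-level parallelization described above possible.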
NASA Astrophysics Data System (ADS)
Schulthess, Thomas C.
2013-03-01
The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.
Methods for operating parallel computing systems employing sequenced communications
Benner, Robert E.; Gustafson, John L.; Montry, Gary R.
1999-01-01
A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system.
Linear solver performance in elastoplastic problem solution on GPU cluster
NASA Astrophysics Data System (ADS)
Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.
2017-12-01
Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large-scale systems needs to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens to hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The convergence of these methods depends strongly on the spectrum of the problem's stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
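As a reading aid, the sketch below shows the kind of preconditioned Krylov iteration the paper benchmarks, here a Jacobi-preconditioned conjugate gradient applied to a stand-in SPD matrix via SciPy; the matrix, size, and preconditioner are illustrative assumptions, not the authors' elastoplastic systems.

```python
# Jacobi-preconditioned Conjugate Gradient on a 1D Laplacian stand-in
# for the SPD stiffness systems discussed above (illustrative only).
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

n = 1000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda x: x / d)  # Jacobi preconditioner

x, info = cg(A, b, M=M)
print("cg info:", info, "residual:", np.linalg.norm(b - A @ x))
```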
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
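A toy version of the paper's first challenge, the three-stage execution-time model (transfer, queue wait, compute), can be expressed directly; all numbers below are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
# Toy three-stage model: total time = data transfer + queue wait + compute.
# Bandwidths, waits and throughputs are invented placeholders.
def workflow_time(data_gb, bandwidth_gbps, queue_wait_s, work_gflop, gflops_avail):
    transfer = data_gb * 8.0 / bandwidth_gbps   # seconds to move the dataset
    compute = work_gflop / gflops_avail         # seconds of reconstruction
    return transfer + queue_wait_s + compute

sites = {"local cluster": (1.0, 0.0, 500.0), "remote HPC": (10.0, 120.0, 50000.0)}
for name, (bw, wait, flops) in sites.items():
    t = workflow_time(data_gb=100, bandwidth_gbps=bw, queue_wait_s=wait,
                      work_gflop=8.6e3, gflops_avail=flops)
    print(f"{name}: {t:.1f} s")
```

Comparing such estimates across candidate sites is what enables the resource selection (and the reported speedup) described above.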
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qian, Xiaoqing; Deng, Z. T.
2009-11-10
This is the final report for the Department of Energy (DOE) project DE-FG02-06ER25746, entitled "Continuing High Performance Computing Research and Education at AAMU". This three-year project started on August 15, 2006, and ended on August 14, 2009. The objective of this project was to enhance high performance computing research and education capabilities at Alabama A&M University (AAMU), and to train African-American and other minority students and scientists in the computational science field for eventual employment with DOE. AAMU successfully completed all the proposed research and educational tasks. Through the support of DOE, AAMU was able to provide opportunities to minority students through summer internships and the DOE computational science scholarship program. In the past three years, AAMU (1) supported three graduate research assistants in image processing for a hypersonic shockwave control experiment and in computational science related areas; (2) recruited and provided full financial support for six AAMU undergraduate summer research interns to participate in the Research Alliance in Math and Science (RAMS) program at Oak Ridge National Lab (ORNL); (3) awarded 30 highly competitive DOE High Performance Computing Scholarships ($1500 each) to qualified top AAMU undergraduate students in science and engineering majors; (4) improved the high performance computing laboratory at AAMU with the addition of three high performance Linux workstations; and (5) conducted image analysis for an electromagnetic shockwave control experiment and computation of shockwave interactions to verify the design and operation of the AAMU supersonic wind tunnel. The high performance computing research and education activities at AAMU had a great impact on minority students. As praised by the Accreditation Board for Engineering and Technology (ABET) in 2009: "The work on high performance computing that is funded by the Department of Energy provides scholarships to undergraduate students as computational science scholars. This is a wonderful opportunity to recruit under-represented students." Three ASEE papers were published in the 2007, 2008, and 2009 proceedings of the ASEE Annual Conferences, respectively. Presentations of these papers were also made at the ASEE Annual Conferences. It is very critical to continue the research and education activities.
NASA Astrophysics Data System (ADS)
López, J.; Hernández, J.; Gómez, P.; Faura, F.
2018-02-01
The VOFTools library includes efficient analytical and geometrical routines for (1) area/volume computation, (2) truncation operations that typically arise in VOF (volume of fluid) methods, (3) area/volume conservation enforcement (VCE) in PLIC (piecewise linear interface calculation) reconstruction and (4) computation of the distance from a given point to the reconstructed interface. The computation of a polyhedron volume uses an efficient formula based on a quadrilateral decomposition and a 2D projection of each polyhedron face. The analytical VCE method is based on coupling an interpolation procedure to bracket the solution with an improved final calculation step based on the above volume computation formula. Although the library was originally created to help develop highly accurate advection and reconstruction schemes in the context of VOF methods, it may have more general applications. To assess the performance of the supplied routines, different tests, which are provided in FORTRAN and C, were implemented for several 2D and 3D geometries.
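As a 2D analog of the library's geometric core, the signed-area (shoelace) formula below shows the flavor of these routines; VOFTools' actual 3D volume computation generalizes this through quadrilateral decomposition and face projection, and this Python sketch is ours, not library code.

```python
# Signed polygon area via the shoelace formula (2D analog of the
# polyhedron volume routine described above; illustration only).
def polygon_area(vertices):
    # vertices: list of (x, y) pairs in order around the polygon
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return 0.5 * abs(s)

print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square -> 1.0
```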
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
NASA Astrophysics Data System (ADS)
Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji
2013-03-01
This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
Chang, Justin; Karra, Satish; Nakshatrala, Kalyana B.
2016-07-26
It is well-known that the standard Galerkin formulation, which is often the formulation of choice under the finite element method for solving self-adjoint diffusion equations, does not meet maximum principles and the non-negative constraint for anisotropic diffusion equations. Recently, optimization-based methodologies that satisfy maximum principles and the non-negative constraint for steady-state and transient diffusion-type equations have been proposed. To date, these methodologies have been tested only on small-scale academic problems. The purpose of this paper is to systematically study the performance of the non-negative methodology in the context of high performance computing (HPC). PETSc and TAO libraries are, respectively, used for the parallel environment and optimization solvers. For large-scale problems, it is important for computational scientists to understand the computational performance of current algorithms available in these scientific libraries. The numerical experiments are conducted on the state-of-the-art HPC systems, and a single-core performance model is used to better characterize the efficiency of the solvers. Furthermore, our studies indicate that the proposed non-negative computational framework for diffusion-type equations exhibits excellent strong scaling for real-world large-scale problems.
Affordable and accurate large-scale hybrid-functional calculations on GPU-accelerated supercomputers
NASA Astrophysics Data System (ADS)
Ratcliff, Laura E.; Degomme, A.; Flores-Livas, José A.; Goedecker, Stefan; Genovese, Luigi
2018-03-01
Performing high accuracy hybrid functional calculations for condensed matter systems containing a large number of atoms is at present computationally very demanding or even out of reach if high quality basis sets are used. We present a highly optimized multiple graphics processing unit implementation of the exact exchange operator which allows one to perform fast hybrid functional density-functional theory (DFT) calculations with systematic basis sets without additional approximations for up to a thousand atoms. With this method hybrid DFT calculations of high quality become accessible on state-of-the-art supercomputers within a time-to-solution that is of the same order of magnitude as traditional semilocal-GGA functionals. The method is implemented in a portable open-source library.
Machine learning strategies for systems with invariance properties
NASA Astrophysics Data System (ADS)
Ling, Julia; Jones, Reese; Templeton, Jeremy
2016-08-01
In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.
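The first strategy above, embedding invariance through the input basis, can be illustrated with a synthetic example: if the target depends only on the radius, fitting on the invariant feature r² generalizes exactly to rotated inputs. Everything below is an invented toy, not the paper's turbulence or elasticity data.

```python
# Embedding rotational invariance via invariant input features (toy example).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.linalg.norm(X, axis=1) ** 2            # rotationally invariant target

# invariant basis: [1, r^2]; fit by least squares
r2 = np.sum(X**2, axis=1, keepdims=True)
coef, *_ = np.linalg.lstsq(np.hstack([np.ones_like(r2), r2]), y, rcond=None)

# evaluate on a rotated copy of the inputs
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
r2_rot = np.sum((X @ R.T) ** 2, axis=1, keepdims=True)
pred = np.hstack([np.ones_like(r2_rot), r2_rot]) @ coef
print("max error under rotation:", np.abs(pred - y).max())  # ~0 by construction
```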
NASA Astrophysics Data System (ADS)
Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo
2016-07-01
Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
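One way to picture both ideas, solving many right-hand sides against a common matrix and retiring vectors as they converge, is the masked block iteration below; Jacobi iteration on a diagonally dominant matrix stands in for the authors' solvers, and all sizes are invented.

```python
# Block iteration over many right-hand sides with per-column convergence
# masking (a stand-in sketch, not the authors' implementation).
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 16
A = rng.normal(size=(n, n)) + n * np.eye(n)   # strongly diagonally dominant
B = rng.normal(size=(n, k))                   # k simultaneous right-hand sides

D = np.diag(A)
X = np.zeros((n, k))
active = np.ones(k, dtype=bool)
for it in range(500):
    R = B[:, active] - A @ X[:, active]       # residuals of active columns only
    X[:, active] += R / D[:, None]            # Jacobi update
    done = np.linalg.norm(R, axis=0) < 1e-10
    active[np.flatnonzero(active)[done]] = False  # retire converged vectors
    if not active.any():
        break
print("iterations:", it + 1, "max residual:", np.abs(B - A @ X).max())
```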
Measuring coherence of computer-assisted likelihood ratio methods.
Haraksim, Rudolf; Ramos, Daniel; Meuwly, Didier; Berger, Charles E H
2015-04-01
Measuring the performance of forensic evaluation methods that compute likelihood ratios (LRs) is relevant for both the development and the validation of such methods. A framework of performance characteristics categorized as primary and secondary is introduced in this study to help achieve such development and validation. Ground-truth labelled fingerprint data is used to assess the performance of an example likelihood ratio method in terms of those performance characteristics. Discrimination, calibration, and especially the coherence of this LR method are assessed as a function of the quantity and quality of the trace fingerprint specimen. Assessment of the coherence revealed a weakness of the comparison algorithm in the computer-assisted likelihood ratio method used. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
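One widely used primary characteristic in this setting is the log-likelihood-ratio cost Cllr of Brümmer and du Preez, which penalizes both poor discrimination and poor calibration; the sketch below computes it from ground-truth-labelled LR values, with the numbers invented for illustration.

```python
# Log-likelihood-ratio cost Cllr from labelled LR values (toy numbers).
import math

def cllr(lr_same_source, lr_diff_source):
    p_ss = sum(math.log2(1.0 + 1.0 / lr) for lr in lr_same_source) / len(lr_same_source)
    p_ds = sum(math.log2(1.0 + lr) for lr in lr_diff_source) / len(lr_diff_source)
    return 0.5 * (p_ss + p_ds)

same = [20.0, 8.0, 150.0, 3.0]   # LRs from same-source comparisons
diff = [0.05, 0.2, 0.01, 0.8]    # LRs from different-source comparisons
print("Cllr =", round(cllr(same, diff), 3))  # values below 1 indicate useful evidence
```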
SU-E-J-128: Two-Stage Atlas Selection in Multi-Atlas-Based Image Segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, T; Ruan, D
2015-06-15
Purpose: In the new era of big data, multi-atlas-based image segmentation is challenged by heterogeneous atlas quality and the high computation burden from extensive atlas collections, demanding efficient identification of the most relevant atlases. This study aims to develop a two-stage atlas selection scheme to achieve computational economy with a performance guarantee. Methods: We develop a low-cost fusion set selection scheme by introducing a preliminary selection to trim the full atlas collection into an augmented subset, alleviating the need for extensive full-fledged registrations. More specifically, fusion set selection is performed in two successive steps: preliminary selection and refinement. An augmented subset is first roughly selected from the whole atlas collection with a simple registration scheme and the corresponding preliminary relevance metric; the augmented subset is further refined into the desired fusion set size, using full-fledged registration and the associated relevance metric. The main novelty of this work is the introduction of an inference model to relate the preliminary and refined relevance metrics, based on which the augmented subset size is rigorously derived to ensure the desired atlases survive the preliminary selection with high probability. Results: The performance and complexity of the proposed two-stage atlas selection method were assessed using a collection of 30 prostate MR images. It achieved comparable segmentation accuracy as the conventional one-stage method with full-fledged registration, but significantly reduced computation time to 1/3 (from 30.82 to 11.04 min per segmentation). Compared with an alternative one-stage cost-saving approach, the proposed scheme yielded superior performance with mean and median DSC of (0.83, 0.85) compared to (0.74, 0.78). Conclusion: This work has developed a model-guided two-stage atlas selection scheme to achieve significant cost reduction while guaranteeing high segmentation accuracy. The benefit in both complexity and performance is expected to be most pronounced with large-scale heterogeneous data.
Simulating Effects of High Angle of Attack on Turbofan Engine Performance
NASA Technical Reports Server (NTRS)
Liu, Yuan; Claus, Russell W.; Litt, Jonathan S.; Guo, Ten-Huei
2013-01-01
A method of investigating the effects of high angle of attack (AOA) flight on turbofan engine performance is presented. The methodology involves combining a suite of diverse simulation tools. Three-dimensional, steady-state computational fluid dynamics (CFD) software is used to model the change in performance of a commercial aircraft-type inlet and fan geometry due to various levels of AOA. Parallel compressor theory is then applied to assimilate the CFD data with a zero-dimensional, nonlinear, dynamic turbofan engine model. The combined model shows that high AOA operation degrades fan performance and, thus, negatively impacts compressor stability margins and engine thrust. In addition, the engine response to high AOA conditions is shown to be highly dependent upon the type of control system employed.
Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying
2016-12-23
Protein-protein interactions (PPIs) are essential to most biological processes. Since bioscience has entered the era of the genome and proteome, there is a growing demand for knowledge about PPI networks. High-throughput biological technologies can be used to identify new PPIs, but they are expensive, time-consuming, and tedious. Therefore, computational methods for predicting PPIs have an important role. In past years, an increasing number of computational methods, such as protein structure-based approaches, have been proposed for predicting PPIs. The major limitation in principle of these methods lies in the prior information about the protein needed to infer PPIs. Therefore, it is of much significance to develop computational methods which only use the information of the protein amino acid sequence. Here, we report a highly efficient approach for predicting PPIs. The main improvements come from the use of a novel protein sequence representation combining a continuous wavelet descriptor and Chou's pseudo amino acid composition (PseAAC), and from adopting a weighted sparse representation based classifier (WSRC). This method, cross-validated on the PPI datasets of Saccharomyces cerevisiae, Human and H. pylori, achieves excellent results with accuracies as high as 92.50%, 95.54% and 84.28% respectively, significantly better than previously proposed methods. Extensive experiments are performed to compare the proposed method with a state-of-the-art Support Vector Machine (SVM) classifier. The outstanding results yielded by our model indicate that the proposed feature extraction method, combining two kinds of descriptors, has strong expressive ability and is expected to provide comprehensive and effective information for machine learning-based classification models. In addition, the prediction performance in the comparison experiments shows the good cooperation between the combined features and WSRC. Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.
NASA Astrophysics Data System (ADS)
Dimitrakopoulos, Panagiotis
2018-03-01
The calculation of polytropic efficiencies is a very important task, especially during the development of new compression units, like compressor impellers, stages and stage groups. Such calculations are also crucial for the determination of the performance of a whole compressor. As processors and computational capacities have been substantially improved in recent years, the need has emerged for a new, rigorous, robust, accurate and at the same time standardized method for the computation of polytropic efficiencies, especially based on the thermodynamics of real gases. The proposed method is based on the rigorous definition of the polytropic efficiency. The input consists of pressure and temperature values at the end points of the compression path (suction and discharge), for a given working fluid. The average relative error for the studied cases was 0.536%. Thus, this high-accuracy method is proposed for efficiency calculations related to turbocompressors and their compression units, especially when they are operating at high power levels, for example in jet engines and high-power plants.
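For orientation, the classical perfect-gas shortcut shows what such a method must reproduce in the ideal-gas limit: the polytropic efficiency of a compression follows directly from the end-point pressures and temperatures. The real-gas treatment proposed in the paper is more involved; the values below are arbitrary examples.

```python
# Perfect-gas polytropic efficiency from end-point states (ideal-gas
# limit only; the paper's real-gas method is not reproduced here).
import math

def polytropic_efficiency_perfect_gas(p1, T1, p2, T2, kappa=1.4):
    # eta_p = ((kappa - 1) / kappa) * ln(p2/p1) / ln(T2/T1)
    return (kappa - 1.0) / kappa * math.log(p2 / p1) / math.log(T2 / T1)

eta = polytropic_efficiency_perfect_gas(p1=1.0e5, T1=293.15, p2=4.0e5, T2=475.0)
print(f"polytropic efficiency: {eta:.3f}")
```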
NASA Astrophysics Data System (ADS)
Hyun, Jae-Sang; Li, Beiwen; Zhang, Song
2017-07-01
This paper presents our research findings on high-speed high-accuracy three-dimensional shape measurement using digital light processing (DLP) technologies. In particular, we compare two different sinusoidal fringe generation techniques using the DLP projection devices: direct projection of computer-generated 8-bit sinusoidal patterns (a.k.a., the sinusoidal method), and the creation of sinusoidal patterns by defocusing binary patterns (a.k.a., the binary defocusing method). This paper mainly examines their performance on high-accuracy measurement applications under precisely controlled settings. Two different projection systems were tested in this study: a commercially available inexpensive projector and the DLP development kit. Experimental results demonstrated that the binary defocusing method always outperforms the sinusoidal method if a sufficient number of phase-shifted fringe patterns can be used.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications". The discussion of Scalable High Performance Computing reports on three objectives: validate, assess scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to effectively model antenna problems; application of lessons learned in the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition of a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; development and demonstration of improved radiation absorbing boundary conditions for high-order CEM; and extension of the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update
Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Čech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy
2016-01-01
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in the life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.
Fusion of real-time simulation, sensing, and geo-informatics in assessing tsunami impact
NASA Astrophysics Data System (ADS)
Koshimura, S.; Inoue, T.; Hino, R.; Ohta, Y.; Kobayashi, H.; Musa, A.; Murashima, Y.; Gokon, H.
2015-12-01
Bringing together state-of-the-art high-performance computing, remote sensing and spatial information sciences, we establish a method of real-time tsunami inundation forecasting, damage estimation and mapping to enhance disaster response. Right after a major (near-field) earthquake is triggered, we perform real-time tsunami inundation forecasting using a high-performance computing platform (Koshimura et al., 2014). Using Tohoku University's vector supercomputer, we accomplished the "10-10-10 challenge": completing tsunami source determination in 10 minutes and tsunami inundation modeling in 10 minutes at 10 m grid resolution. Given the maximum flow depth distribution, we perform quantitative estimation of the exposed population using census data and mobile phone data, and of the numbers of potential deaths and damaged structures by applying tsunami fragility curves. After the potential tsunami-affected areas are estimated, the analysis gets focused and moves on to the "detection" phase using remote sensing. Recent advances in remote sensing technologies expand capabilities of detecting the spatial extent of the tsunami-affected area and structural damage. In particular, a semi-automated method to estimate building damage in tsunami-affected areas was developed using pre- and post-event high-resolution SAR (Synthetic Aperture Radar) data. The method is verified through case studies of the 2011 Tohoku and other potential tsunami scenarios, and prototype system development is now underway in Kochi prefecture, one of the at-risk coastal regions facing a Nankai Trough earthquake. In the trial operation, we verify the capability of the method as a new tsunami early warning and response system for stakeholders and responders.
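The damage-estimation step can be sketched compactly: a tsunami fragility curve is commonly modelled as a lognormal CDF of flow depth giving the damage probability. The median depth and log-standard deviation below are placeholders, not the calibrated values used for Tohoku.

```python
# Lognormal tsunami fragility curve P(damage | flow depth); parameters
# are illustrative placeholders, not calibrated values.
import math

def fragility(depth_m, median_m=2.0, beta=0.7):
    if depth_m <= 0.0:
        return 0.0
    z = (math.log(depth_m) - math.log(median_m)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for d in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"depth {d:4.1f} m -> P(damage) = {fragility(d):.2f}")
```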
Precise and fast spatial-frequency analysis using the iterative local Fourier transform.
Lee, Sukmock; Choi, Heejoo; Kim, Dae Wook
2016-09-19
The use of the discrete Fourier transform has decreased since the introduction of the fast Fourier transform (fFT), which is a numerically efficient computing process. This paper presents the iterative local Fourier transform (ilFT), a set of new processing algorithms that iteratively apply the discrete Fourier transform within a local and optimal frequency domain. The new technique achieves 210 times higher frequency resolution than the fFT within a comparable computation time. The method's superb computing efficiency, high resolution, spectrum zoom-in capability, and overall performance are evaluated and compared to other advanced high-resolution Fourier transform techniques, such as the fFT combined with several fitting methods. The effectiveness of the ilFT is demonstrated through the data analysis of a set of Talbot self-images (1280 × 1024 pixels) obtained with an experimental setup using grating in a diverging beam produced by a coherent point source.
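The core idea, evaluating the discrete Fourier transform directly on a refined local frequency grid instead of the fFT's fixed grid, is easy to sketch; the iterative optimal-domain selection of the ilFT is omitted here, and all parameters are illustrative.

```python
# Direct DFT on a refined local frequency band (zoom-in sketch; the
# ilFT's iteration and domain-selection logic are not reproduced).
import numpy as np

def local_dft(x, fs, f_lo, f_hi, n_freq=2001):
    t = np.arange(len(x)) / fs
    freqs = np.linspace(f_lo, f_hi, n_freq)      # arbitrarily fine spacing
    spectrum = np.exp(-2j * np.pi * freqs[:, None] * t[None, :]) @ x
    return freqs, spectrum

fs = 1000.0
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 123.4567 * t)
freqs, spec = local_dft(x, fs, 120.0, 126.0)
print("peak near %.4f Hz" % freqs[np.argmax(np.abs(spec))])
```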
Achieving a high mode count in the exact electromagnetic simulation of diffractive optical elements.
Junker, André; Brenner, Karl-Heinz
2018-03-01
The application of rigorous optical simulation algorithms, both in the modal as well as in the time domain, is known to be limited to the nano-optical scale due to severe computing time and memory constraints. This is true even for today's high-performance computers. To address this problem, we develop the fast rigorous iterative method (FRIM), an algorithm based on an iterative approach, which, under certain conditions, also allows solving large-size problems approximation-free. We achieve this in the case of a modal representation by avoiding the computationally complex eigenmode decomposition. Thereby, the numerical cost is reduced from O(N³) to O(N log N), enabling a simulation of structures like certain diffractive optical elements with a significantly higher mode count than presently possible. Apart from speed, another major advantage of the iterative FRIM over standard modal methods is the possibility to trade runtime against accuracy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brun, E., E-mail: emmanuel.brun@esrf.fr; Grandl, S.; Sztrókay-Gaul, A.
Purpose: Phase contrast computed tomography has emerged as an imaging method which is able to outperform present-day clinical mammography in breast tumor visualization while maintaining an equivalent average dose. To this day, no segmentation technique takes into account the specificity of the phase contrast signal. In this study, the authors propose a new mathematical framework for human-guided breast tumor segmentation. This method has been applied to high-resolution images of excised human organs, each of several gigabytes. Methods: The authors present a segmentation procedure based on the viscous watershed transform and demonstrate the efficacy of this method on analyzer-based phase contrast images. The segmentation of tumors inside two full human breasts is then shown as an example of this procedure's possible applications. Results: A correct and precise identification of the tumor boundaries was obtained and confirmed by manual contouring performed independently by four experienced radiologists. Conclusions: The authors demonstrate that applying the viscous watershed transform allows them to perform the segmentation of tumors in high-resolution x-ray analyzer-based phase contrast breast computed tomography images. Combining the additional information provided by the segmentation procedure with the already high definition of morphological details and tissue boundaries offered by phase contrast imaging techniques will represent a valuable multistep procedure to be used in future medical diagnostic applications.
Han, Guanghui; Liu, Xiabi; Zheng, Guangyuan; Wang, Murong; Huang, Shan
2018-06-06
Ground-glass opacity (GGO) is a common CT imaging sign on high-resolution CT, which means the lesion is more likely to be malignant compared to common solid lung nodules. The automatic recognition of GGO CT imaging signs is of great importance for early diagnosis and possible cure of lung cancers. The present GGO recognition methods employ traditional low-level features and system performance improves slowly. Considering the high performance of CNN models in the computer vision field, we propose an automatic recognition method for 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuned CNN models. Our hybrid resampling is performed on multi-views and multi-receptive fields, which reduces the risk of missing small or large GGOs by adopting representative sampling panels and processing GGOs with multiple scales simultaneously. The layer-wise fine-tuning strategy has the ability to obtain the optimal fine-tuning model. The multi-CNN-model fusion strategy obtains better performance than any single trained model. We evaluated our method on the GGO nodule samples in the publicly available LIDC-IDRI dataset of chest CT scans. The experimental results show that our method yields excellent results with 96.64% sensitivity, 71.43% specificity, and 0.83 F1 score. Our method is a promising approach to apply deep learning methods to computer-aided analysis of specific CT imaging signs with insufficient labeled images.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Senate Committee on Energy and Natural Resources.
The purpose of the bill (S. 343), as reported by the Senate Committee on Energy and Natural Resources, is to establish a federal commitment to the advancement of high-performance computing, improve interagency planning and coordination of federal high-performance computing and networking activities, authorize a national high-speed computer…
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
ERIC Educational Resources Information Center
Office of Science and Technology Policy, Washington, DC.
This report presents the United States research and development program for 1993 for high performance computing and computer communications (HPCC) networks. The first of four chapters presents the program goals and an overview of the federal government's emphasis on high performance computing as an important factor in the nation's scientific and…
High-performance parallel analysis of coupled problems for aircraft propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Lanteri, S.; Gumaste, U.; Ronaghi, M.
1994-01-01
Applications of high-performance parallel computation are described for the analysis of complete jet engines, treated as a multi-discipline coupled problem. The coupled problem involves the interaction of structures with gas dynamics, heat conduction and heat transfer in aircraft engines. The methodology issues addressed include: consistent discrete formulation of coupled problems with emphasis on coupling phenomena; effect of partitioning strategies, augmentation and temporal solution procedures; sensitivity of response to problem parameters; and methods for interfacing multiscale discretizations in different single fields. The computer implementation issues addressed include: parallel treatment of coupled systems; domain decomposition and mesh partitioning strategies; data representation in object-oriented form and mapping to hardware-driven representation; and tradeoff studies between partitioning schemes and fully coupled treatment.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
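A minimal example of such a strategy is greedy largest-first assignment of grid blocks (weighted by point count) to processors; the block sizes below are invented, and the paper's actual three strategies are not restated here.

```python
# Largest-processing-time-first assignment of overset grid blocks to
# processors (a generic heuristic sketch; block sizes are made up).
import heapq

def lpt_assign(block_sizes, n_procs):
    heap = [(0, p) for p in range(n_procs)]            # (current load, proc id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for i, size in sorted(enumerate(block_sizes), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(i)
        heapq.heappush(heap, (load + size, p))
    loads = {p: sum(block_sizes[i] for i in ids) for p, ids in assignment.items()}
    return assignment, loads

blocks = [900, 450, 430, 300, 220, 120, 80, 60]        # grid points per block
assignment, loads = lpt_assign(blocks, 3)
print(assignment)
print("imbalance factor:", max(loads.values()) / (sum(blocks) / 3))
```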
Huang, Hsuan-Ming; Hsiao, Ing-Tsung
2016-01-01
In recent years, there has been increased interest in low-dose X-ray cone beam computed tomography (CBCT) in many fields, including dentistry, guided radiotherapy and small animal imaging. Despite reducing the radiation dose, low-dose CBCT has not gained widespread acceptance in routine clinical practice. In addition to performing more evaluation studies, developing a fast and high-quality reconstruction algorithm is required. In this work, we propose an iterative reconstruction method that accelerates ordered-subsets (OS) reconstruction using a power factor. Furthermore, we combine it with the total-variation (TV) minimization method. Both simulation and phantom studies were conducted to evaluate the performance of the proposed method. Results show that the proposed method can accelerate conventional OS methods, greatly increasing the convergence speed in early iterations. Moreover, applying the TV minimization to the power acceleration scheme can further improve the image quality while preserving the fast convergence rate.
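The TV ingredient can be sketched in isolation: a few gradient steps on a smoothed total-variation penalty of a 2D image, the kind of regularization interleaved with the OS updates. This is a synthetic denoising toy, not the authors' reconstruction code.

```python
# Gradient descent on a smoothed total-variation penalty (toy 2D example).
import numpy as np

def tv_grad(u, eps=1e-8):
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    mag = np.sqrt(gx**2 + gy**2 + eps)
    px, py = gx / mag, gy / mag
    div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
    return -div                                    # gradient of the TV term

rng = np.random.default_rng(2)
u = np.zeros((64, 64)); u[16:48, 16:48] = 1.0      # piecewise-constant phantom
noisy = u + 0.2 * rng.normal(size=u.shape)

x = noisy.copy()
for _ in range(50):
    x -= 0.1 * ((x - noisy) + 0.15 * tv_grad(x))   # data term + TV term
print("rmse noisy   :", np.sqrt(((noisy - u) ** 2).mean()))
print("rmse denoised:", np.sqrt(((x - u) ** 2).mean()))
```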
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Brian M.; Ebeida, Mohamed Salah; Eldred, Michael S.
The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
NASA Technical Reports Server (NTRS)
Aggarwal, Arun K.
1993-01-01
Spherical roller bearings have typically been used in applications with speeds limited to about 5000 rpm and loads limited for operation at less than about 0.25 million DN. However, spherical roller bearings are now being designed for high load and high speed applications, including aerospace applications. A computer program, SASHBEAN, was developed to provide an analytical tool to design, analyze, and predict the performance of high speed, single row, angular contact (including zero contact angle) spherical roller bearings. The material presented is the mathematical formulation and analytical methods used to develop the computer program SASHBEAN. For a given set of operating conditions, the program calculates the bearing's ring deflections (axial and radial), roller deflections, contact area stresses, depth and magnitude of maximum shear stresses, axial thrust, rolling element and cage rotational speeds, lubrication parameters, fatigue lives, and rates of heat generation. Centrifugal forces and gyroscopic moments are fully considered. The program is also capable of performing steady-state and time-transient thermal analyses of the bearing system.
Generic Hypersonic Inlet Module Analysis
NASA Technical Reports Server (NTRS)
Cockrell, Charles E., Jr.; Huebner, Lawrence D.
2004-01-01
A computational study associated with an internal inlet drag analysis was performed for a generic hypersonic inlet module. The purpose of this study was to determine the feasibility of computing the internal drag force for a generic scramjet engine module using computational methods. The computational study consisted of obtaining two-dimensional (2D) and three-dimensional (3D) computational fluid dynamics (CFD) solutions using the Euler and parabolized Navier-Stokes (PNS) equations. The solution accuracy was assessed by comparisons with experimental pitot pressure data. The CFD analysis indicates that the 3D PNS solutions show the best agreement with experimental pitot pressure data. The internal inlet drag analysis consisted of obtaining drag force predictions based on experimental data and 3D CFD solutions. A comparative assessment of each of the drag prediction methods is made and the sensitivity of CFD drag values to computational procedures is documented. The analysis indicates that the CFD drag predictions are highly sensitive to the computational procedure used.
A Carrier Estimation Method Based on MLE and KF for Weak GNSS Signals.
Zhang, Hongyang; Xu, Luping; Yan, Bo; Zhang, Hua; Luo, Liyan
2017-06-22
Maximum likelihood estimation (MLE) has been researched for several acquisition and tracking applications of global navigation satellite system (GNSS) receivers and shows high performance. However, current methods are derived and operated based on the sampling data, which results in a large computation burden. This paper proposes a low-complexity MLE carrier tracking loop for weak GNSS signals which processes the coherent integration results instead of the sampling data. First, the cost function of the MLE of signal parameters such as signal amplitude, carrier phase, and Doppler frequency is used to derive an MLE discriminator function. The optimal value of the cost function is searched iteratively by an efficient Levenberg-Marquardt (LM) method. Its performance, including the Cramér-Rao bound (CRB), dynamic characteristics and computation burden, is analyzed by numerical techniques. Second, an adaptive Kalman filter is designed for the MLE discriminator to obtain smooth estimates of carrier phase and frequency. The performance of the proposed loop, in terms of sensitivity, accuracy and bit error rate, is compared with conventional methods by Monte Carlo (MC) simulations in both pedestrian-level and vehicle-level dynamic circumstances. Finally, an optimal loop which combines the proposed method and a conventional method is designed to achieve optimal performance in both weak and strong signal circumstances.
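The MLE step operating on coherent integration outputs can be sketched with SciPy's Levenberg-Marquardt solver standing in for the paper's custom LM loop: fit amplitude, Doppler and carrier phase to a block of complex correlator outputs. Signal parameters and noise level below are invented.

```python
# LM fit of (amplitude, Doppler, phase) to complex coherent-integration
# outputs; a stand-in sketch, not the paper's tracking loop.
import numpy as np
from scipy.optimize import least_squares

T = 1e-3                                  # coherent integration period (s)
k = np.arange(20)
rng = np.random.default_rng(3)
z = 1.0 * np.exp(1j * (2 * np.pi * 12.5 * k * T + 0.4))       # true signal
z += 0.05 * (rng.normal(size=k.size) + 1j * rng.normal(size=k.size))

def residuals(p):
    amp, dopp, phase = p
    model = amp * np.exp(1j * (2 * np.pi * dopp * k * T + phase))
    r = z - model
    return np.concatenate([r.real, r.imag])

fit = least_squares(residuals, x0=[0.5, 0.0, 0.0], method="lm")
print("amp=%.3f  doppler=%.2f Hz  phase=%.3f rad" % tuple(fit.x))
```

In the paper's loop, such per-block estimates would then feed the adaptive Kalman filter for smoothing.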
Two-stage atlas subset selection in multi-atlas based image segmentation.
Zhao, Tingting; Ruan, Dan
2015-06-01
Fast growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, and also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs in the face of a large atlas collection with varied quality, so that high-accuracy segmentation can be achieved with low computational cost. An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates comparable end-to-end segmentation performance as the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, their scheme improves the mean and median Dice similarity coefficient value from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
Application of a fast skyline computation algorithm for serendipitous searching problems
NASA Astrophysics Data System (ADS)
Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary
2018-02-01
Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information on non-skyline entries must be stored, since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm, called jointed rooted-tree (JR-tree), manages entries using a rooted tree structure. JR-tree delays extending the tree to deep levels to accelerate tree construction and traversal. In this study, we present the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
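For readers unfamiliar with skyline queries, a minimal block-nested-loop skyline (without the paper's JR-tree acceleration) looks like this; maximization on every attribute is assumed for illustration.

```python
# Minimal block-nested-loop skyline sketch (maximization on all attributes).
def dominates(a, b):
    """a dominates b if a is >= b everywhere and > b somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue                      # p is dominated, discard it
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result

print(skyline([(1, 5), (3, 3), (2, 4), (5, 1), (2, 2)]))
# -> [(1, 5), (3, 3), (2, 4), (5, 1)]; (2, 2) is dominated by (3, 3)
```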
Distributed metadata in a high performance computing environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin; Zhang, Zhenhua
A computer-executable method, system, and computer program product for managing metadata in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the computer-executable method, system, and computer program product comprising receiving a request for metadata associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the metadata is associated with a key-value, determining which of the one or more burst buffers stores the requested metadata, and upon determination that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.
gpuSPHASE-A shared memory caching implementation for 2D SPH using CUDA
NASA Astrophysics Data System (ADS)
Winkler, Daniel; Meister, Michael; Rezavand, Massoud; Rauch, Wolfgang
2017-04-01
Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Given the high computational demand of CFD, such simulations in 3D would require computation times of years, so a reduction to a 2D domain is inevitable. In this paper gpuSPHASE, a new open-source 2D SPH solver implementation for graphics devices, is developed. It is optimized for simulations that must be executed with thousands of frames per second to be computed in reasonable time. A novel caching algorithm for Compute Unified Device Architecture (CUDA) shared memory is proposed and implemented. The software is validated and its performance is evaluated for the well-established dam break test case.
Simulator certification methods and the vertical motion simulator
NASA Technical Reports Server (NTRS)
Showalter, T. W.
1981-01-01
The vertical motion simulator (VMS) is designed to simulate a variety of experimental helicopter and STOL/VTOL aircraft as well as other kinds of aircraft with special pitch and Z-axis characteristics. The VMS includes a large motion base with extensive vertical and lateral travel capabilities, a computer-generated-image visual system, and a high-speed CDC 7600 computer system, which performs aero model calculations. Guidelines on how to measure and evaluate VMS performance were developed. A survey of simulation users was conducted to ascertain how they evaluated and certified simulators for use. The results are presented.
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei
2014-10-01
Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative to current high-resolution pixelated PET detectors if the issues of high-performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate arrays (FPGAs). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only to provide high performance, but also to be realistic for FPGA implementation. Benefiting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest-neighbor search for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions in full-width-at-half-maximum (FWHM) of both methods, averaged over the center axis of the detector, were 1.87 ± 0.17 mm and 1.92 ± 0.09 mm, respectively. The test results show that the SOM-NN scheme has positioning performance equivalent to the smoothed k-NN method, but its amount of computation is only about one-tenth that of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with high event-processing throughput.
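The prototype-reduction idea can be sketched briefly. The snippet below uses k-means as a stand-in for the SOM feature mapping (an assumption made for brevity): the many reference events recorded at each calibration position are compressed to a few prototypes, and an unknown event takes the position of its nearest prototype.

```python
# Minimal sketch of prototype reduction + nearest-neighbor positioning.
# k-means stands in for the SOM feature mapping of the paper.
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(events_by_position, n_prototypes=8):
    """events_by_position: {position: (n_events, n_channels) array}."""
    protos, labels = [], []
    for pos, events in events_by_position.items():
        km = KMeans(n_clusters=n_prototypes, n_init=10).fit(events)
        protos.append(km.cluster_centers_)        # few prototypes per position
        labels += [pos] * n_prototypes
    return np.vstack(protos), labels

def estimate_position(event, prototypes, labels):
    # Search only the small prototype set, not all reference events.
    return labels[np.argmin(np.linalg.norm(prototypes - event, axis=1))]
```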
CFD comparison with centrifugal compressor measurements on a wide operating range
NASA Astrophysics Data System (ADS)
Le Sausse, P.; Fabrie, P.; Arnou, D.; Clunet, F.
2013-04-01
Centrifugal compressors are widely used in industrial applications thanks to their high efficiency. They are able to provide a wide operating range before reaching the flow barrier or surge limits. Performance and operating range are described by compressor maps obtained experimentally. After a description of the performance test rig, this article compares measured centrifugal compressor performance with computational fluid dynamics results. These computations are performed at steady conditions with R134a refrigerant as the working fluid. The Navier-Stokes equations, coupled with the k-ɛ turbulence model, are solved by the commercial software ANSYS-CFX by means of the finite volume method. Input conditions are varied in order to calculate several speed lines. Theoretical isentropic efficiency and the theoretical surge line are finally compared to experimental data.
A comparison of fitness-case sampling methods for genetic programming
NASA Astrophysics Data System (ADS)
Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel
2017-11-01
Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
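Of the four methods compared, lexicase selection is perhaps the easiest to state in code. A minimal sketch follows; the error-matrix layout is an assumed convention, not the paper's implementation.

```python
# Minimal sketch of lexicase selection. errors[i][c] is the error of
# individual i on fitness case c (assumed data layout).
import random

def lexicase_select(errors):
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    random.shuffle(cases)                     # new random case ordering per call
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break                             # a single survivor wins
    return random.choice(candidates)          # tie-break at random
```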
Wagner, Roland; Helin, Tapio; Obereder, Andreas; Ramlau, Ronny
2016-02-20
The imaging quality of modern ground-based telescopes such as the planned European Extremely Large Telescope is affected by atmospheric turbulence. In consequence, they heavily depend on stable and high-performance adaptive optics (AO) systems. Using measurements of incoming light from guide stars, an AO system compensates for the effects of turbulence by adjusting so-called deformable mirror(s) (DMs) in real time. In this paper, we introduce a novel reconstruction method for ground layer adaptive optics. In the literature, a common approach to this problem is to use Bayesian inference in order to model the specific noise structure appearing due to spot elongation. This approach leads to large coupled systems with high computational effort. Recently, fast solvers of linear order, i.e., with computational complexity O(n), where n is the number of DM actuators, have emerged. However, the quality of such methods typically degrades in low flux conditions. Our key contribution is to achieve the high quality of the standard Bayesian approach while at the same time maintaining the linear order speed of the recent solvers. Our method is based on performing a separate preprocessing step before applying the cumulative reconstructor (CuReD). The efficiency and performance of the new reconstructor are demonstrated using the OCTOPUS, the official end-to-end simulation environment of the ESO for extremely large telescopes. For more specific simulations we also use the MOST toolbox.
Cognitive and Neural Bases of Skilled Performance.
1987-10-04
advantage is that this method is not computationally demanding, and model-specific analyses such as high-precision source localization with realistic... and a two-high-threshold model satisfy theoretical and pragmatic independence. Discrimination and bias measures from these two models comparing... recognition memory of patients with dementing diseases, amnesics, and normal controls. We found the two-high-threshold model to be more sensitive...
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometry and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunami with free surfaces and floating bodies, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for realistic simulation. Parallel computing is therefore important for handling such huge computational costs. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently show a workload imbalance problem when parallelized with domains fixed in space, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods utilizing dynamic load balancing algorithms, toward high-resolution simulation over large domains on massively parallel supercomputer systems. Our method utilizes the imbalance in execution time of each MPI process as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for solving particles with different calculation costs (e.g. boundary particles) as well as for heterogeneous computer architectures. We analyze the parallel efficiency and scalability on supercomputer systems (K computer, Earth Simulator 3, etc.).
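The time-based rebalancing idea can be illustrated in one dimension. The sketch below rescales slice widths so that measured per-rank times even out; it is a simplified stand-in for the authors' Newton-like iteration on a slice grid, and all names and numbers are illustrative.

```python
# Minimal 1-D sketch of time-based domain rebalancing: slices that ran
# slow shrink, slices that ran fast grow. Real slice-grid decomposition
# applies this kind of adjustment recursively per axis.
import numpy as np

def rebalance(widths, times, relax=0.5):
    speed = widths / np.asarray(times)        # work units per second per rank
    target = widths.sum() * speed / speed.sum()
    return (1 - relax) * widths + relax * target

widths = np.array([25.0, 25.0, 25.0, 25.0])   # current slice widths
times  = np.array([1.9, 1.1, 0.8, 1.3])       # measured wall time per rank
print(rebalance(widths, times))               # wider slices for fast ranks
```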
Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions
Liu, Weidong; Luo, Xi
2014-01-01
This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. PMID:25750463
Computation and visualization of uncertainty in surgical navigation.
Simpson, Amber L; Ma, Burton; Vasarhelyi, Edward M; Borschneck, Dan P; Ellis, Randy E; James Stewart, A
2014-09-01
Surgical displays do not show uncertainty information with respect to the position and orientation of instruments. Data is presented as though it were perfect; surgeons unaware of this uncertainty could make critical navigational mistakes. The propagation of uncertainty to the tip of a surgical instrument is described and a novel uncertainty visualization method is proposed. An extensive study with surgeons has examined the effect of uncertainty visualization on surgical performance with pedicle screw insertion, a procedure highly sensitive to uncertain data. It is shown that surgical performance (time to insert screw, degree of breach of pedicle, and rotation error) is not impeded by the additional cognitive burden imposed by uncertainty visualization. Uncertainty can be computed in real time and visualized without adversely affecting surgical performance, and the best method of uncertainty visualization may depend upon the type of navigation display. Copyright © 2013 John Wiley & Sons, Ltd.
Diagnosis and management of solitary pulmonary nodules.
Jeong, Yeon Joo; Lee, Kyung Soo; Kwon, O Jung
2008-12-01
The advent of computed tomography (CT) screening, with or without the help of computer-aided detection systems, has increased the detection rate of solitary pulmonary nodules (SPNs), including that of early peripheral lung cancer. Helical dynamic (HD)CT, providing information on morphologic and hemodynamic characteristics with high specificity and reasonably high accuracy, can be used for the initial assessment of SPNs. (18)F-fluorodeoxyglucose PET/CT is more sensitive at detecting malignancy than HDCT. Therefore, PET/CT may be selectively performed to characterize SPNs when HDCT gives an inconclusive diagnosis. Serial volume measurements are currently the most reliable methods for the tissue characterization of subcentimeter nodules. When malignancy is highly suspected in a subcentimeter nodule, video-assisted thoracoscopic surgery nodule removal after nodule localization using the pulmonary nodule-marker system may be performed for diagnosis and treatment.
Direct numerical simulation of reactor two-phase flows enabled by high-performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Jun; Cambareri, Joseph J.; Brown, Cameron S.
Nuclear reactor two-phase flows remain a great engineering challenge, where the high-resolution two-phase flow database which can inform practical model development is still sparse due to the extreme reactor operation conditions and measurement difficulties. Owing to the rapid growth of computing power, direct numerical simulation (DNS) is enjoying a renewed interest in investigating the related flow problems. A combination of DNS and an interface tracking method can provide a unique opportunity to study two-phase flows based on first-principles calculations. More importantly, state-of-the-art high-performance computing (HPC) facilities are helping unlock this great potential. This paper reviews the recent research progress of two-phase flow DNS related to reactor applications. The progress in large-scale bubbly flow DNS has been focused not only on the sheer size of those simulations in terms of resolved Reynolds number, but also on the associated advanced modeling and analysis techniques. Specifically, the current areas of active research include modeling of sub-cooled boiling, bubble coalescence, as well as the advanced post-processing toolkit for bubbly flow simulations in reactor geometries. A novel bubble tracking method has been developed to track the evolution of bubbles in two-phase bubbly flow. Also, spectral analysis of the DNS database in different geometries has been performed to investigate the modulation of the energy spectrum slope due to bubble-induced turbulence. In addition, single- and two-phase analysis results are presented for turbulent flows within pressurized water reactor (PWR) core geometries. The related simulations can be carried out only on the world's leading HPC platforms. These simulations are enabling more complex turbulence model development and validation for use in 3D multiphase computational fluid dynamics (M-CFD) codes.
Design and Implementation of High-Performance GIS Dynamic Objects Rendering Engine
NASA Astrophysics Data System (ADS)
Zhong, Y.; Wang, S.; Li, R.; Yun, W.; Song, G.
2017-12-01
Spatio-temporal dynamic visualization is more vivid than static visualization. It is important to use dynamic visualization techniques to reveal variation processes and trends vividly and comprehensively for geographical phenomena. Dealing with the challenges of dynamically visualizing both 2D and 3D spatial targets, especially across different spatial data types, requires a high-performance GIS dynamic objects rendering engine. The main approach for improving a rendering engine with vast numbers of dynamic targets relies on key technologies of high-performance GIS, including in-memory computing, parallel computing, GPU computing and high-performance algorithms. In this study, a high-performance GIS dynamic objects rendering engine is designed and implemented to solve this problem based on hybrid acceleration techniques. The rendering engine combines GPU computing, OpenGL technology, and high-performance algorithms with the advantage of 64-bit in-memory computing. It processes 2D and 3D dynamic target data efficiently and runs smoothly with vast dynamic target data. A prototype of the high-performance GIS dynamic objects rendering engine was developed based on SuperMap GIS iObjects. Experiments designed for large-scale spatial data visualization showed that the engine achieves high performance: rendering two-dimensional and three-dimensional dynamic objects is 20 times faster on GPU than on CPU.
Overdetermined shooting methods for computing standing water waves with spectral accuracy
NASA Astrophysics Data System (ADS)
Wilkening, Jon; Yu, Jia
2012-01-01
A high-performance shooting algorithm is developed to compute time-periodic solutions of the free-surface Euler equations with spectral accuracy in double and quadruple precision. The method is used to study resonance and its effect on standing water waves. We identify new nucleation mechanisms in which isolated large-amplitude solutions, and closed loops of such solutions, suddenly exist for depths below a critical threshold. We also study degenerate and secondary bifurcations related to Wilton's ripples in the traveling case, and explore the breakdown of self-similarity at the crests of extreme standing waves. In shallow water, we find that standing waves take the form of counter-propagating solitary waves that repeatedly collide quasi-elastically. In deep water with surface tension, we find that standing waves resemble counter-propagating depression waves. We also discuss the existence and non-uniqueness of solutions, and smooth versus erratic dependence of Fourier modes on wave amplitude and fluid depth. In the numerical method, robustness is achieved by posing the problem as an overdetermined nonlinear system and using either adjoint-based minimization techniques or a quadratically convergent trust-region method to minimize the objective function. Efficiency is achieved in the trust-region approach by parallelizing the Jacobian computation, so the setup cost of computing the Dirichlet-to-Neumann operator in the variational equation is not repeated for each column. Updates of the Jacobian are also delayed until the previous Jacobian ceases to be useful. Accuracy is maintained using spectral collocation with optional mesh refinement in space, a high-order Runge-Kutta or spectral deferred correction method in time, and quadruple precision for improved navigation of delicate regions of parameter space as well as validation of double-precision results. Implementation issues for transferring much of the computation to graphics processing units are briefly discussed, and the performance of the algorithm is tested for a number of hardware configurations.
A Linux Workstation for High Performance Graphics
NASA Technical Reports Server (NTRS)
Geist, Robert; Westall, James
2000-01-01
The primary goal of this effort was to provide a low-cost method of obtaining high-performance 3-D graphics using an industry standard library (OpenGL) on PC class computers. Previously, users interested in doing substantial visualization or graphical manipulation were constrained to using specialized, custom hardware most often found in computers from Silicon Graphics (SGI). We provided an alternative to expensive SGI hardware by taking advantage of third-party, 3-D graphics accelerators that have now become available at very affordable prices. To make use of this hardware our goal was to provide a free, redistributable, and fully-compatible OpenGL work-alike library so that existing bodies of code could simply be recompiled for PC class machines running a free version of Unix. This should allow substantial cost savings while greatly expanding the population of people with access to a serious graphics development and viewing environment. This should offer a means for NASA to provide a spectrum of graphics performance to its scientists, supplying high-end specialized SGI hardware for high-performance visualization while fulfilling the requirements of medium and lower performance applications with generic, off-the-shelf components and still maintaining compatibility between the two.
A Class of High-Resolution Explicit and Implicit Shock-Capturing Methods
NASA Technical Reports Server (NTRS)
Yee, H. C.
1994-01-01
The development of shock-capturing finite difference methods for hyperbolic conservation laws has been a rapidly growing area for the last decade. Many of the fundamental concepts, state-of-the-art developments and applications to fluid dynamics problems can only be found in meeting proceedings, scientific journals and internal reports. This paper attempts to give a unified and generalized formulation of a class of high-resolution, explicit and implicit shock-capturing methods, and to illustrate their versatility in various steady and unsteady complex shock waves, perfect gases, equilibrium real gases and nonequilibrium flow computations. These numerical methods are formulated for the purpose of easy and efficient implementation into a practical computer code. The various constructions of high-resolution shock-capturing methods fall nicely into the present framework and a computer code can be implemented with the various methods as separate modules. Included is a systematic overview of the basic design principle of the various related numerical methods. Special emphasis will be on the construction of the basic nonlinear, spatially second- and third-order schemes for nonlinear scalar hyperbolic conservation laws and the methods of extending these nonlinear scalar schemes to nonlinear systems via approximate Riemann solvers and flux-vector splitting approaches. Generalization of these methods to efficiently include real gases and large systems of nonequilibrium flows will be discussed. Extensions of these methods to problems containing stiff source terms and shock waves are also included. The performance of some of these schemes is illustrated by numerical examples for one-, two- and three-dimensional gas-dynamics problems. The use of the Lax-Friedrichs numerical flux to obtain high-resolution shock-capturing schemes is generalized. This method can be extended to nonlinear systems of equations without the use of Riemann solvers or flux-vector splitting approaches and thus provides a large savings for multidimensional, equilibrium real gas and nonequilibrium flow computations.
Engineering and programming manual: Two-dimensional kinetic reference computer program (TDK)
NASA Technical Reports Server (NTRS)
Nickerson, G. R.; Dang, L. D.; Coats, D. E.
1985-01-01
The Two Dimensional Kinetics (TDK) computer program is a primary tool in applying the JANNAF liquid rocket thrust chamber performance prediction methodology. The development of a methodology that includes all aspects of rocket engine performance from analytical calculation to test measurements, that is physically accurate and consistent, and that serves as an industry and government reference is presented. Recent interest in rocket engines that operate at high expansion ratio, such as most Orbit Transfer Vehicle (OTV) engine designs, has required an extension of the analytical methods used by the TDK computer program. Thus, the version of TDK that is described in this manual is in many respects different from the 1973 version of the program. This new material reflects the new capabilities of the TDK computer program, the most important of which are described.
Research in Computational Astrobiology
NASA Technical Reports Server (NTRS)
Chaban, Galina; Jaffe, Richard; Liang, Shoudan; New, Michael H.; Pohorille, Andrew; Wilson, Michael A.
2002-01-01
We present results from several projects in the new field of computational astrobiology, which is devoted to advancing our understanding of the origin, evolution and distribution of life in the Universe using theoretical and computational tools. We have developed a procedure for calculating long-range effects in molecular dynamics using a plane wave expansion of the electrostatic potential. This method is expected to be highly efficient for simulating biological systems on massively parallel supercomputers. We have performed a genomics analysis of a family of actin binding proteins. We have performed quantum mechanical calculations on carbon nanotubes and nucleic acids; these simulations will allow us to investigate possible sources of organic material on the early Earth. Finally, we have developed a model of protobiological chemistry using neural networks.
Automatic Mrf-Based Registration of High Resolution Satellite Video Data
NASA Astrophysics Data System (ADS)
Platias, C.; Vakalopoulou, M.; Karantzalos, K.
2016-06-01
In this paper we propose a deformable registration framework for high resolution satellite video data, able to automatically and accurately co-register satellite video frames and/or register them to a reference map/image. The proposed approach performs non-rigid registration, formulates a Markov Random Fields (MRF) model, and employs efficient linear programming to reach the lowest potential of the cost function. The developed approach has been applied and validated on satellite video sequences from Skybox Imaging and compared with a rigid, descriptor-based registration method. Regarding computational performance, both the MRF-based and the descriptor-based methods were quite efficient, with the first converging in minutes and the second in seconds. Regarding registration accuracy, the proposed MRF-based method significantly outperformed the descriptor-based one in all experiments performed.
NAVO MSRC Navigator. Spring 2003
2003-01-01
computational model run on the IBM POWER4 (MARCELLUS) in support of the Airborne Laser Challenge Project II. The data were visualized using Alias|Wavefront Maya... Turbulence in a Jet Stream in the Airborne Laser Context... Largest NAVO MSRC System Becomes Even Bigger and Better... Using the SMP... centimeters (cm). The resolution requirement to resolve the microjets and the flow outside in the combustor is too severe for any single numerical method
NASA Astrophysics Data System (ADS)
Yang, Yiqun; Urban, Matthew W.; McGough, Robert J.
2018-05-01
Shear wave calculations induced by an acoustic radiation force are very time-consuming on desktop computers, and high-performance graphics processing units (GPUs) achieve dramatic reductions in the computation time for these simulations. The acoustic radiation force is calculated using the fast near field method and the angular spectrum approach, and then the shear waves are calculated in parallel with Green’s functions on a GPU. This combination enables rapid evaluation of shear waves for push beams with different spatial samplings and for apertures with different f/#. Relative to shear wave simulations that evaluate the same algorithm on an Intel i7 desktop computer, a high-performance nVidia GPU reduces the time required for these calculations by factors of 45 and 700 when applied to elastic and viscoelastic shear wave simulation models, respectively. These GPU-accelerated simulations were also compared to measurements in different viscoelastic phantoms, and the results are similar. For parametric evaluations and for comparisons with measured shear wave data, shear wave simulations with the Green’s function approach are ideally suited for high-performance GPUs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAILEY, DAVID H.; BORWEIN, JONATHAN M.
A recent paper by the present authors, together with mathematical physicists David Broadhurst and M. Larry Glasser, explored Bessel moment integrals, namely definite integrals of the general form $\int_0^\infty t^m f^n(t)\,dt$, where the function f(t) is one of the classical Bessel functions. In that paper, numerous previously unknown analytic evaluations were obtained, using a combination of analytic methods together with some fairly high-powered numerical computations, often performed on highly parallel computers. In several instances, while we were able to numerically discover what appears to be a solid analytic identity, based on extremely high-precision numerical computations, we were unable to find a rigorous proof. Thus we present here a brief list of some of these unproven but numerically confirmed identities.
Analysis of high aspect ratio jet flap wings of arbitrary geometry.
NASA Technical Reports Server (NTRS)
Lissaman, P. B. S.
1973-01-01
Paper presents a design technique for rapidly computing lift, induced drag, and spanwise loading of unswept jet flap wings of arbitrary thickness, chord, twist, blowing, and jet angle, including discontinuities. Linear theory is used, extending Spence's method for elliptically loaded jet flap wings. Curves for uniformly blown rectangular wings are presented for direct performance estimation. Arbitrary planforms require a simple computer program. Method of reducing wing to equivalent stretched, twisted, unblown planform for hand calculation is also given. Results correlate with limited existing data, and show lifting line theory is reasonable down to aspect ratios of 5.
NASA Technical Reports Server (NTRS)
1972-01-01
The QL module of the Performance Analysis and Design Synthesis (PADS) computer program is described. Execution of this module is initiated when and if subroutine PADSI calls subroutine GROPE. Subroutine GROPE controls the high level logical flow of the QL module. The purpose of the module is to determine a trajectory that satisfies the necessary variational conditions for optimal performance. The module achieves this by solving a nonlinear multi-point boundary value problem. The numerical method employed is described. It is an iterative technique that converges quadratically when it does converge. The three basic steps of the module are: (1) initialization, (2) iteration, and (3) culmination. For Volume 1 see N73-13199.
Data-driven train set crash dynamics simulation
NASA Astrophysics Data System (ADS)
Tang, Zhao; Zhu, Yunrui; Nie, Yinyu; Guo, Shihui; Liu, Fengjia; Chang, Jian; Zhang, Jianjun
2017-02-01
Traditional finite element (FE) methods are computationally expensive for simulating train crashes. High computational cost limits their direct application in investigating the dynamic behaviour of an entire train set for crashworthiness design and structural optimisation. On the contrary, multi-body modelling is widely used because of its low computational cost, with a trade-off in accuracy. In this study, a data-driven train crash modelling method is proposed to improve the performance of multi-body dynamics simulations of train set crashes without increasing the computational burden. This is achieved by the parallel random forest algorithm, a machine learning approach that extracts useful patterns from force-displacement curves and predicts a force-displacement relation for a given collision condition from a collection of offline FE simulation data on various collision conditions, namely different crash velocities in our analysis. Using the FE simulation results as a benchmark, we compared our method with traditional multi-body modelling methods; the results show that our data-driven method improves accuracy over traditional multi-body models in train crash simulation and runs at the same level of efficiency.
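The surrogate step can be sketched with scikit-learn. The snippet below trains a random forest on a synthetic stand-in for offline FE force-displacement data (the features, shapes, and force law are assumptions, not the paper's data) and queries it at an unseen crash velocity.

```python
# Minimal sketch of a random-forest force-displacement surrogate learned
# from offline FE runs at several crash velocities.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# training table assembled from FE simulations (synthetic here):
# columns = (crash velocity [m/s], displacement [m]) -> interface force [N]
X_train = np.array([[v, d] for v in (10, 15, 20)
                    for d in np.linspace(0, 0.5, 50)])
y_train = 1e6 * X_train[:, 1] * (1 + 0.05 * X_train[:, 0])   # toy force law

model = RandomForestRegressor(n_estimators=200, n_jobs=-1).fit(X_train, y_train)
force = model.predict([[12.5, 0.3]])   # unseen velocity: interpolated force
```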
High-Order Methods for Incompressible Fluid Flow
NASA Astrophysics Data System (ADS)
Deville, M. O.; Fischer, P. F.; Mund, E. H.
2002-08-01
High-order numerical methods provide an efficient approach to simulating many physical problems. This book considers the range of mathematical, engineering, and computer science topics that form the foundation of high-order numerical methods for the simulation of incompressible fluid flows in complex domains. Introductory chapters present high-order spatial and temporal discretizations for one-dimensional problems. These are extended to multiple space dimensions with a detailed discussion of tensor-product forms, multi-domain methods, and preconditioners for iterative solution techniques. Numerous discretizations of the steady and unsteady Stokes and Navier-Stokes equations are presented, with particular attention given to the enforcement of incompressibility. Advanced discretizations, implementation issues, and parallel and vector performance are considered in the closing sections. Numerous examples are provided throughout to illustrate the capabilities of high-order methods in actual applications.
BridgeRank: A novel fast centrality measure based on local structure of the network
NASA Astrophysics Data System (ADS)
Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh
2018-04-01
Ranking nodes in complex networks has become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in applications such as viral marketing. One of the most important centrality measures for ranking nodes is closeness centrality, which is effective but suffers from high computational complexity O(n^3). This paper improves on closeness centrality by utilizing the local structure of nodes and presents a new ranking algorithm, called BridgeRank centrality. The proposed method computes a local centrality value for each node. For this purpose, communities are first detected, and the relationships between communities are ignored. Then, by applying a centrality measure within each community, one best critical node is extracted from each community. Finally, the nodes are ranked by the sum of their shortest path lengths to the obtained critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method finds the best nodes with high spreading ability at low time complexity, which makes it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Experiments on real and artificial networks show that our method identifies influential nodes efficiently and achieves better performance than other recent methods.
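A hedged sketch of the BridgeRank pipeline with networkx follows; degree centrality stands in for the per-community centrality choice, and the community detection algorithm is an assumption rather than the paper's exact configuration.

```python
# Minimal sketch of BridgeRank: detect communities, keep one central node
# per community, then score every node by the sum of its shortest-path
# distances to those critical nodes (lower total distance = more influential).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def bridgerank(G):
    critical = []
    for community in greedy_modularity_communities(G):
        sub = G.subgraph(community)
        # highest-degree node of the community as its critical node
        critical.append(max(sub.degree, key=lambda nd: nd[1])[0])
    scores = {}
    for v in G:
        dist = nx.shortest_path_length(G, source=v)
        scores[v] = sum(dist.get(c, float("inf")) for c in critical)
    return sorted(G, key=scores.get)          # most influential first

G = nx.karate_club_graph()
print(bridgerank(G)[:5])
```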
Characteristic analysis and simulation for polysilicon comb micro-accelerometer
NASA Astrophysics Data System (ADS)
Liu, Fengli; Hao, Yongping
2008-10-01
High force update rate is a key factor for achieving high-performance haptic rendering, which imposes a stringent real-time requirement on the execution environment of the haptic system. This requirement confines the haptic system to simplified environments in order to reduce the computation cost of haptic rendering algorithms. In this paper, we present a novel "hyper-threading" architecture consisting of several threads for haptic rendering. The high force update rate is achieved with a relatively large computation time interval for each haptic loop. The proposed method was tested and proved effective in experiments on a virtual wall prototype haptic system via the Delta Haptic Device.
High-Performance Single-Photon Sources via Spatial Multiplexing
2014-01-01
ingredient for tasks such as quantum cryptography , quantum repeater, quantum teleportation, quantum computing, and truly-random number generation. Recently...SECURITY CLASSIFICATION OF: Single photons sources are desired for many potential quantum information applications. One common method to produce...photons sources are desired for many potential quantum information applications. One common method to produce single photons is based on a “heralding
Identification of informative features for predicting proinflammatory potentials of engine exhausts.
Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei
2017-08-18
The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of the immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposes a computational method for identifying informative features for predicting the proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by the SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting the proinflammatory potentials of engine exhausts. The FS-CBM model achieved high performance, with correlation coefficient values of 0.997 and 0.943 obtained on training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement in prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.
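The PCR-plus-SBFE loop can be sketched with scikit-learn. In the snippet below the component count, scoring, and data layout are assumptions; it illustrates the elimination logic, not the study's exact protocol.

```python
# Minimal sketch of principal component regression with sequential backward
# feature elimination: repeatedly drop the feature whose removal hurts
# cross-validated performance least.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def sbfe(X, y, n_keep, n_components=3):
    features = list(range(X.shape[1]))
    while len(features) > n_keep:
        def cv(feats):
            model = make_pipeline(
                PCA(n_components=min(n_components, len(feats))),
                LinearRegression())
            return cross_val_score(model, X[:, feats], y, cv=5).mean()
        # drop the feature whose removal leaves the best remaining score
        worst = max(features,
                    key=lambda f: cv([g for g in features if g != f]))
        features.remove(worst)
    return features                       # indices of informative features
```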
Cost efficient CFD simulations: Proper selection of domain partitioning strategies
NASA Astrophysics Data System (ADS)
Haddadi, Bahram; Jordan, Christian; Harasek, Michael
2017-10-01
Computational Fluid Dynamics (CFD) is one of the most powerful simulation methods, used for temporally and spatially resolved solutions of fluid flow, heat transfer, mass transfer, etc. One of the challenges of Computational Fluid Dynamics is the extreme hardware demand. Nowadays, supercomputers (e.g. High Performance Computing, HPC) featuring multiple CPU cores are applied for solving: the simulation domain is split into partitions, one for each core. Some of the different methods for partitioning are investigated in this paper. As a practical example, a new open source based solver was utilized for simulating packed bed adsorption, a common separation method within the field of thermal process engineering. Adsorption can, for example, be applied for the removal of trace gases from a gas stream or for the production of pure gases such as hydrogen. For comparing the performance of the partitioning methods, a 60 million cell mesh for a packed bed of spherical adsorbents was created; one second of the adsorption process was simulated. Different partitioning methods available in OpenFOAM® (Scotch, Simple, and Hierarchical) have been used with different numbers of sub-domains. The effect of the different methods and numbers of processor cores on the simulation speedup and also energy consumption was investigated for two different hardware infrastructures (Vienna Scientific Clusters VSC 2 and VSC 3). As a general recommendation, an optimum number of cells per processor core was calculated. Optimized simulation speed, lower energy consumption and consequently the cost effects are reported here.
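The closing recommendation reduces to a one-line rule of thumb, sketched below; the 50,000 cells-per-core figure is an illustrative assumption, not the value reported by the authors.

```python
# Minimal sketch: choose the number of sub-domains from the mesh size and a
# measured cells-per-core sweet spot (the 50k figure is assumed here).
def recommended_cores(n_cells, optimum_cells_per_core=50_000):
    return max(1, round(n_cells / optimum_cells_per_core))

print(recommended_cores(60_000_000))   # -> 1200 cores for a 60M cell mesh
```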
Interactive Supercomputing’s Star-P Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edelman, Alan; Husbands, Parry; Leibman, Steve
2006-09-19
The thesis of this extended abstract is simple. High productivity comes from high level infrastructures. To measure this, we introduce a methodology that goes beyond the tradition of timing software in serial and tuned parallel modes. We perform a classroom productivity study involving 29 students who have written a homework exercise in a low level language (MPI message passing) and a high level language (Star-P with MATLAB client). Our conclusions indicate what perhaps should be of little surprise: (1) the high level language is always far easier on the students than the low level language. (2) The early versions of the high level language perform inadequately compared to the tuned low level language, but later versions substantially catch up. Asymptotically, the analogy must hold that message passing is to high level language parallel programming as assembler is to high level environments such as MATLAB, Mathematica, Maple, or even Python. We follow the Kepner method, which correctly realizes that traditional speedup numbers, without some discussion of the human cost of reaching those numbers, can fail to reflect the true human productivity cost of high performance computing. Traditional data compares low level message passing with serial computation. With the benefit of a high level language system in place, in our case Star-P running with MATLAB client, and with the benefit of a large data pool (29 students, each running the same code ten times on three evolutions of the same platform), we can methodically demonstrate the productivity gains. To date we are not aware of any high level system as extensive and interoperable as Star-P, nor are we aware of an experiment of this kind performed with this volume of data.
Efficiently passing messages in distributed spiking neural network simulation.
Thibeault, Corey M; Minkovich, Kirill; O'Brien, Michael J; Harris, Frederick C; Srinivasa, Narayan
2013-01-01
Efficiently passing spiking messages in a neural model is an important aspect of high-performance simulation. As the scale of networks has increased so has the size of the computing systems required to simulate them. In addition, the information exchange of these resources has become more of an impediment to performance. In this paper we explore spike message passing using different mechanisms provided by the Message Passing Interface (MPI). A specific implementation, MVAPICH, designed for high-performance clusters with Infiniband hardware is employed. The focus is on providing information about these mechanisms for users of commodity high-performance spiking simulators. In addition, a novel hybrid method for spike exchange was implemented and benchmarked.
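A minimal mpi4py sketch of collective spike exchange follows; it shows one of the MPI mechanisms discussed (an all-to-all of spike identifier lists) under assumed neuron numbering, and not the MVAPICH-specific benchmarks of the paper.

```python
# Minimal sketch: every rank sends the identifiers of neurons that fired
# this step to every other rank with a single collective. Run under mpiexec.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

fired = [rank * 100 + i for i in range(3)]    # local neuron ids that spiked
outgoing = [fired] * size                     # same payload for every rank
incoming = comm.alltoall(outgoing)            # one spike list per source rank

spikes = [nid for msgs in incoming for nid in msgs]   # flattened global view
```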
High-performance computing in image registration
NASA Astrophysics Data System (ADS)
Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro
2012-10-01
Thanks to recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses, e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) the capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented on.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order discontinuous Galerkin finite element methods (DGFEM) are known to be good methods for solving the Euler and Navier-Stokes equations on unstructured grids, but they consume substantial computational resources. An efficient parallel algorithm is presented for solving the compressible Euler equations. Moreover, a multigrid strategy based on a three-stage, third-order TVD Runge-Kutta scheme is used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of the unsteady compressible Euler equations. In order to keep each processor load-balanced, the domain decomposition method is employed. Numerical experiments were performed for inviscid transonic flow problems around the NACA0012 airfoil and the M6 wing. The results indicate that our parallel algorithm improves acceleration and efficiency significantly, and is suitable for computing complex flows.
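The three-stage, third-order TVD Runge-Kutta update referred to here is the classic Shu-Osher SSP scheme, which can be written in a few lines; the spatial DG operator L is left abstract in this sketch.

```python
# Minimal sketch of the three-stage third-order TVD (SSP) Runge-Kutta step
# advancing the semi-discretisation du/dt = L(u). Works on NumPy arrays.
def ssp_rk3_step(u, dt, L):
    u1 = u + dt * L(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2))
```

Each stage is a convex combination of forward-Euler steps, which is what gives the scheme its TVD (strong stability preserving) property whenever forward Euler itself is stable.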
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lottes, S.A.; Kulak, R.F.; Bojanowski, C.
2011-08-26
The computational fluid dynamics (CFD) and computational structural mechanics (CSM) focus areas at Argonne's Transportation Research and Analysis Computing Center (TRACC) initiated a project to support and complement the experimental programs at the Turner-Fairbank Highway Research Center (TFHRC) with high performance computing based analysis capabilities in August 2010. The project was established with a new interagency agreement between the Department of Energy and the Department of Transportation to provide collaborative research, development, and benchmarking of advanced three-dimensional computational mechanics analysis methods to the aerodynamics and hydraulics laboratories at TFHRC for a period of five years, beginning in October 2010. The analysis methods employ well-benchmarked and supported commercial computational mechanics software. Computational mechanics encompasses the areas of Computational Fluid Dynamics (CFD), Computational Wind Engineering (CWE), Computational Structural Mechanics (CSM), and Computational Multiphysics Mechanics (CMM) applied in Fluid-Structure Interaction (FSI) problems. The major areas of focus of the project are wind and water loads on bridges - superstructure, deck, cables, and substructure (including soil), primarily during storms and flood events - and the risks that these loads pose to structural failure. For flood events at bridges, another major focus of the work is assessment of the risk to bridges caused by scour of stream and riverbed material away from the foundations of a bridge. Other areas of current research include modeling of flow through culverts to assess them for fish passage, modeling of the salt spray transport into bridge girders to address suitability of using weathering steel in bridges, vehicle stability under high wind loading, and the use of electromagnetic shock absorbers to improve vehicle stability under high wind conditions. This quarterly report documents technical progress on the project tasks for the period of April through June 2011.
An advanced probabilistic structural analysis method for implicit performance functions
NASA Technical Reports Server (NTRS)
Wu, Y.-T.; Millwater, H. R.; Cruse, T. A.
1989-01-01
In probabilistic structural analysis, the performance or response functions usually are implicitly defined and must be solved by numerical analysis methods such as finite element methods. In such cases, the most commonly used probabilistic analysis tool is the mean-based, second-moment method which provides only the first two statistical moments. This paper presents a generalized advanced mean value (AMV) method which is capable of establishing the distributions to provide additional information for reliability design. The method requires slightly more computations than the second-moment method but is highly efficient relative to the other alternative methods. In particular, the examples show that the AMV method can be used to solve problems involving non-monotonic functions that result in truncated distributions.
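The AMV correction can be sketched for independent normal inputs: linearize the response at the mean to locate the most probable point for a given reliability index, then re-evaluate the exact response there. The finite-difference gradient and the toy response below are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of the advanced mean value (AMV) idea for independent
# normal inputs with an implicit (black-box) response function g.
import numpy as np

def amv_value(g, mu, sigma, beta, h=1e-6):
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    # gradient of g at the mean via central finite differences
    grad = np.array([(g(mu + h * e) - g(mu - h * e)) / (2 * h)
                     for e in np.eye(len(mu))])
    a = grad * sigma                        # sensitivities in standard space
    u_star = -beta * a / np.linalg.norm(a)  # most probable point (standard)
    return g(mu + sigma * u_star)           # AMV: exact g re-evaluated at MPP

g = lambda x: x[0] ** 2 - x[1]              # toy stand-in for an FE response
print(amv_value(g, mu=[1.0, 0.5], sigma=[0.1, 0.2], beta=2.0))
```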
Acceleration of GPU-based Krylov solvers via data transfer reduction
Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...
2015-04-08
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA's cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit's computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as the sparse matrix-vector product, are crucial for the subsequent development of high-performance GPU-accelerated Krylov subspace iterative methods.
Modal ring method for the scattering of sound
NASA Technical Reports Server (NTRS)
Baumeister, Kenneth J.; Kreider, Kevin L.
1993-01-01
The modal element method for acoustic scattering can be simplified when the scattering body is rigid. In this simplified method, called the modal ring method, the scattering body is represented by a ring of triangular finite elements forming the outer surface. The acoustic pressure is calculated at the element nodes. The pressure in the infinite computational region surrounding the body is represented analytically by an eigenfunction expansion. The two solution forms are coupled by the continuity of pressure and velocity on the body surface. The modal ring method effectively reduces the two-dimensional scattering problem to a one-dimensional problem capable of handling very high frequency scattering. In contrast to the boundary element method or the method of moments, which perform a similar reduction in problem dimension, the modal ring method has the added advantage of having a highly banded solution matrix requiring considerably less computer storage. The method shows excellent agreement with analytic results for scattering from rigid circular cylinders over a wide frequency range ($1 \le ka \le 100$) in the near and far fields.
Early experiences in developing and managing the neuroscience gateway.
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T
2015-02-01
The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway.
A Comparison of Lifting-Line and CFD Methods with Flight Test Data from a Research Puma Helicopter
NASA Technical Reports Server (NTRS)
Bousman, William G.; Young, Colin; Toulmay, Francois; Gilbert, Neil E.; Strawn, Roger C.; Miller, Judith V.; Maier, Thomas H.; Costes, Michel; Beaumier, Philippe
1996-01-01
Four lifting-line methods were compared with flight test data from a research Puma helicopter and the accuracy assessed over a wide range of flight speeds. Hybrid Computational Fluid Dynamics (CFD) methods were also examined for two high-speed conditions. A parallel analytical effort was performed with the lifting-line methods to assess the effects of modeling assumptions and this provided insight into the adequacy of these methods for load predictions.
Prediction of brain-computer interface aptitude from individual brain structure
Halder, S.; Varkuti, B.; Bogdan, M.; Kübler, A.; Rosenstiel, W.; Sitaram, R.; Birbaumer, N.
2013-01-01
Objective: Brain-computer interfaces (BCIs) provide a non-muscular communication channel for patients with impairments of the motor system. A significant number of BCI users are unable to obtain voluntary control of a BCI system within a reasonable amount of time. This makes methods for determining user aptitude necessary. Methods: We hypothesized that the integrity and connectivity of the involved white matter connections may serve as a predictor of individual BCI performance. Therefore, we analyzed structural data from anatomical scans and DTI of motor imagery BCI users, differentiated into high and low BCI-aptitude groups based on their overall performance. Results: Using a machine learning classification method, we identified discriminating structural brain trait features and correlated the best features with a continuous measure of individual BCI performance. Prediction of the aptitude group of each participant was possible with near perfect accuracy (one error). Conclusions: Tissue volumetric analysis yielded only poor classification results. In contrast, the structural integrity and myelination quality of deep white matter structures such as the corpus callosum, cingulum, and superior fronto-occipital fascicle were positively correlated with individual BCI performance. Significance: This confirms that structural brain traits contribute to individual performance in BCI use. PMID:23565083
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-01-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107
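A minimal scikit-learn sketch of SVC ranking as described: score each feature by the cross-validated accuracy of a classifier trained on that feature alone. The dataset and the decision-tree base classifier are illustrative choices, not those of the paper:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

scores = []
for j in range(X.shape[1]):
    # Single Variable Classifier: train on feature j only.
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    scores.append(cross_val_score(clf, X[:, [j]], y, cv=5).mean())

ranking = np.argsort(scores)[::-1]   # best single-feature accuracy first
print("top 5 features by single-variable accuracy:", ranking[:5])
```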
Pagan, Marino
2014-01-01
The responses of high-level neurons tend to be mixtures of many different types of signals. While this diversity is thought to allow for flexible neural processing, it presents a challenge for understanding how neural responses relate to task performance and to neural computation. To address these challenges, we have developed a new method to parse the responses of individual neurons into weighted sums of intuitive signal components. Our method computes the weights by projecting a neuron's responses onto a predefined orthonormal basis. Once determined, these weights can be combined into measures of signal modulation; however, in their raw form these signal modulation measures are biased by noise. Here we introduce and evaluate two methods for correcting this bias, and we report that an analytically derived approach produces performance that is robust and superior to a bootstrap procedure. Using neural data recorded from inferotemporal cortex and perirhinal cortex as monkeys performed a delayed-match-to-sample target search task, we demonstrate how the method can be used to quantify the amounts of task-relevant signals in heterogeneous neural populations. We also demonstrate how these intuitive quantifications of signal modulation can be related to single-neuron measures of task performance (d′). PMID:24920017
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Method for computationally efficient design of dielectric laser accelerator structures
Hughes, Tyler; Veronis, Georgios; Wootton, Kent P.; ...
2017-06-22
Here, dielectric microstructures have generated much interest in recent years as a means of accelerating charged particles when powered by solid state lasers. The acceleration gradient (or particle energy gain per unit length) is an important figure of merit. To design structures with high acceleration gradients, we explore the adjoint variable method, a highly efficient technique used to compute the sensitivity of an objective with respect to a large number of parameters. With this formalism, the sensitivity of the acceleration gradient of a dielectric structure with respect to its entire spatial permittivity distribution is calculated by the use of only two full-field electromagnetic simulations: the original and the 'adjoint'. The adjoint simulation corresponds physically to the reciprocal situation of a point charge moving through the accelerator gap and radiating. Using this formalism, we perform numerical optimizations aimed at maximizing acceleration gradients, which generate fabricable structures of greatly improved performance in comparison to previously examined geometries.
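The adjoint idea generalizes beyond electromagnetics; the following toy linear-algebra sketch (not the authors' solver) shows how one forward solve plus one adjoint solve yields the gradient of an objective with respect to all parameters at once:

```python
import numpy as np

# For A(p) x = b and objective F = c.x, the gradient is
# dF/dp_k = -lambda . (dA/dp_k) x, with one adjoint solve A^T lambda = c.
rng = np.random.default_rng(0)
n, m = 40, 100                               # state size, parameter count
A0 = np.eye(n) * 5 + 0.1 * rng.normal(size=(n, n))
dA = 0.01 * rng.normal(size=(m, n, n))       # dA[k] = dA/dp_k (fixed here)
b, c = rng.normal(size=n), rng.normal(size=n)
p = np.zeros(m)

A = A0 + np.tensordot(p, dA, axes=1)
x = np.linalg.solve(A, b)                    # forward solve
lam = np.linalg.solve(A.T, c)                # single adjoint solve
grad = -np.array([lam @ (dA[k] @ x) for k in range(m)])

# Finite-difference check on one parameter:
eps, k = 1e-6, 3
pk = p.copy(); pk[k] += eps
xk = np.linalg.solve(A0 + np.tensordot(pk, dA, axes=1), b)
print(grad[k], (c @ xk - c @ x) / eps)       # should agree closely
```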
Park, Subok; Clarkson, Eric
2010-01-01
The Bayesian ideal observer is optimal among all observers and sets an absolute upper bound for the performance of any observer in classification tasks [Van Trees, Detection, Estimation, and Modulation Theory, Part I (Academic, 1968)]. Therefore, the ideal observer should be used for objective image quality assessment whenever possible. However, computation of ideal-observer performance is difficult in practice because this observer requires the full description of the unknown statistical properties of the high-dimensional, complex data arising in real-life problems. Previously, Markov-chain Monte Carlo (MCMC) methods were developed by Kupinski et al. [J. Opt. Soc. Am. A 20, 430 (2003)] and by Park et al. [J. Opt. Soc. Am. A 24, B136 (2007) and IEEE Trans. Med. Imaging 28, 657 (2009)] to estimate the performance of the ideal observer and the channelized ideal observer (CIO), respectively, in classification tasks involving non-Gaussian random backgrounds. However, both algorithms had the disadvantage of long computation times. We propose a fast MCMC for real-time estimation of the likelihood ratio for the CIO. Our simulation results show that our method has the potential to speed up the estimation of ideal-observer performance in tasks involving complex data when efficient channels are used for the CIO. PMID:19884916
Data Storage and Transfer | High-Performance Computing | NREL
High-Performance Computing (HPC) systems. WinSCP for Windows file transfers: use to transfer files from a local computer to a remote computer. Robinhood for file management: use this tool to manage your data files on Peregrine.
Asymmetric Core Computing for U.S. Army High-Performance Computing Applications
2009-04-01
Reconfigurable computing refers to performing computations using Field Programmable Gate Arrays (FPGAs). [The remainder of this entry is report documentation-form and table-of-contents residue; the recoverable section titles cover Cell processors and FPGA-based reconfigurable computing.]
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for the GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm with data intensiveness, data-parallel computing, and the GPU execution model of single instruction, multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for stream computing on the GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. Aiming at better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimization of data access and of the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data so that the transmission rate gets around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
HPCCP/CAS Workshop Proceedings 1998
NASA Technical Reports Server (NTRS)
Schulbach, Catherine; Mata, Ellen (Editor); Schulbach, Catherine (Editor)
1999-01-01
This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.
Muratov, Eugene; Lewis, Margaret; Fourches, Denis; Tropsha, Alexander; Cox, Wendy C
2017-04-01
Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools. Methods. All PharmD candidates over three admission cycles were divided into two groups: those who completed the PharmD program with a GPA ≥ 3, and the remaining candidates. The Random Forest machine learning technique was used to develop a binary classification model based on 11 pre-admission parameters. Results. Robust and externally predictive models were developed that had a particularly high overall accuracy of 77% for candidates with high or low academic performance. These multivariate models were more accurate in predicting these groups than those obtained using undergraduate GPA and composite PCAT scores only. Conclusion. The models developed in this study can be used to improve the admission process as preliminary filters to quickly identify candidates who are likely to be successful in the PharmD curriculum.
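A hedged sketch of the modeling setup described (a Random Forest binary classifier over 11 pre-admission parameters); the synthetic data and labels below are placeholders, not the study's cohort:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_candidates, n_features = 300, 11           # 11 pre-admission parameters
X = rng.normal(size=(n_candidates, n_features))
# Placeholder label standing in for "completed the program with GPA >= 3":
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n_candidates) > 0

model = RandomForestClassifier(n_estimators=500, random_state=0)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```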
Sensing and Active Flow Control for Advanced BWB Propulsion-Airframe Integration Concepts
NASA Technical Reports Server (NTRS)
Fleming, John; Anderson, Jason; Ng, Wing; Harrison, Neal
2005-01-01
In order to realize the substantial performance benefits of serpentine boundary layer ingesting diffusers, this study investigated the use of enabling flow control methods to reduce engine-face flow distortion. Computational methods and novel flow control modeling techniques were utilized that allowed for rapid, accurate analysis of flow control geometries. Results were validated experimentally using the Techsburg Ejector-based wind tunnel facility; this facility is capable of simulating the high-altitude, high subsonic Mach number conditions representative of BWB cruise conditions.
Data Transfer Study HPSS Archiving
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wynne, James; Parete-Koon, Suzanne T; Mitchell, Quinn
2015-01-01
The movement of the large amounts of data produced by codes run in a High Performance Computing (HPC) environment can be a bottleneck for project workflows. To balance filesystem capacity and performance requirements, HPC centers enforce data management policies to purge old files to make room for new computation and analysis results. Users at Oak Ridge Leadership Computing Facility (OLCF) and many other HPC user facilities must archive data to avoid data loss during purges, therefore the time associated with data movement for archiving is something that all users must consider. This study observed the difference in transfer speed from the originating location on the Lustre filesystem to the more permanent High Performance Storage System (HPSS). The tests were done with a number of different transfer methods for files that spanned a variety of sizes and compositions that reflect OLCF user data. This data will be used to help users of Titan and other Cray supercomputers plan their workflow and data transfers so that they are most efficient for their project. We will also discuss best practice for maintaining data at shared user facilities.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a library for processing spatial queries, and as an integrated software package in Hive.
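The core partitioning idea can be illustrated in a few lines; this single-process Python sketch mimics the map (scatter to tiles) and per-tile query steps, and is not the RESQUE engine or its MapReduce plumbing:

```python
from collections import defaultdict

def tile_of(x, y, tile_size=10.0):
    return (int(x // tile_size), int(y // tile_size))

points = [(3.2, 4.1, "a"), (12.7, 4.0, "b"), (13.1, 14.9, "c"), (3.0, 3.9, "d")]

# Map step: scatter points to tiles so each tile can be queried in parallel.
tiles = defaultdict(list)
for x, y, label in points:
    tiles[tile_of(x, y)].append((x, y, label))

# Per-tile work (would run as parallel reducers): pairs within distance eps.
def close_pairs(bucket, eps=0.5):
    out = []
    for i, (x1, y1, l1) in enumerate(bucket):
        for x2, y2, l2 in bucket[i + 1:]:
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= eps ** 2:
                out.append((l1, l2))
    return out

for tile, bucket in tiles.items():
    print(tile, close_pairs(bucket))
# Objects within eps of a tile edge would be replicated to neighboring tiles
# and de-duplicated afterwards - the boundary handling the paper describes.
```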
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes. We also tested the scalability on four different networks, namely Infiniband, Gigabit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
NASA Technical Reports Server (NTRS)
Thomas, Jr., Jess B. (Inventor)
1991-01-01
An improved digital phase lock loop incorporates several distinctive features that attain better performance at high loop gain and better phase accuracy. These features include: phase feedback to a number-controlled oscillator in addition to phase rate; analytical tracking of phase (both integer and fractional cycles); an amplitude-insensitive phase extractor; a more accurate method for extracting measured phase; a method for changing loop gain during a track without loss of lock; and a method for avoiding loss of sampled data during computation delay, while maintaining excellent tracking performance. The advantages of using phase and phase-rate feedback are demonstrated by comparing performance with that of rate-only feedback. Extraction of phase by the method of modeling provides accurate phase measurements even when the number-controlled oscillator phase is discontinuously updated.
Fast, high-fidelity readout of multiple qubits
NASA Astrophysics Data System (ADS)
Bronn, N. T.; Abdo, B.; Inoue, K.; Lekuch, S.; Córcoles, A. D.; Hertzberg, J. B.; Takita, M.; Bishop, L. S.; Gambetta, J. M.; Chow, J. M.
2017-05-01
Quantum computing requires a delicate balance between coupling quantum systems to external instruments for control and readout, while providing enough isolation from sources of decoherence. Circuit quantum electrodynamics has been a successful method for protecting superconducting qubits, while maintaining the ability to perform readout [1, 2]. Here, we discuss improvements to this method that allow for fast, high-fidelity readout. Specifically, the integration of a Purcell filter, which allows us to increase the resonator bandwidth for fast readout, the incorporation of a Josephson parametric converter, which enables us to perform high-fidelity readout by amplifying the readout signal while adding the minimum amount of noise required by quantum mechanics, and custom control electronics, which provide us with the capability of fast decision and control.
Closed loop statistical performance analysis of N-K knock controllers
NASA Astrophysics Data System (ADS)
Peyton Jones, James C.; Shayestehmanesh, Saeed; Frey, Jesse
2017-09-01
The closed loop performance of engine knock controllers cannot be rigorously assessed from single experiments or simulations because knock behaves as a random process and therefore the response belongs to a random distribution also. In this work a new method is proposed for computing the distributions and expected values of the closed loop response, both in steady state and in response to disturbances. The method takes as its input the control law, and the knock propensity characteristic of the engine which is mapped from open loop steady state tests. The method is applicable to the 'n-k' class of knock controllers in which the control action is a function only of the number of cycles n since the last control move, and the number k of knock events that have occurred in this time. A Cumulative Summation (CumSum) based controller falls within this category, and the method is used to investigate the performance of the controller in a deeper and more rigorous way than has previously been possible. The results are validated using onerous Monte Carlo simulations, which confirm both the validity of the method and its high computational efficiency.
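To make the 'n-k' setting concrete, here is a Monte Carlo sketch of one plausible CumSum-style knock controller run repeatedly against a stochastic knock model; the knock-propensity curve, gains, and thresholds are invented for illustration and are not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_knock(theta):
    # Toy knock-propensity map: more spark advance -> more knock.
    return 1.0 / (1.0 + np.exp(-(theta - 10.0)))

def run(cycles=2000, target=0.05, h=1.0, adv=0.02, ret=2.0):
    theta, S, hist = 5.0, 0.0, []
    for _ in range(cycles):
        knock = rng.random() < p_knock(theta)
        S = max(0.0, S + (1.0 if knock else 0.0) - target)  # CumSum statistic
        if S > h:                # too many recent knocks: retard and reset
            theta -= ret
            S = 0.0
        else:                    # otherwise creep toward more advance
            theta += adv
        hist.append(theta)
    return np.mean(hist[cycles // 2:])   # steady-state mean advance

# Because knock is a random process, a single run says little; the
# distribution over many runs is the quantity the paper's method predicts.
means = np.array([run() for _ in range(200)])
print("mean advance %.2f deg, std over runs %.2f" % (means.mean(), means.std()))
```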
CORDIC-based digital signal processing (DSP) element for adaptive signal processing
NASA Astrophysics Data System (ADS)
Bolstad, Gregory D.; Neeld, Kenneth B.
1995-04-01
The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC based application specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
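For reference, the CORDIC vectoring iteration underlying such processing elements is only a few lines in software; this float-arithmetic sketch stands in for the fixed-point shift-add hardware:

```python
import math

def cordic_vectoring(x, y, iters=32):
    """Rotate (x, y) onto the x-axis using only shift-add style updates,
    accumulating the rotation angle: returns (magnitude, phase)."""
    angles = [math.atan(2.0 ** -i) for i in range(iters)]
    K = 1.0                                   # accumulated gain compensation
    z = 0.0
    for i in range(iters):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
        d = -1.0 if y > 0 else 1.0            # drive y toward zero
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return x * K, z

mag, phase = cordic_vectoring(3.0, 4.0)
print(mag, phase)    # approx 5.0 and atan2(4, 3) = 0.9273
```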
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
Use of model calibration to achieve high accuracy in analysis of computer networks
Frogner, Bjorn; Guarro, Sergio; Scharf, Guy
2004-05-11
A system and method are provided for creating a network performance prediction model, and calibrating the prediction model, through application of network load statistical analyses. The method includes characterizing the measured load on the network, which may include background load data obtained over time, and may further include directed load data representative of a transaction-level event. Probabilistic representations of load data are derived to characterize the statistical persistence of the network performance variability and to determine delays throughout the network. The probabilistic representations are applied to the network performance prediction model to adapt the model for accurate prediction of network performance. Certain embodiments of the method and system may be used for analysis of the performance of a distributed application characterized as data packet streams.
Automated problem scheduling and reduction of synchronization delay effects
NASA Technical Reports Server (NTRS)
Saltz, Joel H.
1987-01-01
It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acyclic graphs, and (2) such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs.
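One standard way to expose the medium-grained parallelism described is level scheduling of the triangular solve's dependency DAG; a SciPy sketch of that idea (not the paper's aggregation heuristics):

```python
import numpy as np
import scipy.sparse as sp

def level_schedule(L):
    """L: sparse lower-triangular matrix in CSR form. Assigns each row a
    'level' (wavefront); all rows in a level can be solved in parallel."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    indptr, indices = L.indptr, L.indices
    for i in range(n):
        deps = [j for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    return level

A = sp.random(200, 200, density=0.02, random_state=0)
L = sp.tril(A, k=-1).tocsr() + sp.eye(200, format="csr")
lev = level_schedule(L)
print("rows:", L.shape[0], "levels:", lev.max() + 1,
      "mean parallelism:", L.shape[0] / (lev.max() + 1))
```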
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Anderson, Thomas E.; Ousterhout, John K.; Patterson, David A.
1991-01-01
Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications.
Image analysis of pubic bone for age estimation in a computed tomography sample.
López-Alcaraz, Manuel; González, Pedro Manuel Garamendi; Aguilera, Inmaculada Alemán; López, Miguel Botella
2015-03-01
Radiology has demonstrated great utility for age estimation, but most of the studies are based on metrical and morphological methods in order to perform an identification profile. A simple image analysis-based method is presented, aimed at correlating the bony tissue ultrastructure with several variables obtained from the grey-level histogram (GLH) of computed tomography (CT) sagittal sections of the pubic symphysis surface and the pubic body, and relating them with age. The CT sample consisted of 169 hospital Digital Imaging and Communications in Medicine (DICOM) archives of known sex and age. The calculated multiple regression models showed a maximum R² of 0.533 for females and 0.726 for males, with high intra- and inter-observer agreement. The method suggested is considered useful not only for performing an identification profile during virtopsy, but also for application in further studies in order to attach a quantitative correlation for tissue ultrastructure characteristics, without complex and expensive methods beyond image analysis.
Quantum Accelerators for High-performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.
We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenge the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, the prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized supercomputers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers, using 'non-overlapping discretizations', have produced the DVS software, which overcomes this limitation [2]. The DVS software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90% or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk at this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MODFLOW. Keywords: parallel software for geophysics, high performance computing, HPC, parallel computing, domain decomposition methods (DDM). References: [1] Herrera, I. and G. F. Pinder, "Mathematical Modelling in Science and Engineering: An Axiomatic Approach", John Wiley, 243 p., 2012. [2] Herrera, I., de la Cruz, L. M. and Rosas-Medina, A., "Non-Overlapping Discretization Methods for Partial Differential Equations", Numer. Meth. Part. D. E., 30: 1427-1454, 2014, DOI 10.1002/num.21852 (open source). [3] Herrera, I. and Contreras, I., "An Innovative Tool for Effectively Applying Highly Parallelized Software to Problems of Elasticity", Geofísica Internacional, 2015 (in press).
Two-stage atlas subset selection in multi-atlas based image segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu
2015-06-15
Purpose: Fast growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, and also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs that arise in the face of large atlas collections of varied quality, so that high-accuracy segmentation can be achieved at low computational cost. Methods: An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates end-to-end segmentation performance comparable to the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, their scheme improves the mean and median Dice similarity coefficient value from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. Conclusions: The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
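A schematic of the two-stage logic with synthetic relevance scores standing in for the preliminary (cheap-registration) and refined (full-registration) metrics:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atlases, n_aug, n_fusion = 500, 40, 10

true_relevance = rng.random(n_atlases)
# Stage-1 metric: fast but noisy surrogate for registration quality.
cheap_score = true_relevance + rng.normal(scale=0.15, size=n_atlases)

# Stage 1: keep an augmented subset large enough that the truly relevant
# atlases survive the noisy preliminary ranking with high probability.
augmented = np.argsort(cheap_score)[::-1][:n_aug]

# Stage 2: refined (expensive) scoring on the augmented subset only.
refined_score = true_relevance[augmented] + rng.normal(scale=0.02, size=n_aug)
fusion = augmented[np.argsort(refined_score)[::-1][:n_fusion]]

best_possible = set(np.argsort(true_relevance)[::-1][:n_fusion])
print("overlap with ideal fusion set:",
      len(best_possible & set(fusion)), "/", n_fusion)
print("expensive evaluations saved:", n_atlases - n_aug)
```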
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de
2016-11-15
We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion, it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10² innermost eigenpairs of a topological insulator matrix with dimension 10⁹ derived from quantum physics applications.
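A compact NumPy sketch of the technique: expand the spectral window in Jackson-damped Chebyshev polynomials, apply the filter to a random block via the three-term recurrence, and Rayleigh-Ritz the filtered block. The degree, block size, and toy matrix are illustrative choices:

```python
import numpy as np

def cheb_filter_diag(A, a, b, lmin, lmax, deg=300, block=30, seed=0):
    n = A.shape[0]
    c, e = (lmax + lmin) / 2.0, (lmax - lmin) / 2.0
    Bop = lambda V: (A @ V - c * V) / e      # spectrum mapped into [-1, 1]
    # Chebyshev coefficients of the indicator of [a, b] (arccos decreases):
    ta, tb = np.arccos((b - c) / e), np.arccos((a - c) / e)
    k = np.arange(deg + 1)
    coef = np.empty(deg + 1)
    coef[0] = (tb - ta) / np.pi
    coef[1:] = 2.0 * (np.sin(k[1:] * tb) - np.sin(k[1:] * ta)) / (k[1:] * np.pi)
    # Jackson damping suppresses Gibbs oscillations of the window expansion.
    N = deg + 1
    g = ((N - k + 1) * np.cos(np.pi * k / (N + 1))
         + np.sin(np.pi * k / (N + 1)) / np.tan(np.pi / (N + 1))) / (N + 1)
    coef *= g
    V = np.random.default_rng(seed).normal(size=(n, block))
    T0, T1 = V, Bop(V)
    Y = coef[0] * T0 + coef[1] * T1
    for _ in range(2, deg + 1):              # three-term recurrence
        T0, T1 = T1, 2.0 * Bop(T1) - T0
        Y += coef[_] * T1
    Q, _r = np.linalg.qr(Y)                  # filtered search space
    theta, U = np.linalg.eigh(Q.T @ A @ Q)   # Rayleigh-Ritz step
    keep = (theta >= a) & (theta <= b)
    return theta[keep], Q @ U[:, keep]

A = np.diag(np.linspace(-1.0, 1.0, 400))     # toy symmetric matrix
vals, vecs = cheb_filter_diag(A, a=-0.05, b=0.05, lmin=-1.0, lmax=1.0)
print(np.sort(vals))                         # interior eigenvalues near 0
```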
Zhao, Bo; Haldar, Justin P.; Christodoulou, Anthony G.; Liang, Zhi-Pei
2012-01-01
Partial separability (PS) and sparsity have been previously used to enable reconstruction of dynamic images from undersampled (k, t)-space data. This paper presents a new method to use PS and sparsity constraints jointly for enhanced performance in this context. The proposed method combines the complementary advantages of PS and sparsity constraints using a unified formulation, achieving significantly better reconstruction performance than using either of these constraints individually. A globally convergent computational algorithm is described to efficiently solve the underlying optimization problem. Reconstruction results from simulated and in vivo cardiac MRI data are also shown to illustrate the performance of the proposed method. PMID:22695345
Driver behavior profiling: An investigation with different smartphone sensors and machine learning
Ferreira, Jair; Carvalho, Eduardo; Ferreira, Bruno V.; de Souza, Cleidson; Suhara, Yoshihiko; Pentland, Alex
2017-01-01
Driver behavior impacts traffic safety, fuel/energy consumption and gas emissions. Driver behavior profiling tries to understand and positively impact driver behavior. Usually driver behavior profiling tasks involve automated collection of driving data and application of computer models to generate a classification that characterizes the driver aggressiveness profile. Different sensors and classification methods have been employed in this task, however, low-cost solutions and high performance are still research targets. This paper presents an investigation with different Android smartphone sensors, and classification algorithms in order to assess which sensor/method assembly enables classification with higher performance. The results show that specific combinations of sensors and intelligent methods allow classification performance improvement. PMID:28394925
A highly parallel multigrid-like method for the solution of the Euler equations
NASA Technical Reports Server (NTRS)
Tuminaro, Ray S.
1989-01-01
We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straightforward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.
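For readers unfamiliar with the baseline, here is a standard two-grid cycle for the 1D Poisson problem: the smoother damps oscillatory error while the coarse grid corrects smooth error. The paper's extra step (filtering the residual to build a concurrent fine-grid subproblem) is not reproduced in this sketch:

```python
import numpy as np

def poisson(n):                      # -u'' on a uniform grid, Dirichlet BCs
    h = 1.0 / (n + 1)
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def weighted_jacobi(A, x, b, sweeps=3, w=2.0 / 3.0):
    Dinv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + w * Dinv * (b - A @ x)
    return x

def restrict(r):                     # full weighting, fine n = 2*nc + 1
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(ec, n):                  # linear interpolation back to fine grid
    e = np.zeros(n)
    e[1::2] = ec                     # coarse nodes inject directly
    e[0:-1:2] += 0.5 * ec            # even nodes: half of right neighbor
    e[2::2] += 0.5 * ec              # even nodes: half of left neighbor
    return e

def two_grid(A, Ac, x, b):
    x = weighted_jacobi(A, x, b)             # pre-smooth: kills oscillatory error
    ec = np.linalg.solve(Ac, restrict(b - A @ x))   # coarse correction (exact here)
    x = x + prolong(ec, A.shape[0])
    return weighted_jacobi(A, x, b)          # post-smooth

nc = 63; n = 2 * nc + 1
A, Ac = poisson(n), poisson(nc)
b, x = np.ones(n), np.zeros(n)
for it in range(10):
    x = two_grid(A, Ac, x, b)
    print(it, np.linalg.norm(b - A @ x))     # residual drops fast per cycle
```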
A high-performance seizure detection algorithm based on Discrete Wavelet Transform (DWT) and EEG
Chen, Duo; Wan, Suiren; Xiang, Jing; Bao, Forrest Sheng
2017-01-01
In the past decade, the Discrete Wavelet Transform (DWT), a powerful time-frequency tool, has been widely used in computer-aided signal analysis of epileptic electroencephalography (EEG), such as the detection of seizures. One of the important hurdles in applying DWT is choosing its settings, which previous works have chosen empirically or arbitrarily. This study aimed to develop a framework for automatically searching the optimal DWT settings to improve accuracy and to reduce the computational cost of seizure detection. To address this, we developed a method to decompose EEG data into 7 commonly used wavelet families, to the maximum theoretical level of each mother wavelet. Wavelets and decomposition levels providing the highest accuracy in each wavelet family were then searched in an exhaustive selection of frequency bands, which showed optimal accuracy and low computational cost. The selection of frequency bands and features removed approximately 40% of redundancies. The developed algorithm achieved promising performance on two well-tested EEG datasets (accuracy >90% for both datasets). The experimental results demonstrate that the settings of DWT affect its performance on seizure detection substantially. Compared with existing wavelet-based seizure detection methods, the new approach is more accurate and transferable among datasets. PMID:28278203
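A sketch of the settings search the paper motivates, using PyWavelets: decompose each epoch with several mother wavelets and levels, extract sub-band log-energies, and keep the configuration with the best cross-validated accuracy. The synthetic signals below are placeholders for real EEG, and the wavelet shortlist is an assumption:

```python
import numpy as np
import pywt
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fs, n_epochs, n_samp = 256, 120, 512
t = np.arange(n_samp) / fs
X_raw = rng.normal(size=(n_epochs, n_samp))
y = rng.integers(0, 2, size=n_epochs)
X_raw[y == 1] += 2.0 * np.sin(2 * np.pi * 5 * t)   # crude "ictal" rhythm

def band_energies(sig, wavelet, level):
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    return [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]

best_acc, best_cfg = 0.0, None
for wname in ["db4", "sym5", "coif3", "haar"]:
    max_lev = pywt.dwt_max_level(n_samp, pywt.Wavelet(wname).dec_len)
    for level in range(2, max_lev + 1):
        X = np.array([band_energies(s, wname, level) for s in X_raw])
        acc = cross_val_score(SVC(), X, y, cv=5).mean()
        if acc > best_acc:
            best_acc, best_cfg = acc, (wname, level)
print("best accuracy %.2f with %s" % (best_acc, best_cfg))
```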
Computational Fluency Performance Profile of High School Students with Mathematics Disabilities
ERIC Educational Resources Information Center
Calhoon, Mary Beth; Emerson, Robert Wall; Flores, Margaret; Houchins, David E.
2007-01-01
The purpose of this descriptive study was to develop a computational fluency performance profile of 224 high school (Grades 9-12) students with mathematics disabilities (MD). Computational fluency performance was examined by grade-level expectancy (Grades 2-6) and skill area (whole numbers: addition, subtraction, multiplication, division;…
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.
2011-01-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W
2012-06-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
FY07 NRL DoD High Performance Computing Modernization Program Annual Reports
2008-09-05
performed. Implicit and explicit solution methods are used as appropriate. The primary finite element codes used are ABAQUS and ANSYS. User subroutines ...geometric complexities, loading path dependence, rate dependence, and interaction between loading types (electrical, thermal and mechanical). Work is not...are used for specialized material constitutive response. Coupled material responses, such as electrical-thermal for capacitor materials or electrical
Application of multi-grid method on the simulation of incremental forging processes
NASA Astrophysics Data System (ADS)
Ramadan, Mohamad; Khaled, Mahmoud; Fourment, Lionel
2016-10-01
Numerical simulation has become essential in manufacturing large parts by incremental forging processes. It is a powerful tool for revealing the underlying physical phenomena, but it comes at a high price: computational time. Many techniques have therefore been developed to decrease the computational time of numerical simulation. The multi-grid method is a numerical procedure that reduces the computational time of a calculation by solving the system of equations on several meshes of decreasing size, which smooths both the low-frequency and high-frequency components of the solution more rapidly. In this paper a multi-grid method is applied to the cogging process in the software Forge 3. The study is carried out using an increasing number of degrees of freedom. The results show that the calculation time is halved for a mesh of 39,000 nodes. The method is promising, especially if coupled with a multi-mesh method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sen, Oishik, E-mail: oishik-sen@uiowa.edu; Gaul, Nicholas J., E-mail: nicholas-gaul@ramdosolutions.com; Choi, K.K., E-mail: kyung-choi@uiowa.edu
Macro-scale computations of shocked particulate flows require closure laws that model the exchange of momentum/energy between the fluid and particle phases. Closure laws are constructed in this work in the form of surrogate models derived from highly resolved mesoscale computations of shock-particle interactions. The mesoscale computations are performed to calculate the drag force on a cluster of particles for different values of Mach number and particle volume fraction. Two Kriging-based methods, viz. the Dynamic Kriging Method (DKG) and the Modified Bayesian Kriging Method (MBKG), are evaluated for their ability to construct surrogate models with sparse data, i.e., using the least number of mesoscale simulations. It is shown that if the input data is noise-free, the DKG method converges monotonically; convergence is less robust in the presence of noise. The MBKG method converges monotonically even with noisy input data and is therefore more suitable for surrogate model construction from numerical experiments. This work is the first step towards a full multiscale modeling of the interaction of shocked particle-laden flows.
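For readers unfamiliar with Kriging surrogates, the following is a minimal Gaussian-process (simple-Kriging-style) sketch in Python: a handful of "expensive simulation" samples are interpolated, and the posterior variance indicates where additional mesoscale runs would help most. It is not the DKG or MBKG algorithm; the kernel, hyperparameters, and the Mach-number/drag values are invented purely for illustration.

```python
import numpy as np

def rbf(A, B, length=0.5):
    """Squared-exponential covariance between two sets of 1D sample points."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def fit_predict(x_train, y_train, x_test, noise=1e-6):
    """Kriging/GP posterior mean and variance at the test points."""
    K = rbf(x_train, x_train) + noise * np.eye(x_train.size)
    Ks = rbf(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Pretend each training point is one expensive mesoscale simulation:
# a drag coefficient sampled at a few Mach numbers (values are made up).
mach = np.array([0.5, 0.9, 1.3, 1.8, 2.5])
drag = np.array([0.8, 1.4, 2.1, 2.4, 2.6])
m_new = np.linspace(0.5, 2.5, 9)
mu, var = fit_predict(mach, drag, m_new)
for m, d, v in zip(m_new, mu, var):
    print(f"Mach {m:.2f}: drag ~ {d:.2f} (posterior var {v:.3f})")
```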
Space Radiation Transport Methods Development
NASA Technical Reports Server (NTRS)
Wilson, J. W.; Tripathi, R. K.; Qualls, G. D.; Cucinotta, F. A.; Prael, R. E.; Norbury, J. W.; Heinbockel, J. H.; Tweed, J.
2002-01-01
Improved spacecraft shield design requires early entry of radiation constraints into the design process to maximize performance and minimize costs. As a result, we have been investigating high-speed computational procedures to allow shield analysis from the preliminary design concepts to the final design. In particular, we will discuss the progress towards a full three-dimensional and computationally efficient deterministic code for which the current HZETRN evaluates the lowest order asymptotic term. HZETRN is the first deterministic solution to the Boltzmann equation allowing field mapping within the International Space Station (ISS) in tens of minutes using standard Finite Element Method (FEM) geometry common to engineering design practice, enabling the development of integrated multidisciplinary design optimization methods. A single ray trace in the ISS FEM geometry requires 14 milliseconds, which severely limits the application of Monte Carlo methods to such engineering models. A potential means of improving the Monte Carlo efficiency in coupling to spacecraft geometry is given in terms of reconfigurable computing and could be utilized in the final design as verification of the deterministic method's optimized design.
NASA Astrophysics Data System (ADS)
Xu, Kai; Wang, Yiwen; Wang, Yueming; Wang, Fang; Hao, Yaoyao; Zhang, Shaomin; Zhang, Qiaosheng; Chen, Weidong; Zheng, Xiaoxiang
2013-04-01
Objective. High-dimensional neural recordings bring computational challenges to movement decoding in motor brain machine interfaces (mBMI), especially for portable applications. However, not all recorded neural activities relate to the execution of a certain movement task. This paper proposes a local-learning-based method to perform neuron selection for gesture prediction in a reaching and grasping task. Approach. Nonlinear neural activities are decomposed into a set of linear ones in a weighted feature space. A margin is defined to measure the distance between inter-class and intra-class neural patterns. The weights, reflecting the importance of neurons, are obtained by minimizing a margin-based exponential error function. To find the most dominant neurons in the task, 1-norm regularization is introduced to the objective function to produce sparse weights, where near-zero weights indicate irrelevant neurons. Main results. The signals of only 10 neurons out of 70, selected by the proposed method, could achieve over 95% of the full recording's decoding accuracy for gesture prediction, regardless of the decoding method used (support vector machine or K-nearest neighbor). The temporal activities of the selected neurons show visually distinguishable patterns associated with various hand states. Compared with other algorithms, the proposed method better eliminates the irrelevant neurons with near-zero weights and provides the important neuron subset with statistically the best decoding performance. The weights of important neurons usually converge within 10-20 iterations. In addition, we study the temporal and spatial variation of neuron importance over a period of one and a half months in the same task. A high decoding performance can be maintained by updating the neuron subset. Significance. The proposed algorithm effectively ascertains neuronal importance without assuming any coding model and provides high performance with different decoding models. It is more robust in identifying the important neurons when noisy signals are present. The low demand for computational resources, reflected by the fast convergence, indicates the feasibility of applying the method in portable BMI systems. The ascertainment of the important neurons helps to visually inspect neural patterns associated with the movement task. The elimination of irrelevant neurons greatly reduces the computational burden of mBMI systems and maintains the performance with better robustness.
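The paper's margin-based objective is not reproduced here, but the sparsifying effect it relies on can be sketched with an off-the-shelf analogue: L1-penalized logistic regression (scikit-learn) drives the weights of uninformative channels toward zero, leaving a small informative subset. The data below are synthetic, and the 70-neuron/10-informative split merely mirrors the numbers in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_neurons, n_informative = 200, 70, 10

# Synthetic firing-rate features: only the first 10 "neurons" carry class information.
labels = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_neurons))
X[:, :n_informative] += 1.5 * labels[:, None]

# The L1 penalty drives weights of irrelevant neurons toward zero (a loose
# analogue of the 1-norm-regularized margin objective in the abstract).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, labels)
weights = np.abs(clf.coef_.ravel())
selected = np.argsort(weights)[::-1][:n_informative]
print("selected neurons:", sorted(selected))
print("nonzero weights:", np.count_nonzero(weights))
```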
Development of an Active Flow Control Technique for an Airplane High-Lift Configuration
NASA Technical Reports Server (NTRS)
Shmilovich, Arvin; Yadlin, Yoram; Dickey, Eric D.; Hartwich, Peter M.; Khodadoust, Abdi
2017-01-01
This study focuses on Active Flow Control methods used in conjunction with airplane high-lift systems. The project is motivated by the simplified high-lift system, which offers enhanced airplane performance compared to conventional high-lift systems. Computational simulations are used to guide the implementation of preferred flow control methods, which require a fluidic supply. It is first demonstrated that flow control applied to a high-lift configuration that consists of simple hinge flaps is capable of attaining the performance of the conventional high-lift counterpart. A set of flow control techniques has been subsequently considered to identify promising candidates, where the central requirement is that the mass flow for actuation has to be within available resources onboard. The flow control methods are based on constant blowing, fluidic oscillators, and traverse actuation. The simulations indicate that the traverse actuation offers a substantial reduction in required mass flow, and it is especially effective when the frequency of actuation is consistent with the characteristic time scale of the flow.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
Background: With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to systematically review the studies conducted in the field of cloud computing. Methods: The present study was a systematic review conducted in 2017. Searches were performed in the Scopus, Web of Science, IEEE, PubMed and Google Scholar databases using combinations of keywords. Of the 431 articles initially retrieved, 27 were selected for review after applying the inclusion and exclusion criteria. Data gathering was done with a self-made checklist and analyzed by the content analysis method. Results: The findings of this study showed that cloud computing is a very widespread technology. It covers domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of cloud computing, ability to search and explore, reducing errors and improving quality, structure, flexibility and sharing ability. These capabilities make it effective for the electronic health record. Conclusion: According to the findings of the present study, the capabilities of cloud computing are useful in implementing EHRs in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of EHRs, it is recommended to use this technology. PMID:29719309
Research on elastic resource management for multi-queue under cloud computing environment
NASA Astrophysics Data System (ADS)
CHENG, Zhenjing; LI, Haibo; HUANG, Qiulan; Cheng, Yaodong; CHEN, Gang
2017-10-01
As a new approach to managing computing resources, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on OpenStack was built at IHEP, using HTCondor as the job queue management system. In a traditional static cluster, a fixed number of virtual machines are pre-allocated to the job queues of different experiments. However, this method cannot adapt well to the volatility of computing resource requirements. To solve this problem, an elastic computing resource management system for the cloud computing environment has been designed. This system performs unified management of virtual computing nodes on the basis of the job queues in HTCondor, using dual resource thresholds as well as a quota service. A two-stage pool is designed to improve the efficiency of resource pool expansion. This paper presents several use cases of the elastic resource management system in IHEPCloud. Practical runs show that virtual computing resources dynamically expand or shrink as computing requirements change. Additionally, the CPU utilization ratio of computing resources increased significantly compared with traditional resource management. The system also performs well when there are multiple Condor schedulers and multiple job queues.
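As a rough illustration of a dual-threshold policy, the toy control loop below expands the pool when the busy fraction exceeds an upper threshold (subject to a quota) and shrinks it when the busy fraction falls below a lower one. The thresholds, quota, and metrics are hypothetical placeholders, not IHEPCloud code; the launch/terminate actions are reduced to a signed VM count.

```python
# Toy dual-threshold scaling decision; thresholds, quota, and queue metrics
# are hypothetical placeholders, not the IHEPCloud implementation.
LOW, HIGH = 0.3, 0.8        # busy-fraction thresholds for shrink/expand
QUOTA = 100                 # per-queue VM quota

def rebalance(queue_len, busy_vms, total_vms):
    """Return how many VMs to add (positive) or retire (negative)."""
    busy_frac = busy_vms / total_vms if total_vms else 1.0
    if busy_frac > HIGH and total_vms < QUOTA:
        return min(QUOTA - total_vms, queue_len)    # expand: boot more worker VMs
    if busy_frac < LOW and total_vms > busy_vms:
        return -(total_vms - busy_vms)              # shrink: retire idle VMs
    return 0

print(rebalance(queue_len=40, busy_vms=18, total_vms=20))  # expands by 40
print(rebalance(queue_len=0, busy_vms=2, total_vms=20))    # retires 18 idle VMs
```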
Increasing the computational efficiency of digital cross correlation by a vectorization method
NASA Astrophysics Data System (ADS)
Chang, Ching-Yuan; Ma, Chien-Ching
2017-08-01
This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in speedups of 6.387 and 36.044 times, respectively, compared with looped expressions. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method, as well as experiments to measure the quantitative transient displacement response subjected to dynamic impact loading. The experiments involved the use of a high-speed camera as well as a fiber optic system to measure the transient displacement of a cantilever beam under impact from a steel ball. Measurement data obtained from the two methods are in excellent agreement in both the time and frequency domains, with discrepancies of only 0.68%. Numerical and experimental results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
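The loop-versus-vectorized contrast the paper exploits can be reproduced in a few lines. The sketch below uses numpy rather than MATLAB so it is self-contained: a doubly looped full cross-correlation is checked against the vectorized library call and timed against it.

```python
import numpy as np
import time

def xcorr_loop(a, b):
    """Looped full cross-correlation: slide b across a one lag at a time."""
    n, m = len(a), len(b)
    out = np.zeros(n + m - 1)
    for k in range(n + m - 1):
        shift = k - (m - 1)          # lag of b relative to a
        for i in range(n):
            j = i - shift
            if 0 <= j < m:
                out[k] += a[i] * b[j]
    return out

rng = np.random.default_rng(1)
a, b = rng.normal(size=1000), rng.normal(size=100)

t0 = time.perf_counter(); slow = xcorr_loop(a, b); t1 = time.perf_counter()
fast = np.correlate(a, b, mode="full"); t2 = time.perf_counter()

assert np.allclose(slow, fast)       # same result, vastly different cost
print(f"loop {t1 - t0:.3f}s vs vectorized {t2 - t1:.5f}s")
```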
NASA Astrophysics Data System (ADS)
Pai, Akshay; Samala, Ravi K.; Zhang, Jianying; Qian, Wei
2010-03-01
Mammography reading by radiologists and breast tissue image interpretation by pathologists often lead to high False Positive (FP) rates. Similarly, current Computer Aided Diagnosis (CADx) methods tend to concentrate more on sensitivity, thus increasing the FP rates. A novel method is introduced here which employs a similarity-based approach to decrease the FP rate in the diagnosis of microcalcifications. The method employs Principal Component Analysis (PCA) and similarity metrics to achieve the proposed goal. The training and testing sets are divided into generalized (Normal and Abnormal) and more specific (Abnormal, Normal, Benign) classes. The performance of this method as a standalone classification system is evaluated in both cases (general and specific). In another approach, the probability of each case belonging to a particular class is calculated. If the probabilities are too close to classify, the augmented CADx system can be instructed to perform a detailed analysis of such cases. For normal cases with high probability, no further processing is necessary, thus reducing the computation time. Hence, this novel method can be employed in cascade with CADx to reduce the FP rate and also avoid unnecessary computation time. Using this methodology, false positive rates of 8% and 11% are achieved for mammography and cellular images, respectively.
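A minimal sketch of the PCA-plus-similarity idea, with synthetic feature vectors standing in for mammography data: project onto the leading principal axes, then assign the class whose centroid is most similar under cosine similarity. This illustrates the general scheme only, not the paper's pipeline or its probability-gated cascade.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for image feature vectors from two classes.
normal = rng.normal(0.0, 1.0, size=(60, 50))
abnormal = rng.normal(0.8, 1.0, size=(60, 50))
X = np.vstack([normal, abnormal])
y = np.array([0] * 60 + [1] * 60)

# PCA via SVD of the centered data; keep the leading components.
mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:5]                           # top-5 principal axes
Z = (X - mu) @ P.T                   # projected training data

def classify(x):
    """Similarity-based decision: cosine similarity to each class centroid."""
    z = (x - mu) @ P.T
    sims = []
    for c in (0, 1):
        centroid = Z[y == c].mean(axis=0)
        sims.append(z @ centroid / (np.linalg.norm(z) * np.linalg.norm(centroid)))
    return int(np.argmax(sims)), sims

pred, sims = classify(rng.normal(0.8, 1.0, size=50))
print("predicted class:", pred, "similarities:", np.round(sims, 3))
```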
Real-time Tsunami Inundation Prediction Using High Performance Computers
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2014-12-01
Recently, off-shore tsunami observation stations based on cabled ocean bottom pressure gauges are actively being deployed, especially in Japan. These cabled systems are designed to provide real-time tsunami data before tsunamis reach coastlines for disaster mitigation purposes. To realize the benefits of these observations, real-time analysis techniques that make effective use of the data are necessary. A representative study was made by Tsushima et al. (2009), who proposed a method to provide instant tsunami source prediction based on observed tsunami waveform data. As time passes, the prediction is improved by using updated waveform data. After a tsunami source is predicted, tsunami waveforms are synthesized from pre-computed tsunami Green functions of linear long wave equations. Tsushima et al. (2014) updated the method by combining the tsunami waveform inversion with an instant inversion of coseismic crustal deformation and improved the prediction accuracy and speed in the early stages. For disaster mitigation purposes, real-time predictions of tsunami inundation are also important. In this study, we discuss the possibility of real-time tsunami inundation predictions, which require faster-than-real-time tsunami inundation simulation in addition to instant tsunami source analysis. Although the computational demand of solving the non-linear shallow water equations for inundation prediction is large, it has become tractable through recent developments in high performance computing technologies. We conducted parallel computations of tsunami inundation and achieved 6.0 TFLOPS using 19,000 CPU cores. We employed a leap-frog finite difference method with nested staggered grids whose resolutions range from 405 m to 5 m. The resolution ratio of each nested domain was 1/3. The total number of grid points was 13 million, and the time step was 0.1 seconds. Tsunami sources of the 2011 Tohoku-oki earthquake were tested. The inundation prediction for up to 2 hours after the earthquake took about 2 minutes, which would be sufficient for practical tsunami inundation prediction. In the presentation, the computational performance of our faster-than-real-time tsunami inundation model will be shown, and the tsunami wave source analysis preferable for an accurate inundation prediction will also be discussed.
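The building block of such simulations, a staggered-grid leap-frog update of the linear long wave equations, fits in a few lines in 1D. The sketch below is a toy flat-bottom version with closed boundaries, not the operational nested-grid inundation model (which is non-linear and two-dimensional); depth, domain size, and the initial hump are illustrative.

```python
import numpy as np

g, h = 9.81, 4000.0                 # gravity, uniform ocean depth [m]
L, nx = 400e3, 400                  # domain length [m], number of cells
dx = L / nx
dt = 0.5 * dx / np.sqrt(g * h)      # CFL-limited time step

x = (np.arange(nx) + 0.5) * dx
eta = np.exp(-((x - L / 2) / 20e3) ** 2)   # initial sea-surface hump (toy "source")
u = np.zeros(nx + 1)                        # velocities live on cell faces

for step in range(600):
    # Staggered leap-frog: update u from the eta gradient,
    # then eta from the divergence of the updated flux.
    u[1:-1] -= g * dt / dx * (eta[1:] - eta[:-1])
    eta -= h * dt / dx * (u[1:] - u[:-1])

print("wave speed sqrt(gh) =", round(np.sqrt(g * h), 1), "m/s; max eta =", round(eta.max(), 3))
```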
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures
Sharma, Anuj; Manolakos, Elias S.
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
NASA Astrophysics Data System (ADS)
Ibrahima, Fayadhoi; Meyer, Daniel; Tchelepi, Hamdi
2016-04-01
Because geophysical data are inexorably sparse and incomplete, stochastic treatments of simulated responses are crucial to explore possible scenarios and assess risks in subsurface problems. In particular, nonlinear two-phase flows in porous media are essential, yet challenging, in reservoir simulation and hydrology. Adding highly heterogeneous and uncertain input, such as the permeability and porosity fields, transforms the estimation of the flow response into a tough stochastic problem for which computationally expensive Monte Carlo (MC) simulations remain the preferred option. We propose an alternative approach to evaluate the probability distribution of the (water) saturation for the stochastic Buckley-Leverett problem when the probability distributions of the permeability and porosity fields are available. We give a computationally efficient and numerically accurate method to estimate the one-point probability density (PDF) and cumulative distribution functions (CDF) of the (water) saturation. The distribution method draws inspiration from a Lagrangian approach to the stochastic transport problem and expresses the saturation PDF and CDF essentially in terms of a deterministic mapping and the distribution and statistics of scalar random fields. In a large class of applications these random fields can be estimated at low computational costs (a few MC runs), thus making the distribution method attractive. Even though the method relies on a key assumption of fixed streamlines, we show that it performs well for high input variances, which is the case of interest. Once the saturation distribution is determined, any one-point statistics thereof can be obtained, especially the saturation average and standard deviation. Moreover, the probability of rare events and saturation quantiles (e.g. P10, P50 and P90) can be efficiently derived from the distribution method. These statistics can then be used for risk assessment, as well as data assimilation and uncertainty reduction in the prior knowledge of input distributions. We provide various examples and comparisons with MC simulations to illustrate the performance of the method.
Re-Computation of Numerical Results Contained in NACA Report No. 496
NASA Technical Reports Server (NTRS)
Perry, Boyd, III
2015-01-01
An extensive examination of NACA Report No. 496 (NACA 496), "General Theory of Aerodynamic Instability and the Mechanism of Flutter," by Theodore Theodorsen, is described. The examination included checking equations and solution methods and re-computing interim quantities and all numerical examples in NACA 496. The checks revealed that NACA 496 contains computational shortcuts (time- and effort-saving devices for engineers of the time) and clever artifices (employed in its solution methods), but, unfortunately, also contains numerous tripping points (aspects of NACA 496 that have the potential to cause confusion) and some errors. The re-computations were performed employing the methods and procedures described in NACA 496, but using modern computational tools. With some exceptions, the magnitudes and trends of the original results were in fair-to-very-good agreement with the re-computed results. The exceptions included what are speculated to be computational errors in the original in some instances and transcription errors in the original in others. Independent flutter calculations were performed and, in all cases, including those where the original and re-computed results differed significantly, were in excellent agreement with the re-computed results. Appendix A contains NACA 496; Appendix B contains a MATLAB (Registered) program that performs the re-computation of results; Appendix C presents three alternate solution methods, with examples, for the two-degree-of-freedom solution method of NACA 496; Appendix D contains the three-degree-of-freedom solution method (outlined in NACA 496 but never implemented), with examples.
NASA Astrophysics Data System (ADS)
Kenway, Gaetan K. W.
This thesis presents new tools and techniques developed to address the challenging problem of high-fidelity aerostructural optimization with respect to large numbers of design variables. A new mesh-movement scheme is developed that is both computationally efficient and sufficiently robust to accommodate large geometric design changes and aerostructural deformations. A fully coupled Newton-Krylov method is presented that accelerates the convergence of aerostructural systems, provides a 20% performance improvement over the traditional nonlinear block Gauss-Seidel approach, and can handle more flexible structures. A coupled adjoint method is used that efficiently computes derivatives for a gradient-based optimization algorithm. The implementation uses only machine-accurate derivative techniques and is verified to yield fully consistent derivatives by comparing against the complex step method. The fully coupled large-scale adjoint solution method is shown to have 30% better performance than the segregated approach. The parallel scalability of the coupled adjoint technique is demonstrated on an Euler Computational Fluid Dynamics (CFD) model with more than 80 million state variables coupled to a detailed structural finite-element model of the wing with more than 1 million degrees of freedom. Multi-point high-fidelity aerostructural optimizations of a long-range wide-body, transonic transport aircraft configuration are performed using the developed techniques. The aerostructural analysis employs Euler CFD with a 2 million cell mesh and a structural finite element model with 300,000 DOF. Two design optimization problems are solved: one where takeoff gross weight is minimized, and another where fuel burn is minimized. Each optimization uses a multi-point formulation with 5 cruise conditions and 2 maneuver conditions. The optimization problems have 476 design variables, and optimal results are obtained within 36 hours of wall time using 435 processors. The TOGW minimization results in a 4.2% reduction in TOGW with a 6.6% fuel burn reduction, while the fuel burn optimization results in an 11.2% fuel burn reduction with no change to the takeoff gross weight.
Computer-Aided Design and Optimization of High-Performance Vacuum Electronic Devices
2006-08-15
approximations to the metric, and space mapping wherein low-accuracy (coarse mesh) solutions can potentially be used more effectively in an...interface and algorithm development. • Work on space-mapping or related methods for utilizing models of varying levels of approximation within an
Tablet computers in assessing performance in a high stakes exam: opinion matters.
Currie, G P; Sinha, S; Thomson, F; Cleland, J; Denison, A R
2017-06-01
Background Tablet computers have emerged as a tool to capture, process and store data in examinations, yet evidence relating to their acceptability and usefulness in assessment is limited. Methods We performed an observational study to explore opinions and attitudes relating to the use of tablet computers for recording performance in a final year objective structured clinical examination at a single UK medical school. Examiners completed a short questionnaire encompassing background, forced-choice and open questions. Forced-choice questions were analysed using descriptive statistics and open questions by framework analysis. Results Ninety-two examiners (97% response rate) completed the questionnaire, of whom 85% had previously used tablet computers. Ninety per cent felt checklist mark allocation was 'very/quite easy', while approximately half considered recording 'free-type' comments 'easy/very easy'. Greater overall marking efficiency and resource savings were considered the main advantages of tablet computers, while concerns were raised relating to technological failure and the ability to record free-type comments. Discussion In a context where examiners were familiar with tablet computers, they were preferred to paper checklists, although concerns remained. This study adds to the limited literature underpinning the use of electronic devices as acceptable tools in objective structured clinical examinations.
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
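The serial kernel being parallelized in such work is plain cyclic coordinate descent. The sketch below shows it for an L1-penalized linear model (a lasso rather than the paper's conditioned GLM, which is an assumption made for brevity): the inner loop over coordinates is exactly the part that GPU implementations distribute.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.copy()                       # running residual y - Xw
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(sweeps):
        for j in range(p):             # the serial loop that GPU versions parallelize
            r += X[:, j] * w[j]        # remove coordinate j's contribution
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * w[j]        # add the updated contribution back
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20))
true_w = np.zeros(20); true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + 0.1 * rng.normal(size=500)
print(np.round(lasso_cd(X, y, lam=10.0), 2))   # recovers the three nonzero coefficients
```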
Two-step simulation of velocity and passive scalar mixing at high Schmidt number in turbulent jets
NASA Astrophysics Data System (ADS)
Rah, K. Jeff; Blanquart, Guillaume
2016-11-01
Simulation of a passive scalar in a high Schmidt number turbulent mixing process requires higher computational cost than simulation of the velocity fields, because the scalar is associated with smaller length scales than the velocity. Thus, full simulation of both velocity and a passive scalar at high Sc for a practical configuration is difficult to perform. In this work, a new approach to simulating velocity and passive scalar mixing at high Sc is suggested to reduce the computational cost. First, the velocity fields are resolved by Large Eddy Simulation (LES). Then, using the velocity information extracted from the LES, the scalar inside a moving fluid blob is simulated by Direct Numerical Simulation (DNS). This two-step simulation method is applied to a turbulent jet and provides a new way to examine a scalar mixing process in a practical application at smaller computational cost. (Supported by NSF and a Samsung Scholarship.)
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Senate Committee on Commerce, Science, and Transportation.
This hearing before the Senate Subcommittee on Science, Technology, and Space focuses on S. 272, the High-Performance Computing and Communications Act of 1991, a bill that provides for a coordinated federal research and development program to ensure continued U.S. leadership in this area. Performance computing is defined as representing the…
Poussin, Carine; Belcastro, Vincenzo; Martin, Florian; Boué, Stéphanie; Peitsch, Manuel C; Hoeng, Julia
2017-04-17
Systems toxicology intends to quantify the effect of toxic molecules in biological systems and unravel their mechanisms of toxicity. The development of advanced computational methods is required for analyzing and integrating high throughput data generated for this purpose as well as for extrapolating predictive toxicological outcomes and risk estimates. To ensure the performance and reliability of the methods and verify conclusions from systems toxicology data analysis, it is important to conduct unbiased evaluations by independent third parties. As a case study, we report here the results of an independent verification of methods and data in systems toxicology by crowdsourcing. The sbv IMPROVER systems toxicology computational challenge aimed to evaluate computational methods for the development of blood-based gene expression signature classification models with the ability to predict smoking exposure status. Participants created/trained models on blood gene expression data sets including smokers/mice exposed to 3R4F (a reference cigarette) or noncurrent smokers/Sham (mice exposed to air). Participants applied their models on unseen data to predict whether subjects classify closer to smoke-exposed or nonsmoke-exposed groups. The data sets also included data from subjects that had been exposed to potential modified risk tobacco products (MRTPs) or that had switched to a MRTP after exposure to conventional cigarette smoke. The scoring of anonymized participants' predictions was done using predefined metrics. The top 3 performers' methods predicted class labels with area under the precision-recall curve scores above 0.9. Furthermore, although various computational approaches were used, the crowd's results confirmed our own data analysis outcomes with regard to the classification of MRTP-related samples. Mice exposed directly to a MRTP were classified closer to the Sham group. After switching to a MRTP, the confidence that subjects belonged to the smoke-exposed group decreased significantly. Smoking exposure gene signatures that contributed to the group separation included a core set of genes highly consistent across teams, such as AHRR, LRRN3, SASH1, and P2RY6. In conclusion, crowdsourcing constitutes a pertinent approach, in complement to the classical peer review process, to independently and unbiasedly verify computational methods and data for risk assessment using systems toxicology.
Efficient Stochastic Inversion Using Adjoint Models and Kernel-PCA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thimmisetty, Charanraj A.; Zhao, Wenju; Chen, Xiao
2017-10-18
Performing stochastic inversion on a computationally expensive forward simulation model with a high-dimensional uncertain parameter space (e.g. a spatial random field) is computationally prohibitive even when gradient information can be computed efficiently. Moreover, the ‘nonlinear’ mapping from parameters to observables generally gives rise to non-Gaussian posteriors even with Gaussian priors, thus hampering the use of efficient inversion algorithms designed for models with Gaussian assumptions. In this paper, we propose a novel Bayesian stochastic inversion methodology, which is characterized by a tight coupling between the gradient-based Langevin Markov Chain Monte Carlo (LMCMC) method and a kernel principal component analysis (KPCA). This approach addresses the ‘curse-of-dimensionality’ via KPCA to identify a low-dimensional feature space within the high-dimensional and nonlinearly correlated parameter space. In addition, non-Gaussian posterior distributions are estimated via an efficient LMCMC method on the projected low-dimensional feature space. We will demonstrate this computational framework by integrating and adapting our recent data-driven statistics-on-manifolds constructions and reduction-through-projection techniques to a linear elasticity model.
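A minimal numpy sketch of the KPCA step, the part that supplies the low-dimensional feature space on which the LMCMC sampling would then run: build an RBF kernel matrix, double-center it, and project onto the leading eigenvectors. The kernel choice, gamma value, and toy data are assumptions; the LMCMC coupling itself is omitted.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=0.1):
    """Kernel PCA with an RBF kernel: center K in feature space, eigendecompose, project."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one      # double centering
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]     # leading eigenpairs
    vals, vecs = vals[idx], vecs[:, idx]
    return Kc @ vecs / np.sqrt(np.maximum(vals, 1e-12))  # projected coordinates

# Toy nonlinearly correlated "parameter samples": a noisy circle, which linear
# PCA cannot unfold but KPCA separates in the feature space.
rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))
Z = kernel_pca(X, n_components=2, gamma=2.0)
print("reduced shape:", Z.shape)
```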
Entropy-limited hydrodynamics: a novel approach to relativistic hydrodynamics
NASA Astrophysics Data System (ADS)
Guercilena, Federico; Radice, David; Rezzolla, Luciano
2017-07-01
We present entropy-limited hydrodynamics (ELH): a new approach for the computation of numerical fluxes arising in the discretization of hyperbolic equations in conservation form. ELH is based on the hybridisation of an unfiltered high-order scheme with the first-order Lax-Friedrichs method. The activation of the low-order part of the scheme is driven by a measure of the locally generated entropy inspired by the artificial-viscosity method proposed by Guermond et al. (J. Comput. Phys. 230(11):4248-4267, 2011, doi: 10.1016/j.jcp.2010.11.043). Here, we present ELH in the context of high-order finite-differencing methods and of the equations of general-relativistic hydrodynamics. We study the performance of ELH in a series of classical astrophysical tests in general relativity involving isolated, rotating and nonrotating neutron stars, and including a case of gravitational collapse to black hole. We present a detailed comparison of ELH with the fifth-order monotonicity preserving method MP5 (Suresh and Huynh in J. Comput. Phys. 136(1):83-99, 1997, doi: 10.1006/jcph.1997.5745), one of the most common high-order schemes currently employed in numerical-relativity simulations. We find that ELH achieves comparable and, in many of the cases studied here, better accuracy than more traditional methods at a fraction of the computational cost (up to ~50% speedup). Given its accuracy and its simplicity of implementation, ELH is a promising framework for the development of new special- and general-relativistic hydrodynamics codes well adapted for massively parallel supercomputers.
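The flavor of such hybrid schemes can be conveyed with a 1D Burgers sketch: a high-order central flux is blended with a Lax-Friedrichs flux, with the blending weight driven by a local indicator. Here a simple curvature measure stands in for the locally generated entropy of ELH, so this illustrates flux hybridisation in general, not the ELH criterion itself; the stencil, indicator scaling, and test setup are all assumptions.

```python
import numpy as np

def rhs(u, dx):
    """Semi-discrete RHS of 1D Burgers with a blended flux: a 4th-order central
    flux where the solution is smooth, Lax-Friedrichs where a local curvature
    indicator (a stand-in for the entropy measure) is large."""
    f = 0.5 * u ** 2
    fp, fm, fpp = np.roll(f, -1), np.roll(f, 1), np.roll(f, -2)
    up = np.roll(u, -1)
    alpha = np.abs(u).max()

    f_lf = 0.5 * (f + fp) - 0.5 * alpha * (up - u)      # robust low-order flux
    f_hi = (-fm + 7 * f + 7 * fp - fpp) / 12.0          # high-order central flux

    d2 = np.abs(up - 2 * u + np.roll(u, 1))             # local curvature indicator
    theta = np.clip(d2 / (0.1 * alpha + 1e-12), 0.0, 1.0)
    flux = (1 - theta) * f_hi + theta * f_lf            # hybrid flux F_{i+1/2}
    return -(flux - np.roll(flux, 1)) / dx

def ssprk3(u, dx, dt):
    """Strong-stability-preserving RK3 time step."""
    u1 = u + dt * rhs(u, dx)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1, dx))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2, dx))

nx = 400
x = np.linspace(0, 2 * np.pi, nx, endpoint=False)
u = 1.0 + 0.5 * np.sin(x)              # smooth profile that steepens into a shock
dx = x[1] - x[0]
for _ in range(500):
    u = ssprk3(u, dx, dt=0.4 * dx / np.abs(u).max())
print("min/max after shock formation:", round(u.min(), 3), round(u.max(), 3))
```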
Zhou, Guoxu; Yang, Zuyuan; Xie, Shengli; Yang, Jun-Mei
2011-04-01
Online blind source separation (BSS) is proposed to overcome the high computational cost problem, which limits the practical applications of traditional batch BSS algorithms. However, the existing online BSS methods are mainly used to separate independent or uncorrelated sources. Recently, nonnegative matrix factorization (NMF) has shown great potential to separate correlative sources, where some constraints are often imposed to overcome the non-uniqueness of the factorization. In this paper, an incremental NMF with volume constraint is derived and utilized for solving online BSS. The volume constraint on the mixing matrix enhances the identifiability of the sources, while the incremental learning mode reduces the computational cost. The proposed method takes advantage of the natural-gradient-based multiplicative updating rule, and it performs especially well in the recovery of dependent sources. Simulations in BSS for dual-energy X-ray images, online encrypted speech signals, and highly correlated face images show the validity of the proposed method.
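For context, the baseline that incremental, volume-constrained variants build on is NMF with multiplicative updates. The sketch below implements the standard Lee-Seung updates (Frobenius loss) on a toy mixture; the volume constraint and the incremental (online) mode of the paper are deliberately omitted.

```python
import numpy as np

def nmf(V, rank, iters=500, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates, minimizing ||V - WH||_F.
    The volume constraint and incremental updates from the abstract are omitted;
    this is only the baseline the method builds on."""
    n, m = V.shape
    rng = np.random.default_rng(5)
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

# Toy mixing of two correlated nonnegative sources into three channels.
rng = np.random.default_rng(6)
S = np.abs(rng.normal(size=(2, 1000)))              # hidden nonnegative sources
A = np.array([[1.0, 0.4], [0.3, 1.0], [0.6, 0.7]])  # mixing matrix
V = A @ S
W, H = nmf(V, rank=2)
print("relative reconstruction error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```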
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Yan; Mohanty, Soumya D.; Center for Gravitational Wave Astronomy, Department of Physics and Astronomy, University of Texas at Brownsville, 80 Fort Brown, Brownsville, Texas 78520
2010-03-15
The detection and estimation of gravitational wave signals belonging to a parameterized family of waveforms requires, in general, the numerical maximization of a data-dependent function of the signal parameters. Because of noise in the data, the function to be maximized is often highly multimodal with numerous local maxima. Searching for the global maximum then becomes computationally expensive, which in turn can limit the scientific scope of the search. Stochastic optimization is one possible approach to reducing computational costs in such applications. We report results from a first investigation of the particle swarm optimization method in this context. The method is applied to a test bed motivated by the problem of detection and estimation of a binary inspiral signal. Our results show that particle swarm optimization works well in the presence of high multimodality, making it a viable candidate method for further applications in gravitational wave data analysis.
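A minimal particle swarm optimization sketch: the Rastrigin function, a standard highly multimodal test function, stands in for the data-dependent detection statistic (maximizing a statistic is equivalent to minimizing its negative). The swarm size, inertia, and acceleration constants are generic textbook values, not those used in the study.

```python
import numpy as np

def rastrigin(X):
    """Highly multimodal test function standing in for the detection statistic."""
    return 10 * X.shape[1] + (X ** 2 - 10 * np.cos(2 * np.pi * X)).sum(axis=1)

def pso(f, dim=2, n_particles=40, iters=200, w=0.7, c1=1.5, c2=1.5, bound=5.12):
    rng = np.random.default_rng(7)
    x = rng.uniform(-bound, bound, (n_particles, dim))   # positions
    v = np.zeros_like(x)                                 # velocities
    pbest, pbest_val = x.copy(), f(x)                    # personal bests
    g = pbest[pbest_val.argmin()].copy()                 # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, -bound, bound)
        val = f(x)
        better = val < pbest_val
        pbest[better], pbest_val[better] = x[better], val[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

best_x, best_val = pso(rastrigin)
print("best point:", np.round(best_x, 3), "value:", round(best_val, 4))
```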
The spectral cell method in nonlinear earthquake modeling
NASA Astrophysics Data System (ADS)
Giraldo, Daniel; Restrepo, Doriam
2017-12-01
This study examines the applicability of the spectral cell method (SCM) to compute the nonlinear earthquake response of complex basins. SCM combines fictitious-domain concepts with the spectral version of the finite element method to solve the wave equations in heterogeneous geophysical domains. Nonlinear behavior is considered by implementing the Mohr-Coulomb and Drucker-Prager yield criteria. We illustrate the performance of SCM with numerical examples of nonlinear basins exhibiting physically and computationally challenging conditions. The numerical experiments are benchmarked against results from overkill solutions and against MIDAS GTS NX, a finite element software package for geotechnical applications. Our findings show good agreement between the two sets of results. Traditional spectral element implementations allow points per wavelength as low as PPW = 4.5 for high-order polynomials. Our findings show that in the presence of nonlinearity, high-order polynomials (p ≥ 3) require mesh resolutions of PPW ≥ 10 to keep displacement errors below 10%.
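For concreteness, evaluating a Drucker-Prager yield criterion for a given stress state takes only a few lines. The sketch below uses the common matching of the cone to the Mohr-Coulomb compressive meridian and illustrative soil-like parameters; it says nothing about the SCM discretization or return-mapping details.

```python
import numpy as np

def drucker_prager(stress, cohesion, phi_deg):
    """Evaluate the Drucker-Prager yield function f = sqrt(J2) + alpha*I1 - k
    for a 3x3 stress tensor; f > 0 means the stress state lies outside the
    elastic domain. alpha and k match the Mohr-Coulomb compressive meridian."""
    phi = np.radians(phi_deg)
    alpha = 2 * np.sin(phi) / (np.sqrt(3) * (3 - np.sin(phi)))
    k = 6 * cohesion * np.cos(phi) / (np.sqrt(3) * (3 - np.sin(phi)))
    I1 = np.trace(stress)
    s = stress - I1 / 3 * np.eye(3)          # deviatoric stress
    J2 = 0.5 * np.sum(s * s)                 # second deviatoric invariant
    return np.sqrt(J2) + alpha * I1 - k

# Illustrative soil-like parameters: cohesion 50 kPa, friction angle 30 deg,
# compression-negative sign convention for the principal stresses [Pa].
sigma = np.diag([-200e3, -150e3, -100e3])
f = drucker_prager(sigma, cohesion=50e3, phi_deg=30)
print("yield function f =", round(f, 1), "->", "plastic" if f > 0 else "elastic")
```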