Facilities | Integrated Energy Solutions | NREL
strategies needed to optimize our entire energy system. A photo of the high-performance computer at NREL . High-Performance Computing Data Center High-performance computing facilities at NREL provide high-speed
High-Performance Computing Data Center Warm-Water Liquid Cooling |
Computational Science | NREL Warm-Water Liquid Cooling High-Performance Computing Data Center Warm-Water Liquid Cooling NREL's High-Performance Computing Data Center (HPC Data Center) is liquid water Liquid cooling technologies offer a more energy-efficient solution that also allows for effective
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
Future computing platforms for science in a power constrained era
Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...
2015-12-23
Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potentialmore » for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).« less
Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo
2018-06-08
Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, Bryan Scott; Gough, Sean T.
This report documents a validation of the MCNP6 Version 1.0 computer code on the high performance computing platform Moonlight, for operations at Los Alamos National Laboratory (LANL) that involve plutonium metals, oxides, and solutions. The validation is conducted using the ENDF/B-VII.1 continuous energy group cross section library at room temperature. The results are for use by nuclear criticality safety personnel in performing analysis and evaluation of various facility activities involving plutonium materials.
Computer Facilitated Mathematical Methods in Chemical Engineering--Similarity Solution
ERIC Educational Resources Information Center
Subramanian, Venkat R.
2006-01-01
High-performance computers coupled with highly efficient numerical schemes and user-friendly software packages have helped instructors to teach numerical solutions and analysis of various nonlinear models more efficiently in the classroom. One of the main objectives of a model is to provide insight about the system of interest. Analytical…
NASA Center for Climate Simulation (NCCS) Advanced Technology AT5 Virtualized Infiniband Report
NASA Technical Reports Server (NTRS)
Thompson, John H.; Bledsoe, Benjamin C.; Wagner, Mark; Shakshober, John; Fromkin, Russ
2013-01-01
The NCCS is part of the Computational and Information Sciences and Technology Office (CISTO) of Goddard Space Flight Center's (GSFC) Sciences and Exploration Directorate. The NCCS's mission is to enable scientists to increase their understanding of the Earth, the solar system, and the universe by supplying state-of-the-art high performance computing (HPC) solutions. To accomplish this mission, the NCCS (https://www.nccs.nasa.gov) provides high performance compute engines, mass storage, and network solutions to meet the specialized needs of the Earth and space science user communities
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Computational Investigation of the Performance and Back-Pressure Limits of a Hypersonic Inlet
NASA Technical Reports Server (NTRS)
Smart, Michael K.; White, Jeffery A.
2002-01-01
A computational analysis of Mach 6.2 operation of a hypersonic inlet with rectangular-to-elliptical shape transition has been performed. The results of the computations are compared with experimental data for cases with and without a manually imposed back-pressure. While the no-back-pressure numerical solutions match the general trends of the data, certain features observed in the experiments did not appear in the computational solutions. The reasons for these discrepancies are discussed and possible remedies are suggested. Most importantly, however, the computational analysis increased the understanding of the consequences of certain aspects of the inlet design. This will enable the performance of future inlets of this class to be improved. Computational solutions with back-pressure under-estimated the back-pressure limit observed in the experiments, but did supply significant insight into the character of highly back-pressured inlet flows.
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
Computer-Aided Design of Drugs on Emerging Hybrid High Performance Computers
2013-09-01
solutions to virtualization include lightweight, user-level implementations on Linux operating systems , but these solutions are often dependent on a...virtualization include lightweight, user-level implementations on Linux operating systems , but these solutions are often dependent on a specific version of...Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302
Zhu, Liping; Aono, Masashi; Kim, Song-Ju; Hara, Masahiko
2013-04-01
A single-celled, multi-nucleated amoeboid organism, a plasmodium of the true slime mold Physarum polycephalum, can perform sophisticated computing by exhibiting complex spatiotemporal oscillatory dynamics while deforming its amorphous body. We previously devised an "amoeba-based computer (ABC)" to quantitatively evaluate the optimization capability of the amoeboid organism in searching for a solution to the traveling salesman problem (TSP) under optical feedback control. In ABC, the organism changes its shape to find a high quality solution (a relatively shorter TSP route) by alternately expanding and contracting its pseudopod-like branches that exhibit local photoavoidance behavior. The quality of the solution serves as a measure of the optimality of which the organism maximizes its global body area (nutrient absorption) while minimizing the risk of being illuminated (exposure to aversive stimuli). ABC found a high quality solution for the 8-city TSP with a high probability. However, it remains unclear whether intracellular communication among the branches of the organism is essential for computing. In this study, we conducted a series of control experiments using two individual cells (two single-celled organisms) to perform parallel searches in the absence of intercellular communication. We found that ABC drastically lost its ability to find a solution when it used two independent individuals. However, interestingly, when two individuals were prepared by dividing one individual, they found a solution for a few tens of minutes. That is, the two divided individuals remained correlated even though they were spatially separated. These results suggest the presence of a long-term memory in the intrinsic dynamics of this organism and its significance in performing sophisticated computing. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
High throughput computing: a solution for scientific analysis
O'Donnell, M.
2011-01-01
handle job failures due to hardware, software, or network interruptions (obviating the need to manually resubmit the job after each stoppage); be affordable; and most importantly, allow us to complete very large, complex analyses that otherwise would not even be possible. In short, we envisioned a job-management system that would take advantage of unused FORT CPUs within a local area network (LAN) to effectively distribute and run highly complex analytical processes. What we found was a solution that uses High Throughput Computing (HTC) and High Performance Computing (HPC) systems to do exactly that (Figure 1).
Webinar: Delivering Transformational HPC Solutions to Industry
Streitz, Frederick
2018-01-16
Dr. Frederick Streitz, director of the High Performance Computing Innovation Center, discusses Lawrence Livermore National Laboratory computational capabilities and expertise available to industry in this webinar.
Engineering and Computing Portal to Solve Environmental Problems
NASA Astrophysics Data System (ADS)
Gudov, A. M.; Zavozkin, S. Y.; Sotnikov, I. Y.
2018-01-01
This paper describes architecture and services of the Engineering and Computing Portal, which is considered to be a complex solution that provides access to high-performance computing resources, enables to carry out computational experiments, teach parallel technologies and solve computing tasks, including technogenic safety ones.
NASA Astrophysics Data System (ADS)
Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek
2009-09-01
High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming common place and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual CPU-quad core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort is presented.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
A single-stage flux-corrected transport algorithm for high-order finite-volume methods
Chaplin, Christopher; Colella, Phillip
2017-05-08
We present a new limiter method for solving the advection equation using a high-order, finite-volume discretization. The limiter is based on the flux-corrected transport algorithm. Here, we modify the classical algorithm by introducing a new computation for solution bounds at smooth extrema, as well as improving the preconstraint on the high-order fluxes. We compute the high-order fluxes via a method-of-lines approach with fourth-order Runge-Kutta as the time integrator. For computing low-order fluxes, we select the corner-transport upwind method due to its improved stability over donor-cell upwind. Several spatial differencing schemes are investigated for the high-order flux computation, including centered- differencemore » and upwind schemes. We show that the upwind schemes perform well on account of the dissipation of high-wavenumber components. The new limiter method retains high-order accuracy for smooth solutions and accurately captures fronts in discontinuous solutions. Further, we need only apply the limiter once per complete time step.« less
Rapid solution of large-scale systems of equations
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
NASA Astrophysics Data System (ADS)
Xue, Xinwei; Cheryauka, Arvi; Tubbs, David
2006-03-01
CT imaging in interventional and minimally-invasive surgery requires high-performance computing solutions that meet operational room demands, healthcare business requirements, and the constraints of a mobile C-arm system. The computational requirements of clinical procedures using CT-like data are increasing rapidly, mainly due to the need for rapid access to medical imagery during critical surgical procedures. The highly parallel nature of Radon transform and CT algorithms enables embedded computing solutions utilizing a parallel processing architecture to realize a significant gain of computational intensity with comparable hardware and program coding/testing expenses. In this paper, using a sample 2D and 3D CT problem, we explore the programming challenges and the potential benefits of embedded computing using commodity hardware components. The accuracy and performance results obtained on three computational platforms: a single CPU, a single GPU, and a solution based on FPGA technology have been analyzed. We have shown that hardware-accelerated CT image reconstruction can be achieved with similar levels of noise and clarity of feature when compared to program execution on a CPU, but gaining a performance increase at one or more orders of magnitude faster. 3D cone-beam or helical CT reconstruction and a variety of volumetric image processing applications will benefit from similar accelerations.
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...
2015-05-22
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluatemore » the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).« less
SISYPHUS: A high performance seismic inversion factory
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas
2016-04-01
In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
Design and Analysis of Self-Adapted Task Scheduling Strategies in Wireless Sensor Networks
Guo, Wenzhong; Xiong, Naixue; Chao, Han-Chieh; Hussain, Sajid; Chen, Guolong
2011-01-01
In a wireless sensor network (WSN), the usage of resources is usually highly related to the execution of tasks which consume a certain amount of computing and communication bandwidth. Parallel processing among sensors is a promising solution to provide the demanded computation capacity in WSNs. Task allocation and scheduling is a typical problem in the area of high performance computing. Although task allocation and scheduling in wired processor networks has been well studied in the past, their counterparts for WSNs remain largely unexplored. Existing traditional high performance computing solutions cannot be directly implemented in WSNs due to the limitations of WSNs such as limited resource availability and the shared communication medium. In this paper, a self-adapted task scheduling strategy for WSNs is presented. First, a multi-agent-based architecture for WSNs is proposed and a mathematical model of dynamic alliance is constructed for the task allocation problem. Then an effective discrete particle swarm optimization (PSO) algorithm for the dynamic alliance (DPSO-DA) with a well-designed particle position code and fitness function is proposed. A mutation operator which can effectively improve the algorithm’s ability of global search and population diversity is also introduced in this algorithm. Finally, the simulation results show that the proposed solution can achieve significant better performance than other algorithms. PMID:22163971
Linear static structural and vibration analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.
1993-01-01
Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
NASA Astrophysics Data System (ADS)
Hussein, I.; Wilkins, M.; Roscoe, C.; Faber, W.; Chakravorty, S.; Schumacher, P.
2016-09-01
Finite Set Statistics (FISST) is a rigorous Bayesian multi-hypothesis management tool for the joint detection, classification and tracking of multi-sensor, multi-object systems. Implicit within the approach are solutions to the data association and target label-tracking problems. The full FISST filtering equations, however, are intractable. While FISST-based methods such as the PHD and CPHD filters are tractable, they require heavy moment approximations to the full FISST equations that result in a significant loss of information contained in the collected data. In this paper, we review Smart Sampling Markov Chain Monte Carlo (SSMCMC) that enables FISST to be tractable while avoiding moment approximations. We study the effect of tuning key SSMCMC parameters on tracking quality and computation time. The study is performed on a representative space object catalog with varying numbers of RSOs. The solution is implemented in the Scala computing language at the Maui High Performance Computing Center (MHPCC) facility.
Implementing direct, spatially isolated problems on transputer networks
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling
NASA Astrophysics Data System (ADS)
Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.
2017-06-01
Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
Towards New Metrics for High-Performance Computing Resilience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Ashraf, Rizwan A; Engelmann, Christian
Ensuring the reliability of applications is becoming an increasingly important challenge as high-performance computing (HPC) systems experience an ever-growing number of faults, errors and failures. While the HPC community has made substantial progress in developing various resilience solutions, it continues to rely on platform-based metrics to quantify application resiliency improvements. The resilience of an HPC application is concerned with the reliability of the application outcome as well as the fault handling efficiency. To understand the scope of impact, effective coverage and performance efficiency of existing and emerging resilience solutions, there is a need for new metrics. In this paper, wemore » develop new ways to quantify resilience that consider both the reliability and the performance characteristics of the solutions from the perspective of HPC applications. As HPC systems continue to evolve in terms of scale and complexity, it is expected that applications will experience various types of faults, errors and failures, which will require applications to apply multiple resilience solutions across the system stack. The proposed metrics are intended to be useful for understanding the combined impact of these solutions on an application's ability to produce correct results and to evaluate their overall impact on an application's performance in the presence of various modes of faults.« less
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Petascale supercomputing to accelerate the design of high-temperature alloys
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; ...
2017-10-25
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ'-Al 2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviourmore » of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.« less
Petascale supercomputing to accelerate the design of high-temperature alloys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ'-Al 2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviourmore » of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.« less
Petascale supercomputing to accelerate the design of high-temperature alloys
NASA Astrophysics Data System (ADS)
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; Haynes, J. Allen
2017-12-01
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ‧-Al2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviour of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. The approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
Task Assignment Heuristics for Distributed CFD Applications
NASA Technical Reports Server (NTRS)
Lopez-Benitez, N.; Djomehri, M. J.; Biswas, R.; Biegel, Bryan (Technical Monitor)
2001-01-01
CFD applications require high-performance computational platforms: 1. Complex physics and domain configuration demand strongly coupled solutions; 2. Applications are CPU and memory intensive; and 3. Huge resource requirements can only be satisfied by teraflop-scale machines or distributed computing.
CORDIC-based digital signal processing (DSP) element for adaptive signal processing
NASA Astrophysics Data System (ADS)
Bolstad, Gregory D.; Neeld, Kenneth B.
1995-04-01
The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC based application specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
An Analysis of Performance Enhancement Techniques for Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)
2002-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Generic Hypersonic Inlet Module Analysis
NASA Technical Reports Server (NTRS)
Cockrell, Chares E., Jr.; Huebner, Lawrence D.
2004-01-01
A computational study associated with an internal inlet drag analysis was performed for a generic hypersonic inlet module. The purpose of this study was to determine the feasibility of computing the internal drag force for a generic scramjet engine module using computational methods. The computational study consisted of obtaining two-dimensional (2D) and three-dimensional (3D) computational fluid dynamics (CFD) solutions using the Euler and parabolized Navier-Stokes (PNS) equations. The solution accuracy was assessed by comparisons with experimental pitot pressure data. The CFD analysis indicates that the 3D PNS solutions show the best agreement with experimental pitot pressure data. The internal inlet drag analysis consisted of obtaining drag force predictions based on experimental data and 3D CFD solutions. A comparative assessment of each of the drag prediction methods is made and the sensitivity of CFD drag values to computational procedures is documented. The analysis indicates that the CFD drag predictions are highly sensitive to the computational procedure used.
Quantum Accelerators for High-performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.
We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, themore » prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.« less
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
HPC on Competitive Cloud Resources
NASA Astrophysics Data System (ADS)
Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff
Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on-demand in real-time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in realtime (wall-time). We also show that the smallest, cheapest (64-bit) instance on the studied environment is the best for price to performance ration. In light of the high-contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average, expected performance and execution time, expected cost to completion, and variance measures--traditionally ignored in the high-performance computing context--now should complement or even substitute the standard definitions of efficiency.
Bayer Digester Optimization Studies using Computer Techniques
NASA Astrophysics Data System (ADS)
Kotte, Jan J.; Schleider, Victor H.
Theoretically required heat transfer performance by the multistaged flash heat reclaim system of a high pressure Bayer digester unit is determined for various conditions of discharge temperature, excess flash vapor and indirect steam addition. Solution of simultaneous heat balances around the digester vessels and the heat reclaim system yields the magnitude of available heat for representation of each case on a temperature-enthalpy diagram, where graphical fit of the number of flash stages fixes the heater requirements. Both the heat balances and the trial-and-error graphical solution are adapted to solution by digital computer techniques.
Eisenbach, Markus
2017-01-01
A major impediment to deploying next-generation high-performance computational systems is the required electrical power, often measured in units of megawatts. The solution to this problem is driving the introduction of novel machine architectures, such as those employing many-core processors and specialized accelerators. In this article, we describe the use of a hybrid accelerated architecture to achieve both reduced time to solution and the associated reduction in the electrical cost for a state-of-the-art materials science computation.
An evaluation of superminicomputers for thermal analysis
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Vidal, J. B.; Jones, G. K.
1962-01-01
The feasibility and cost effectiveness of solving thermal analysis problems on superminicomputers is demonstrated. Conventional thermal analysis and the changing computer environment, computer hardware and software used, six thermal analysis test problems, performance of superminicomputers (CPU time, accuracy, turnaround, and cost) and comparison with large computers are considered. Although the CPU times for superminicomputers were 15 to 30 times greater than the fastest mainframe computer, the minimum cost to obtain the solutions on superminicomputers was from 11 percent to 59 percent of the cost of mainframe solutions. The turnaround (elapsed) time is highly dependent on the computer load, but for large problems, superminicomputers produced results in less elapsed time than a typically loaded mainframe computer.
National research and education network
NASA Technical Reports Server (NTRS)
Villasenor, Tony
1991-01-01
Some goals of this network are as follows: Extend U.S. technological leadership in high performance computing and computer communications; Provide wide dissemination and application of the technologies both to the speed and the pace of innovation and to serve the national economy, national security, education, and the global environment; and Spur gains in the U.S. productivity and industrial competitiveness by making high performance computing and networking technologies an integral part of the design and production process. Strategies for achieving these goals are as follows: Support solutions to important scientific and technical challenges through a vigorous R and D effort; Reduce the uncertainties to industry for R and D and use of this technology through increased cooperation between government, industry, and universities and by the continued use of government and government funded facilities as a prototype user for early commercial HPCC products; and Support underlying research, network, and computational infrastructures on which U.S. high performance computing technology is based.
Overview of the NASA Glenn Flux Reconstruction Based High-Order Unstructured Grid Code
NASA Technical Reports Server (NTRS)
Spiegel, Seth C.; DeBonis, James R.; Huynh, H. T.
2016-01-01
A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large- eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h- refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree- of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Yier
As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified data, are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect its computing infrastructure. The research outcomes from thismore » project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.« less
Using SRAM Based FPGAs for Power-Aware High Performance Wireless Sensor Networks
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today’s applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements. PMID:22736971
Using SRAM based FPGAs for power-aware high performance wireless sensor networks.
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today's applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements.
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in the recent years combine conventional multi-core CPU with GPU accelerators and provide an opportunity for manifold increase and computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide the interested external researchers the regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
Algorithm for fast event parameters estimation on GEM acquired data
NASA Astrophysics Data System (ADS)
Linczuk, Paweł; Krawczyk, Rafał D.; Poźniak, Krzysztof T.; Kasprowicz, Grzegorz; Wojeński, Andrzej; Chernyshova, Maryna; Czarski, Tomasz
2016-09-01
We present study of a software-hardware environment for developing fast computation with high throughput and low latency methods, which can be used as back-end in High Energy Physics (HEP) and other High Performance Computing (HPC) systems, based on high amount of input from electronic sensor based front-end. There is a parallelization possibilities discussion and testing on Intel HPC solutions with consideration of applications with Gas Electron Multiplier (GEM) measurement systems presented in this paper.
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations start to adopt cloud computing for better utilizing computing resources by taking advantage of its scalability, cost reduction, and easy to access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting Geosciences. This paper provides a comprehensive study of three open-source cloud solutions, including OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) Cloudstack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable for supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
High performance GPU processing for inversion using uniform grid searches
NASA Astrophysics Data System (ADS)
Venetis, Ioannis E.; Saltogianni, Vasso; Stiros, Stathis; Gallopoulos, Efstratios
2017-04-01
Many geophysical problems are described by systems of redundant, highly non-linear systems of ordinary equations with constant terms deriving from measurements and hence representing stochastic variables. Solution (inversion) of such problems is based on numerical, optimization methods, based on Monte Carlo sampling or on exhaustive searches in cases of two or even three "free" unknown variables. Recently the TOPological INVersion (TOPINV) algorithm, a grid search-based technique in the Rn space, has been proposed. TOPINV is not based on the minimization of a certain cost function and involves only forward computations, hence avoiding computational errors. The basic concept is to transform observation equations into inequalities on the basis of an optimization parameter k and of their standard errors, and through repeated "scans" of n-dimensional search grids for decreasing values of k to identify the optimal clusters of gridpoints which satisfy observation inequalities and by definition contain the "true" solution. Stochastic optimal solutions and their variance-covariance matrices are then computed as first and second statistical moments. Such exhaustive uniform searches produce an excessive computational load and are extremely time consuming for common computers based on a CPU. An alternative is to use a computing platform based on a GPU, which nowadays is affordable to the research community, which provides a much higher computing performance. Using the CUDA programming language to implement TOPINV allows the investigation of the attained speedup in execution time on such a high performance platform. Based on synthetic data we compared the execution time required for two typical geophysical problems, modeling magma sources and seismic faults, described with up to 18 unknown variables, on both CPU/FORTRAN and GPU/CUDA platforms. The same problems for several different sizes of search grids (up to 1012 gridpoints) and numbers of unknown variables were solved on both platforms, and execution time as a function of the grid dimension for each problem was recorded. Results indicate an average speedup in calculations by a factor of 100 on the GPU platform; for example problems with 1012 grid-points require less than two hours instead of several days on conventional desktop computers. Such a speedup encourages the application of TOPINV on high performance platforms, as a GPU, in cases where nearly real time decisions are necessary, for example finite fault modeling to identify possible tsunami sources.
NASA Astrophysics Data System (ADS)
Lucas-Serrano, A.; Font, J. A.; Ibáñez, J. M.; Martí, J. M.
2004-12-01
We assess the suitability of a recent high-resolution central scheme developed by \\cite{kurganov} for the solution of the relativistic hydrodynamic equations. The novelty of this approach relies on the absence of Riemann solvers in the solution procedure. The computations we present are performed in one and two spatial dimensions in Minkowski spacetime. Standard numerical experiments such as shock tubes and the relativistic flat-faced step test are performed. As an astrophysical application the article includes two-dimensional simulations of the propagation of relativistic jets using both Cartesian and cylindrical coordinates. The simulations reported clearly show the capabilities of the numerical scheme of yielding satisfactory results, with an accuracy comparable to that obtained by the so-called high-resolution shock-capturing schemes based upon Riemann solvers (Godunov-type schemes), even well inside the ultrarelativistic regime. Such a central scheme can be straightforwardly applied to hyperbolic systems of conservation laws for which the characteristic structure is not explicitly known, or in cases where a numerical computation of the exact solution of the Riemann problem is prohibitively expensive. Finally, we present comparisons with results obtained using various Godunov-type schemes as well as with those obtained using other high-resolution central schemes which have recently been reported in the literature.
PETSc Users Manual Revision 3.7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, Satish; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
PETSc Users Manual Revision 3.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
NASA Technical Reports Server (NTRS)
Prisbell, Andrew; Marichalar, J.; Lumpkin, F.; LeBeau, G.
2010-01-01
Plume impingement effects on the Orion Crew Service Module (CSM) were analyzed for various dual Reaction Control System (RCS) engine firings and various configurations of the solar arrays. The study was performed using a decoupled computational fluid dynamics (CFD) and Direct Simulation Monte Carlo (DSMC) approach. This approach included a single jet plume solution for the R1E RCS engine computed with the General Aerodynamic Simulation Program (GASP) CFD code. The CFD solution was used to create an inflow surface for the DSMC solution based on the Bird continuum breakdown parameter. The DSMC solution was then used to model the dual RCS plume impingement effects on the entire CSM geometry with deployed solar arrays. However, because the continuum breakdown parameter of 0.5 could not be achieved due to geometrical constraints and because high resolution in the plume shock interaction region is desired, a focused DSMC simulation modeling only the plumes and the shock interaction region was performed. This high resolution intermediate solution was then used as the inflow to the larger DSMC solution to obtain plume impingement heating, forces, and moments on the CSM and the solar arrays for a total of 21 cases that were analyzed. The results of these simulations were used to populate the Orion CSM Aerothermal Database.
Computation in Dynamically Bounded Asymmetric Systems
Rutishauser, Ueli; Slotine, Jean-Jacques; Douglas, Rodney
2015-01-01
Previous explanations of computations performed by recurrent networks have focused on symmetrically connected saturating neurons and their convergence toward attractors. Here we analyze the behavior of asymmetrical connected networks of linear threshold neurons, whose positive response is unbounded. We show that, for a wide range of parameters, this asymmetry brings interesting and computationally useful dynamical properties. When driven by input, the network explores potential solutions through highly unstable ‘expansion’ dynamics. This expansion is steered and constrained by negative divergence of the dynamics, which ensures that the dimensionality of the solution space continues to reduce until an acceptable solution manifold is reached. Then the system contracts stably on this manifold towards its final solution trajectory. The unstable positive feedback and cross inhibition that underlie expansion and divergence are common motifs in molecular and neuronal networks. Therefore we propose that very simple organizational constraints that combine these motifs can lead to spontaneous computation and so to the spontaneous modification of entropy that is characteristic of living systems. PMID:25617645
NASA Technical Reports Server (NTRS)
Marshall, F. J.; Deffenbaugh, F. D.
1974-01-01
A method is developed to determine the flow field of a body of revolution in separated flow. The computer was used to integrate various solutions and solution properties of the sub-flow fields which made up the entire flow field without resorting to a finite difference solution to the complete Navier-Stokes equations. The technique entails the use of the unsteady cross flow analogy and a new solution to the two-dimensional unsteady separated flow problem based upon an unsteady, discrete-vorticity wake. Data for the forces and moments on aerodynamic bodies at low speeds and high angle of attack (outside the range of linear inviscid theories) such that the flow is substantially separated are produced which compare well with experimental data. In addition, three dimensional steady separated regions and wake vortex patterns are determined. The computer program developed to perform the numerical calculations is described.
NASA Technical Reports Server (NTRS)
Ghaffari, Farhad
1999-01-01
Unstructured grid Euler computations, performed at supersonic cruise speed, are presented for a High Speed Civil Transport (HSCT) configuration, designated as the Technology Concept Airplane (TCA) within the High Speed Research (HSR) Program. The numerical results are obtained for the complete TCA cruise configuration which includes the wing, fuselage, empennage, diverters, and flow through nacelles at M (sub infinity) = 2.4 for a range of angles-of-attack and sideslip. Although all the present computations are performed for the complete TCA configuration, appropriate assumptions derived from the fundamental supersonic aerodynamic principles have been made to extract aerodynamic predictions to complement the experimental data obtained from a 1.675%-scaled truncated (aft fuselage/empennage components removed) TCA model. The validity of the computational results, derived from the latter assumptions, are thoroughly addressed and discussed in detail. The computed surface and off-surface flow characteristics are analyzed and the pressure coefficient contours on the wing lower surface are shown to correlate reasonably well with the available pressure sensitive paint results, particularly, for the complex flow structures around the nacelles. The predicted longitudinal and lateral/directional performance characteristics for the truncated TCA configuration are shown to correlate very well with the corresponding wind-tunnel data across the examined range of angles-of-attack and sideslip. The complementary computational results for the longitudinal and lateral/directional performance characteristics for the complete TCA configuration are also presented along with the aerodynamic effects due to empennage components. Results are also presented to assess the computational method performance, solution sensitivity to grid refinement, and solution convergence characteristics.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qishi; Zhu, Mengxia; Rao, Nageswara S
We propose an intelligent decision support system based on sensor and computer networks that incorporates various component techniques for sensor deployment, data routing, distributed computing, and information fusion. The integrated system is deployed in a distributed environment composed of both wireless sensor networks for data collection and wired computer networks for data processing in support of homeland security defense. We present the system framework and formulate the analytical problems and develop approximate or exact solutions for the subtasks: (i) sensor deployment strategy based on a two-dimensional genetic algorithm to achieve maximum coverage with cost constraints; (ii) data routing scheme tomore » achieve maximum signal strength with minimum path loss, high energy efficiency, and effective fault tolerance; (iii) network mapping method to assign computing modules to network nodes for high-performance distributed data processing; and (iv) binary decision fusion rule that derive threshold bounds to improve system hit rate and false alarm rate. These component solutions are implemented and evaluated through either experiments or simulations in various application scenarios. The extensive results demonstrate that these component solutions imbue the integrated system with the desirable and useful quality of intelligence in decision making.« less
NASA Technical Reports Server (NTRS)
DeBonis, James R.
2013-01-01
A computational fluid dynamics code that solves the compressible Navier-Stokes equations was applied to the Taylor-Green vortex problem to examine the code s ability to accurately simulate the vortex decay and subsequent turbulence. The code, WRLES (Wave Resolving Large-Eddy Simulation), uses explicit central-differencing to compute the spatial derivatives and explicit Low Dispersion Runge-Kutta methods for the temporal discretization. The flow was first studied and characterized using Bogey & Bailley s 13-point dispersion relation preserving (DRP) scheme. The kinetic energy dissipation rate, computed both directly and from the enstrophy field, vorticity contours, and the energy spectra are examined. Results are in excellent agreement with a reference solution obtained using a spectral method and provide insight into computations of turbulent flows. In addition the following studies were performed: a comparison of 4th-, 8th-, 12th- and DRP spatial differencing schemes, the effect of the solution filtering on the results, the effect of large-eddy simulation sub-grid scale models, and the effect of high-order discretization of the viscous terms.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
hp-Adaptive time integration based on the BDF for viscous flows
NASA Astrophysics Data System (ADS)
Hay, A.; Etienne, S.; Pelletier, D.; Garon, A.
2015-06-01
This paper presents a procedure based on the Backward Differentiation Formulas of order 1 to 5 to obtain efficient time integration of the incompressible Navier-Stokes equations. The adaptive algorithm performs both stepsize and order selections to control respectively the solution accuracy and the computational efficiency of the time integration process. The stepsize selection (h-adaptivity) is based on a local error estimate and an error controller to guarantee that the numerical solution accuracy is within a user prescribed tolerance. The order selection (p-adaptivity) relies on the idea that low-accuracy solutions can be computed efficiently by low order time integrators while accurate solutions require high order time integrators to keep computational time low. The selection is based on a stability test that detects growing numerical noise and deems a method of order p stable if there is no method of lower order that delivers the same solution accuracy for a larger stepsize. Hence, it guarantees both that (1) the used method of integration operates inside of its stability region and (2) the time integration procedure is computationally efficient. The proposed time integration procedure also features a time-step rejection and quarantine mechanisms, a modified Newton method with a predictor and dense output techniques to compute solution at off-step points.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Huang, Zhenyu; Chavarría-Miranda, Daniel
Contingency analysis is a key function in the Energy Management System (EMS) to assess the impact of various combinations of power system component failures based on state estimation. Contingency analysis is also extensively used in power market operation for feasibility test of market solutions. High performance computing holds the promise of faster analysis of more contingency cases for the purpose of safe and reliable operation of today’s power grids with less operating margin and more intermittent renewable energy sources. This paper evaluates the performance of counter-based dynamic load balancing schemes for massive contingency analysis under different computing environments. Insights frommore » the performance evaluation can be used as guidance for users to select suitable schemes in the application of massive contingency analysis. Case studies, as well as MATLAB simulations, of massive contingency cases using the Western Electricity Coordinating Council power grid model are presented to illustrate the application of high performance computing with counter-based dynamic load balancing schemes.« less
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells
2007-06-01
Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing
Heterogeneous Distributed Computing for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy S.
1998-01-01
The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to and investigate issues in, and develop solutions to, efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devising novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Closed-form solutions of performability. [in computer systems
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1982-01-01
It is noted that if computing system performance is degradable then system evaluation must deal simultaneously with aspects of both performance and reliability. One approach is the evaluation of a system's performability which, relative to a specified performance variable Y, generally requires solution of the probability distribution function of Y. The feasibility of closed-form solutions of performability when Y is continuous are examined. In particular, the modeling of a degradable buffer/multiprocessor system is considered whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. Employing an approximate decomposition of the model, it is shown that a closed-form solution can indeed be obtained.
Metal oxide resistive random access memory based synaptic devices for brain-inspired computing
NASA Astrophysics Data System (ADS)
Gao, Bin; Kang, Jinfeng; Zhou, Zheng; Chen, Zhe; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan
2016-04-01
The traditional Boolean computing paradigm based on the von Neumann architecture is facing great challenges for future information technology applications such as big data, the Internet of Things (IoT), and wearable devices, due to the limited processing capability issues such as binary data storage and computing, non-parallel data processing, and the buses requirement between memory units and logic units. The brain-inspired neuromorphic computing paradigm is believed to be one of the promising solutions for realizing more complex functions with a lower cost. To perform such brain-inspired computing with a low cost and low power consumption, novel devices for use as electronic synapses are needed. Metal oxide resistive random access memory (ReRAM) devices have emerged as the leading candidate for electronic synapses. This paper comprehensively addresses the recent work on the design and optimization of metal oxide ReRAM-based synaptic devices. A performance enhancement methodology and optimized operation scheme to achieve analog resistive switching and low-energy training behavior are provided. A three-dimensional vertical synapse network architecture is proposed for high-density integration and low-cost fabrication. The impacts of the ReRAM synaptic device features on the performances of neuromorphic systems are also discussed on the basis of a constructed neuromorphic visual system with a pattern recognition function. Possible solutions to achieve the high recognition accuracy and efficiency of neuromorphic systems are presented.
A Subspace Pursuit–based Iterative Greedy Hierarchical Solution to the Neuromagnetic Inverse Problem
Babadi, Behtash; Obregon-Henao, Gabriel; Lamus, Camilo; Hämäläinen, Matti S.; Brown, Emery N.; Purdon, Patrick L.
2013-01-01
Magnetoencephalography (MEG) is an important non-invasive method for studying activity within the human brain. Source localization methods can be used to estimate spatiotemporal activity from MEG measurements with high temporal resolution, but the spatial resolution of these estimates is poor due to the ill-posed nature of the MEG inverse problem. Recent developments in source localization methodology have emphasized temporal as well as spatial constraints to improve source localization accuracy, but these methods can be computationally intense. Solutions emphasizing spatial sparsity hold tremendous promise, since the underlying neurophysiological processes generating MEG signals are often sparse in nature, whether in the form of focal sources, or distributed sources representing large-scale functional networks. Recent developments in the theory of compressed sensing (CS) provide a rigorous framework to estimate signals with sparse structure. In particular, a class of CS algorithms referred to as greedy pursuit algorithms can provide both high recovery accuracy and low computational complexity. Greedy pursuit algorithms are difficult to apply directly to the MEG inverse problem because of the high-dimensional structure of the MEG source space and the high spatial correlation in MEG measurements. In this paper, we develop a novel greedy pursuit algorithm for sparse MEG source localization that overcomes these fundamental problems. This algorithm, which we refer to as the Subspace Pursuit-based Iterative Greedy Hierarchical (SPIGH) inverse solution, exhibits very low computational complexity while achieving very high localization accuracy. We evaluate the performance of the proposed algorithm using comprehensive simulations, as well as the analysis of human MEG data during spontaneous brain activity and somatosensory stimuli. These studies reveal substantial performance gains provided by the SPIGH algorithm in terms of computational complexity, localization accuracy, and robustness. PMID:24055554
ERIC Educational Resources Information Center
Olugbenga, Aiyedun Emmanuel
2016-01-01
Creative Arts is a core and compulsory subject in Nigerian upper basic classes, but the students' performance over the years indicated high failure. Instructional strategies play a pivotal role in improving students' performance. Computer-based instructions such as animated drawings could be a possible solution. This research adopted the design…
SCEAPI: A unified Restful Web API for High-Performance Computing
NASA Astrophysics Data System (ADS)
Rongqiang, Cao; Haili, Xiao; Shasha, Lu; Yining, Zhao; Xiaoning, Wang; Xuebin, Chi
2017-10-01
The development of scientific computing is increasingly moving to collaborative web and mobile applications. All these applications need high-quality programming interface for accessing heterogeneous computing resources consisting of clusters, grid computing or cloud computing. In this paper, we introduce our high-performance computing environment that integrates computing resources from 16 HPC centers across China. Then we present a bundle of web services called SCEAPI and describe how it can be used to access HPC resources with HTTP or HTTPs protocols. We discuss SCEAPI from several aspects including architecture, implementation and security, and address specific challenges in designing compatible interfaces and protecting sensitive data. We describe the functions of SCEAPI including authentication, file transfer and job management for creating, submitting and monitoring, and how to use SCEAPI in an easy-to-use way. Finally, we discuss how to exploit more HPC resources quickly for the ATLAS experiment by implementing the custom ARC compute element based on SCEAPI, and our work shows that SCEAPI is an easy-to-use and effective solution to extend opportunistic HPC resources.
Complex optimization for big computational and experimental neutron datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bao, Feng; Oak Ridge National Lab.; Archibald, Richard
Here, we present a framework to use high performance computing to determine accurate solutions to the inverse optimization problem of big experimental data against computational models. We demonstrate how image processing, mathematical regularization, and hierarchical modeling can be used to solve complex optimization problems on big data. We also demonstrate how both model and data information can be used to further increase solution accuracy of optimization by providing confidence regions for the processing and regularization algorithms. Finally, we use the framework in conjunction with the software package SIMPHONIES to analyze results from neutron scattering experiments on silicon single crystals, andmore » refine first principles calculations to better describe the experimental data.« less
Complex optimization for big computational and experimental neutron datasets
Bao, Feng; Oak Ridge National Lab.; Archibald, Richard; ...
2016-11-07
Here, we present a framework to use high performance computing to determine accurate solutions to the inverse optimization problem of big experimental data against computational models. We demonstrate how image processing, mathematical regularization, and hierarchical modeling can be used to solve complex optimization problems on big data. We also demonstrate how both model and data information can be used to further increase solution accuracy of optimization by providing confidence regions for the processing and regularization algorithms. Finally, we use the framework in conjunction with the software package SIMPHONIES to analyze results from neutron scattering experiments on silicon single crystals, andmore » refine first principles calculations to better describe the experimental data.« less
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
The overset grid methodology has significantly reduced time-to-solution of highfidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machinc. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi
As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs To gain insight and a more quantitative understanding of a workflow’s performance our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) andmore » its hybrid data store combining both of these data provenance perspectives.« less
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Radenski, Atanas
2003-01-01
The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer to Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing dual -service architecture for P2P high-performance divide-and conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide and conquer computations. The service engine is intended to provide free useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide and conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever changing pool of lower-end Internet nodes.
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M
2016-01-01
The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.
High performance computing environment for multidimensional image analysis
Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo
2007-01-01
Background The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. Results We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 Ghz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478× speedup. Conclusion Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets. PMID:17634099
High performance computing environment for multidimensional image analysis.
Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo
2007-07-10
The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 Ghz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478x speedup. Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets.
Automating slope monitoring in mines with terrestrial lidar scanners
NASA Astrophysics Data System (ADS)
Conforti, Dario
2014-05-01
Static terrestrial laser scanners (TLS) have been an important component of slope monitoring for some time, and many solutions for monitoring the progress of a slide have been devised over the years. However, all of these solutions have required users to operate the lidar equipment in the field, creating a high cost in time and resources, especially if the surveys must be performed very frequently. This paper presents a new solution for monitoring slides, developed using a TLS and an automated data acquisition, processing and analysis system. In this solution, a TLS is permanently mounted within sight of the target surface and connected to a control computer. The control software on the computer automatically triggers surveys according to a user-defined schedule, parses data into point clouds, and compares data against a baseline. The software can base the comparison against either the original survey of the site or the most recent survey, depending on whether the operator needs to measure the total or recent movement of the slide. If the displacement exceeds a user-defined safety threshold, the control computer transmits alerts via SMS text messaging and/or email, including graphs and tables describing the nature and size of the displacement. The solution can also be configured to trigger the external visual/audio alarm systems. If the survey areas contain high-traffic areas such as roads, the operator can mark them for exclusion in the comparison to prevent false alarms. To improve usability and safety, the control computer can connect to a local intranet and allow remote access through the software's web portal. This enables operators to perform most tasks with the TLS from their office, including reviewing displacement reports, downloading survey data, and adjusting the scan schedule. This solution has proved invaluable in automatically detecting and alerting users to potential danger within the monitored areas while lowering the cost and work required for monitoring. An explanation of the entire system and a post-acquisition data demonstration will be presented.
Avoiding Obstructions in Aiming a High-Gain Antenna
NASA Technical Reports Server (NTRS)
Edmonds, Karina
2006-01-01
The High Gain Antenna Pointing and Obstruction Avoidance software performs computations for pointing a Mars Rover high-gain antenna for communication with Earth while (1) avoiding line-of-sight obstructions (the Martian terrain and other parts of the Rover) that would block communication and (2) taking account of limits in ranges of motion of antenna gimbals and of kinematic singularities in gimbal mechanisms. The software uses simplified geometric models of obstructions and of the trajectory of the Earth in the Martian sky(see figure). It treats all obstructions according to a generalized approach, computing and continually updating the time remaining before interception of each obstruction. In cases in which the gimbal-mechanism design allows two aiming solutions, the algorithm chooses the solution that provides the longest obstruction-free Earth-tracking time. If the communication session continues until an obstruction is encountered in the current pointing solution and the other solution is now unobstructed, then the algorithm automatically switches to the other position. This software also notifies communication- managing software to cease transmission during the switch to the unobstructed position, resuming it when the switch is complete.
NASA Astrophysics Data System (ADS)
Shaat, Musbah; Bader, Faouzi
2010-12-01
Cognitive Radio (CR) systems have been proposed to increase the spectrum utilization by opportunistically access the unused spectrum. Multicarrier communication systems are promising candidates for CR systems. Due to its high spectral efficiency, filter bank multicarrier (FBMC) can be considered as an alternative to conventional orthogonal frequency division multiplexing (OFDM) for transmission over the CR networks. This paper addresses the problem of resource allocation in multicarrier-based CR networks. The objective is to maximize the downlink capacity of the network under both total power and interference introduced to the primary users (PUs) constraints. The optimal solution has high computational complexity which makes it unsuitable for practical applications and hence a low complexity suboptimal solution is proposed. The proposed algorithm utilizes the spectrum holes in PUs bands as well as active PU bands. The performance of the proposed algorithm is investigated for OFDM and FBMC based CR systems. Simulation results illustrate that the proposed resource allocation algorithm with low computational complexity achieves near optimal performance and proves the efficiency of using FBMC in CR context.
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Radenski, Atanas; Follen, Gregory J. (Technical Monitor)
2001-01-01
The rapid growth of internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of new, internet-oriented software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high -performance computing applications community. The general goal of this research project is to contribute to better understanding of the transition to internet-based high -performance computing and to develop solutions for some of the difficulties of this transition. More specifically, our goal is to design an architecture for generic divide and conquer internet-based computing, to develop a portable implementation of this architecture, to create an example library of high-performance divide-and-conquer computing agents that run on top of this architecture, and to evaluate the performance of these agents. We have been designing an architecture that incorporates a master task-pool server and utilizes satellite computational servers that operate on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. Our designed architecture is intended to be complementary to and accessible from computational grids such as Globus, Legion, and Condor. Grids provide remote access to existing high-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end internet nodes. Our project is focused on a generic divide-and-conquer paradigm and its applications that operate on a loose and ever changing pool of lower-end internet nodes.
Irregular Applications: Architectures & Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, John T.; Villa, Oreste; Tumeo, Antonino
Irregular applications are characterized by irregular data structures, control and communication patterns. Novel irregular high performance applications which deal with large data sets and require have recently appeared. Unfortunately, current high performance systems and software infrastructures executes irregular algorithms poorly. Only coordinated efforts by end user, area specialists and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.
Thermodynamics of Hydrophobic Amino Acids in Solution: A Combined Experimental–Computational Study
Song, Lingshuang; Yang, Lin; Meng, Jie; ...
2016-12-29
Here, we present a joint experimental-computational study to quantitatively describe the thermodynamics of hydrophobic leucine amino acids in aqueous solution. X-ray scattering data were acquired at a series of solute and salt concentrations to effectively measure inter-leucine interactions, indicating that a major scattering peak is observed consistently at q = 0.83 Å -1. Atomistic molecular dynamics simulations were then performed and compared with the scattering data, achieving high consistency at both small and wider scattering angles (q = 0$-$1.5 Å -1). This experimental-computational consistence enables a first glimpse of the leucineleucine interacting landscape, where two leucine molecules are aligned mostlymore » in a parallel fashion, as opposed to anti-parallel, but also allows us to derive effective leucine-leucine interactions in solution. Collectively, this combined approach of employing experimental scattering and molecular simulation enables a quantitative characterization on effective inter-molecular interactions of hydrophobic amino acids, critical for protein function and dynamics such as protein folding.« less
Thermodynamics of Hydrophobic Amino Acids in Solution: A Combined Experimental–Computational Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Lingshuang; Yang, Lin; Meng, Jie
Here, we present a joint experimental-computational study to quantitatively describe the thermodynamics of hydrophobic leucine amino acids in aqueous solution. X-ray scattering data were acquired at a series of solute and salt concentrations to effectively measure inter-leucine interactions, indicating that a major scattering peak is observed consistently at q = 0.83 Å -1. Atomistic molecular dynamics simulations were then performed and compared with the scattering data, achieving high consistency at both small and wider scattering angles (q = 0$-$1.5 Å -1). This experimental-computational consistence enables a first glimpse of the leucineleucine interacting landscape, where two leucine molecules are aligned mostlymore » in a parallel fashion, as opposed to anti-parallel, but also allows us to derive effective leucine-leucine interactions in solution. Collectively, this combined approach of employing experimental scattering and molecular simulation enables a quantitative characterization on effective inter-molecular interactions of hydrophobic amino acids, critical for protein function and dynamics such as protein folding.« less
Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu
2017-11-01
This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it still remains open questions whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address the questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S 3 ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S 3 ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with minimax concave penalty (MCP) and show that under the restricted eigenvalue condition, any S 3 ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general. Furthermore, to guarantee the S 3 ONC admits FPTAS.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.
1990-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.
1992-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number od operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations that the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Re-Computation of Numerical Results Contained in NACA Report No. 496
NASA Technical Reports Server (NTRS)
Perry, Boyd, III
2015-01-01
An extensive examination of NACA Report No. 496 (NACA 496), "General Theory of Aerodynamic Instability and the Mechanism of Flutter," by Theodore Theodorsen, is described. The examination included checking equations and solution methods and re-computing interim quantities and all numerical examples in NACA 496. The checks revealed that NACA 496 contains computational shortcuts (time- and effort-saving devices for engineers of the time) and clever artifices (employed in its solution methods), but, unfortunately, also contains numerous tripping points (aspects of NACA 496 that have the potential to cause confusion) and some errors. The re-computations were performed employing the methods and procedures described in NACA 496, but using modern computational tools. With some exceptions, the magnitudes and trends of the original results were in fair-to-very-good agreement with the re-computed results. The exceptions included what are speculated to be computational errors in the original in some instances and transcription errors in the original in others. Independent flutter calculations were performed and, in all cases, including those where the original and re-computed results differed significantly, were in excellent agreement with the re-computed results. Appendix A contains NACA 496; Appendix B contains a Matlab(Reistered) program that performs the re-computation of results; Appendix C presents three alternate solution methods, with examples, for the two-degree-of-freedom solution method of NACA 496; Appendix D contains the three-degree-of-freedom solution method (outlined in NACA 496 but never implemented), with examples.
Parallel, adaptive finite element methods for conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.
1994-01-01
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
Integrated Computational Solution for Predicting Skin Sensitization Potential of Molecules
Desai, Aarti; Singh, Vivek K.; Jere, Abhay
2016-01-01
Introduction Skin sensitization forms a major toxicological endpoint for dermatology and cosmetic products. Recent ban on animal testing for cosmetics demands for alternative methods. We developed an integrated computational solution (SkinSense) that offers a robust solution and addresses the limitations of existing computational tools i.e. high false positive rate and/or limited coverage. Results The key components of our solution include: QSAR models selected from a combinatorial set, similarity information and literature-derived sub-structure patterns of known skin protein reactive groups. Its prediction performance on a challenge set of molecules showed accuracy = 75.32%, CCR = 74.36%, sensitivity = 70.00% and specificity = 78.72%, which is better than several existing tools including VEGA (accuracy = 45.00% and CCR = 54.17% with ‘High’ reliability scoring), DEREK (accuracy = 72.73% and CCR = 71.44%) and TOPKAT (accuracy = 60.00% and CCR = 61.67%). Although, TIMES-SS showed higher predictive power (accuracy = 90.00% and CCR = 92.86%), the coverage was very low (only 10 out of 77 molecules were predicted reliably). Conclusions Owing to improved prediction performance and coverage, our solution can serve as a useful expert system towards Integrated Approaches to Testing and Assessment for skin sensitization. It would be invaluable to cosmetic/ dermatology industry for pre-screening their molecules, and reducing time, cost and animal testing. PMID:27271321
Mass storage: The key to success in high performance computing
NASA Technical Reports Server (NTRS)
Lee, Richard R.
1993-01-01
There are numerous High Performance Computing & Communications Initiatives in the world today. All are determined to help solve some 'Grand Challenges' type of problem, but each appears to be dominated by the pursuit of higher and higher levels of CPU performance and interconnection bandwidth as the approach to success, without any regard to the impact of Mass Storage. My colleagues and I at Data Storage Technologies believe that all will have their performance against their goals ultimately measured by their ability to efficiently store and retrieve the 'deluge of data' created by end-users who will be using these systems to solve Scientific Grand Challenges problems, and that the issue of Mass Storage will become then the determinant of success or failure in achieving each projects goals. In today's world of High Performance Computing and Communications (HPCC), the critical path to success in solving problems can only be traveled by designing and implementing Mass Storage Systems capable of storing and manipulating the truly 'massive' amounts of data associated with solving these challenges. Within my presentation I will explore this critical issue and hypothesize solutions to this problem.
Algorithm and Architecture Independent Benchmarking with SEAK
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tallent, Nathan R.; Manzano Franco, Joseph B.; Gawande, Nitin A.
2016-05-23
Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed the Suite for Embedded Applications & Kernels (SEAK), a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions; and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, andmore » weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
The Suite for Embedded Applications and Kernels
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-05-10
Many applications of high performance embedded computing are limited by performance or power bottlenecks. We havedesigned SEAK, a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions to these bottlenecks? and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) andgoal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user blackbox evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informativemore » for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
SoftWAXS: a computational tool for modeling wide-angle X-ray solution scattering from biomolecules.
Bardhan, Jaydeep; Park, Sanghyun; Makowski, Lee
2009-10-01
This paper describes a computational approach to estimating wide-angle X-ray solution scattering (WAXS) from proteins, which has been implemented in a computer program called SoftWAXS. The accuracy and efficiency of SoftWAXS are analyzed for analytically solvable model problems as well as for proteins. Key features of the approach include a numerical procedure for performing the required spherical averaging and explicit representation of the solute-solvent boundary and the surface of the hydration layer. These features allow the Fourier transform of the excluded volume and hydration layer to be computed directly and with high accuracy. This approach will allow future investigation of different treatments of the electron density in the hydration shell. Numerical results illustrate the differences between this approach to modeling the excluded volume and a widely used model that treats the excluded-volume function as a sum of Gaussians representing the individual atomic excluded volumes. Comparison of the results obtained here with those from explicit-solvent molecular dynamics clarifies shortcomings inherent to the representation of solvent as a time-averaged electron-density profile. In addition, an assessment is made of how the calculated scattering patterns depend on input parameters such as the solute-atom radii, the width of the hydration shell and the hydration-layer contrast. These results suggest that obtaining predictive calculations of high-resolution WAXS patterns may require sophisticated treatments of solvent.
Congestion game scheduling for virtual drug screening optimization
NASA Astrophysics Data System (ADS)
Nikitina, Natalia; Ivashko, Evgeny; Tchernykh, Andrei
2018-02-01
In virtual drug screening, the chemical diversity of hits is an important factor, along with their predicted activity. Moreover, interim results are of interest for directing the further research, and their diversity is also desirable. In this paper, we consider a problem of obtaining a diverse set of virtual screening hits in a short time. To this end, we propose a mathematical model of task scheduling for virtual drug screening in high-performance computational systems as a congestion game between computational nodes to find the equilibrium solutions for best balancing the number of interim hits with their chemical diversity. The model considers the heterogeneous environment with workload uncertainty, processing time uncertainty, and limited knowledge about the input dataset structure. We perform computational experiments and evaluate the performance of the developed approach considering organic molecules database GDB-9. The used set of molecules is rich enough to demonstrate the feasibility and practicability of proposed solutions. We compare the algorithm with two known heuristics used in practice and observe that game-based scheduling outperforms them by the hit discovery rate and chemical diversity at earlier steps. Based on these results, we use a social utility metric for assessing the efficiency of our equilibrium solutions and show that they reach greatest values.
Mobile Computing for Aerospace Applications
NASA Technical Reports Server (NTRS)
Alena, Richard; Swietek, Gregory E. (Technical Monitor)
1994-01-01
The use of commercial computer technology in specific aerospace mission applications can reduce the cost and project cycle time required for the development of special-purpose computer systems. Additionally, the pace of technological innovation in the commercial market has made new computer capabilities available for demonstrations and flight tests. Three areas of research and development being explored by the Portable Computer Technology Project at NASA Ames Research Center are the application of commercial client/server network computing solutions to crew support and payload operations, the analysis of requirements for portable computing devices, and testing of wireless data communication links as extensions to the wired network. This paper will present computer architectural solutions to portable workstation design including the use of standard interfaces, advanced flat-panel displays and network configurations incorporating both wired and wireless transmission media. It will describe the design tradeoffs used in selecting high-performance processors and memories, interfaces for communication and peripheral control, and high resolution displays. The packaging issues for safe and reliable operation aboard spacecraft and aircraft are presented. The current status of wireless data links for portable computers is discussed from a system design perspective. An end-to-end data flow model for payload science operations from the experiment flight rack to the principal investigator is analyzed using capabilities provided by the new generation of computer products. A future flight experiment on-board the Russian MIR space station will be described in detail including system configuration and function, the characteristics of the spacecraft operating environment, the flight qualification measures needed for safety review, and the specifications of the computing devices to be used in the experiment. The software architecture chosen shall be presented. An analysis of the performance characteristics of wireless data links in the spacecraft environment will be discussed. Network performance and operation will be modeled and preliminary test results presented. A crew support application will be demonstrated in conjunction with the network metrics experiment.
Climate Data Assimilation on a Massively Parallel Supercomputer
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512-nodes of an Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained 18 Gflops performance. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and challenging data assimilation problems which are unthinkable on a traditional computer platform such as the Cray C90.
Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton
2018-03-13
The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10 9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.
2017-12-01
The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
High performance computing and communications: Advancing the frontiers of information technology
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-12-31
This report, which supplements the President`s Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental inmore » the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R and D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science. This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.« less
Computations of ideal and real gas high altitude plume flows
NASA Technical Reports Server (NTRS)
Feiereisen, William J.; Venkatapathy, Ethiraj
1988-01-01
In the present work, complete flow fields around generic space vehicles in supersonic and hypersonic flight regimes are studied numerically. Numerical simulation is performed with a flux-split, time asymptotic viscous flow solver that incorporates a generalized equilibrium chemistry model. Solutions to generic problems at various altitude and flight conditions show the complexity of the flow, the equilibrium chemical dissociation and its effect on the overall flow field. Viscous ideal gas solutions are compared against equilibrium gas solutions to illustrate the effect of equilibrium chemistry. Improved solution accuracy is achieved through adaptive grid refinement.
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at the Ben-Gurion University. This elective course for 3rd year undergraduates and MSc. students is being taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should deal also with High-Productivity Computing. The traditional approach of teaching Computational Physics emphasizes ``Correctness'' and then ``Accuracy'' and we add also ``Performance.'' Along with topics in Mathematical Methods and case studies in Physics the course deals a significant amount of time with ``Mini-Courses'' in topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization and Grid and Cloud Computing. The course does not intend to teach neither new physics nor new mathematics but it is focused on an integrated approach for solving problems starting from the physics problem, the corresponding mathematical solution, the numerical scheme, writing an efficient computer code and finally analysis and visualization.
Fuzzy logic, neural networks, and soft computing
NASA Technical Reports Server (NTRS)
Zadeh, Lofti A.
1994-01-01
The past few years have witnessed a rapid growth of interest in a cluster of modes of modeling and computation which may be described collectively as soft computing. The distinguishing characteristic of soft computing is that its primary aims are to achieve tractability, robustness, low cost, and high MIQ (machine intelligence quotient) through an exploitation of the tolerance for imprecision and uncertainty. Thus, in soft computing what is usually sought is an approximate solution to a precisely formulated problem or, more typically, an approximate solution to an imprecisely formulated problem. A simple case in point is the problem of parking a car. Generally, humans can park a car rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a few millimeters and a fraction of a degree, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem. What this simple example points to is the fact that, in general, high precision carries a high cost. The challenge, then, is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. By its nature, soft computing is much closer to human reasoning than the traditional modes of computation. At this juncture, the major components of soft computing are fuzzy logic (FL), neural network theory (NN), and probabilistic reasoning techniques (PR), including genetic algorithms, chaos theory, and part of learning theory. Increasingly, these techniques are used in combination to achieve significant improvement in performance and adaptability. Among the important application areas for soft computing are control systems, expert systems, data compression techniques, image processing, and decision support systems. It may be argued that it is soft computing, rather than the traditional hard computing, that should be viewed as the foundation for artificial intelligence. In the years ahead, this may well become a widely held position.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-10-20
Look-ahead dynamic simulation software system incorporates the high performance parallel computing technologies, significantly reduces the solution time for each transient simulation case, and brings the dynamic simulation analysis into on-line applications to enable more transparency for better reliability and asset utilization. It takes the snapshot of the current power grid status, functions in parallel computing the system dynamic simulation, and outputs the transient response of the power system in real time.
Overdetermined shooting methods for computing standing water waves with spectral accuracy
NASA Astrophysics Data System (ADS)
Wilkening, Jon; Yu, Jia
2012-01-01
A high-performance shooting algorithm is developed to compute time-periodic solutions of the free-surface Euler equations with spectral accuracy in double and quadruple precision. The method is used to study resonance and its effect on standing water waves. We identify new nucleation mechanisms in which isolated large-amplitude solutions, and closed loops of such solutions, suddenly exist for depths below a critical threshold. We also study degenerate and secondary bifurcations related to Wilton's ripples in the traveling case, and explore the breakdown of self-similarity at the crests of extreme standing waves. In shallow water, we find that standing waves take the form of counter-propagating solitary waves that repeatedly collide quasi-elastically. In deep water with surface tension, we find that standing waves resemble counter-propagating depression waves. We also discuss the existence and non-uniqueness of solutions, and smooth versus erratic dependence of Fourier modes on wave amplitude and fluid depth. In the numerical method, robustness is achieved by posing the problem as an overdetermined nonlinear system and using either adjoint-based minimization techniques or a quadratically convergent trust-region method to minimize the objective function. Efficiency is achieved in the trust-region approach by parallelizing the Jacobian computation, so the setup cost of computing the Dirichlet-to-Neumann operator in the variational equation is not repeated for each column. Updates of the Jacobian are also delayed until the previous Jacobian ceases to be useful. Accuracy is maintained using spectral collocation with optional mesh refinement in space, a high-order Runge-Kutta or spectral deferred correction method in time and quadruple precision for improved navigation of delicate regions of parameter space as well as validation of double-precision results. Implementation issues for transferring much of the computation to a graphic processing units are briefly discussed, and the performance of the algorithm is tested for a number of hardware configurations.
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.
Kim, Lok-Won
2018-05-01
Although there have been many decades of research and commercial presence on high performance general purpose processors, there are still many applications that require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been successfully used to learn in a wide variety of applications, but their heavy computation demand has considerably limited their practical applications. This paper proposes a fully pipelined acceleration architecture to alleviate high computational demand of an artificial neural network (ANN) which is restricted Boltzmann machine (RBM) ANNs. The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency) integrated in a state-of-the art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T) provides a computational performance of 301-billion connection-updates-per-second and about 193 times higher performance than a software solution running on general purpose processors. Most importantly, the architecture enables over 4 times (12 times in batch learning) higher performance compared with a previous work when both are implemented in an FPGA device (XC2VP70).
Evolution of cellular automata with memory: The Density Classification Task.
Stone, Christopher; Bull, Larry
2009-08-01
The Density Classification Task is a well known test problem for two-state discrete dynamical systems. For many years researchers have used a variety of evolutionary computation approaches to evolve solutions to this problem. In this paper, we investigate the evolvability of solutions when the underlying Cellular Automaton is augmented with a type of memory based on the Least Mean Square algorithm. To obtain high performance solutions using a simple non-hybrid genetic algorithm, we design a novel representation based on the ternary representation used for Learning Classifier Systems. The new representation is found able to produce superior performance to the bit string traditionally used for representing Cellular automata. Moreover, memory is shown to improve evolvability of solutions and appropriate memory settings are able to be evolved as a component part of these solutions.
Progress towards daily "swath" solutions from GRACE
NASA Astrophysics Data System (ADS)
Save, H.; Bettadpur, S. V.; Sakumura, C.
2015-12-01
The GRACE mission has provided invaluable and the only data of its kind that measures the total water column in the Earth System over the past 13 years. The GRACE solutions available from the project have been monthly average solutions. There have been attempts by several groups to produce shorter time-window solutions with different techniques. There is also an experimental quick-look GRACE solution available from CSR that implements a sliding window approach while applying variable daily data weights. All of these GRACE solutions require special handling for data assimilation. This study explores the possibility of generating a true daily GRACE solution by computing a daily "swath" total water storage (TWS) estimate from GRACE using the Tikhonov regularization and high resolution monthly mascon estimation implemented at CSR. This paper discusses the techniques for computing such a solution and discusses the error and uncertainty characterization. We perform comparisons with official RL05 GRACE solutions and with alternate mascon solutions from CSR to understand the impact on the science results. We evaluate these solutions with emphasis on the temporal characteristics of the signal content and validate them against multiple models and in-situ data sets.
NASA Astrophysics Data System (ADS)
Cai, Jiaxiang; Liang, Hua; Zhang, Chun
2018-06-01
Based on the multi-symplectic Hamiltonian formula of the generalized Rosenau-type equation, a multi-symplectic scheme and an energy-preserving scheme are proposed. To improve the accuracy of the solution, we apply the composition technique to the obtained schemes to develop high-order schemes which are also multi-symplectic and energy-preserving respectively. Discrete fast Fourier transform makes a significant improvement to the computational efficiency of schemes. Numerical results verify that all the proposed schemes have satisfactory performance in providing accurate solution and preserving the discrete mass and energy invariants. Numerical results also show that although each basic time step is divided into several composition steps, the computational efficiency of the composition schemes is much higher than that of the non-composite schemes.
Verification of low-Mach number combustion codes using the method of manufactured solutions
NASA Astrophysics Data System (ADS)
Shunn, Lee; Ham, Frank; Knupp, Patrick; Moin, Parviz
2007-11-01
Many computational combustion models rely on tabulated constitutive relations to close the system of equations. As these reactive state-equations are typically multi-dimensional and highly non-linear, their implications on the convergence and accuracy of simulation codes are not well understood. In this presentation, the effects of tabulated state-relationships on the computational performance of low-Mach number combustion codes are explored using the method of manufactured solutions (MMS). Several MMS examples are developed and applied, progressing from simple one-dimensional configurations to problems involving higher dimensionality and solution-complexity. The manufactured solutions are implemented in two multi-physics hydrodynamics codes: CDP developed at Stanford University and FUEGO developed at Sandia National Laboratories. In addition to verifying the order-of-accuracy of the codes, the MMS problems help highlight certain robustness issues in existing variable-density flow-solvers. Strategies to overcome these issues are briefly discussed.
Computational complexities and storage requirements of some Riccati equation solvers
NASA Technical Reports Server (NTRS)
Utku, Senol; Garba, John A.; Ramesh, A. V.
1989-01-01
The linear optimal control problem of an nth-order time-invariant dynamic system with a quadratic performance functional is usually solved by the Hamilton-Jacobi approach. This leads to the solution of the differential matrix Riccati equation with a terminal condition. The bulk of the computation for the optimal control problem is related to the solution of this equation. There are various algorithms in the literature for solving the matrix Riccati equation. However, computational complexities and storage requirements as a function of numbers of state variables, control variables, and sensors are not available for all these algorithms. In this work, the computational complexities and storage requirements for some of these algorithms are given. These expressions show the immensity of the computational requirements of the algorithms in solving the Riccati equation for large-order systems such as the control of highly flexible space structures. The expressions are also needed to compute the speedup and efficiency of any implementation of these algorithms on concurrent machines.
Total variation-based neutron computed tomography
NASA Astrophysics Data System (ADS)
Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick
2018-05-01
We perform the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We present the effectiveness of the algorithm in the significantly low-angular sampling case using synthetic test problems as well as data obtained from a high flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles are used.
Benchmark solution of the dynamic response of a spherical shell at finite strain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Versino, Daniele; Brock, Jerry S.
2016-09-28
Our paper describes the development of high fidelity solutions for the study of homogeneous (elastic and inelastic) spherical shells subject to dynamic loading and undergoing finite deformations. The goal of the activity is to provide high accuracy results that can be used as benchmark solutions for the verification of computational physics codes. Furthermore, the equilibrium equations for the geometrically non-linear problem are solved through mode expansion of the displacement field and the boundary conditions are enforced in a strong form. Time integration is performed through high-order implicit Runge–Kutta schemes. Finally, we evaluate accuracy and convergence of the proposed method bymore » means of numerical examples with finite deformations and material non-linearities and inelasticity.« less
Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed com- puting models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging soft- ware ecosystems. In thismore » paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifi- cally, we have deployed the KVM hypervisor within Cray's Compute Node Linux on a XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead our solution using HPC benchmarks, both evaluating single-node performance as well as weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, ef- fectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.« less
A Benders based rolling horizon algorithm for a dynamic facility location problem
Marufuzzaman,, Mohammad; Gedik, Ridvan; Roni, Mohammad S.
2016-06-28
This study presents a well-known capacitated dynamic facility location problem (DFLP) that satisfies the customer demand at a minimum cost by determining the time period for opening, closing, or retaining an existing facility in a given location. To solve this challenging NP-hard problem, this paper develops a unique hybrid solution algorithm that combines a rolling horizon algorithm with an accelerated Benders decomposition algorithm. Extensive computational experiments are performed on benchmark test instances to evaluate the hybrid algorithm’s efficiency and robustness in solving the DFLP problem. Computational results indicate that the hybrid Benders based rolling horizon algorithm consistently offers high qualitymore » feasible solutions in a much shorter computational time period than the standalone rolling horizon and accelerated Benders decomposition algorithms in the experimental range.« less
Radio Synthesis Imaging - A High Performance Computing and Communications Project
NASA Astrophysics Data System (ADS)
Crutcher, Richard M.
The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result in an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
Computer-Aided Design and Optimization of High-Performance Vacuum Electronic Devices
2006-08-15
approximations to the metric, and space mapping wherein low-accuracy (coarse mesh) solutions can potentially be used more effectively in an...interface and algorithm development. • Work on space - mapping or related methods for utilizing models of varying levels of approximation within an
Chaudhry, Jehanzeb Hameed; Estep, Don; Tavener, Simon; Carey, Varis; Sandelin, Jeff
2016-01-01
We consider numerical methods for initial value problems that employ a two stage approach consisting of solution on a relatively coarse discretization followed by solution on a relatively fine discretization. Examples include adaptive error control, parallel-in-time solution schemes, and efficient solution of adjoint problems for computing a posteriori error estimates. We describe a general formulation of two stage computations then perform a general a posteriori error analysis based on computable residuals and solution of an adjoint problem. The analysis accommodates various variations in the two stage computation and in formulation of the adjoint problems. We apply the analysis to compute "dual-weighted" a posteriori error estimates, to develop novel algorithms for efficient solution that take into account cancellation of error, and to the Parareal Algorithm. We test the various results using several numerical examples.
2015-06-24
physically . While not distinct from IH models, they require inner boundary magnetic field and plasma property values, the latter not currently measured...initialization for the computational grid. Model integration continues until a physically consistent steady-state is attained. Because of the more... physical basis and greater likelihood of realistic solutions, only MHD-type coronal models were considered in the review. There are two major types of
Rapid indirect trajectory optimization on highly parallel computing architectures
NASA Astrophysics Data System (ADS)
Antony, Thomas
Trajectory optimization is a field which can benefit greatly from the advantages offered by parallel computing. The current state-of-the-art in trajectory optimization focuses on the use of direct optimization methods, such as the pseudo-spectral method. These methods are favored due to their ease of implementation and large convergence regions while indirect methods have largely been ignored in the literature in the past decade except for specific applications in astrodynamics. It has been shown that the shortcomings conventionally associated with indirect methods can be overcome by the use of a continuation method in which complex trajectory solutions are obtained by solving a sequence of progressively difficult optimization problems. High performance computing hardware is trending towards more parallel architectures as opposed to powerful single-core processors. Graphics Processing Units (GPU), which were originally developed for 3D graphics rendering have gained popularity in the past decade as high-performance, programmable parallel processors. The Compute Unified Device Architecture (CUDA) framework, a parallel computing architecture and programming model developed by NVIDIA, is one of the most widely used platforms in GPU computing. GPUs have been applied to a wide range of fields that require the solution of complex, computationally demanding problems. A GPU-accelerated indirect trajectory optimization methodology which uses the multiple shooting method and continuation is developed using the CUDA platform. The various algorithmic optimizations used to exploit the parallelism inherent in the indirect shooting method are described. The resulting rapid optimal control framework enables the construction of high quality optimal trajectories that satisfy problem-specific constraints and fully satisfy the necessary conditions of optimality. The benefits of the framework are highlighted by construction of maximum terminal velocity trajectories for a hypothetical long range weapon system. The techniques used to construct an initial guess from an analytic near-ballistic trajectory and the methods used to formulate the necessary conditions of optimality in a manner that is transparent to the designer are discussed. Various hypothetical mission scenarios that enforce different combinations of initial, terminal, interior point and path constraints demonstrate the rapid construction of complex trajectories without requiring any a-priori insight into the structure of the solutions. Trajectory problems of this kind were previously considered impractical to solve using indirect methods. The performance of the GPU-accelerated solver is found to be 2x--4x faster than MATLAB's bvp4c, even while running on GPU hardware that is five years behind the state-of-the-art.
Linear solver performance in elastoplastic problem solution on GPU cluster
NASA Astrophysics Data System (ADS)
Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.
2017-12-01
Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large scale systems need to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens and hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The method convergence highly depends on the operator spectrum of a problem stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
Tools for 3D scientific visualization in computational aerodynamics at NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
Hardware, software, and techniques used by the Fluid Dynamics Division (NASA) for performing visualization of computational aerodynamics, which can be applied to the visualization of flow fields from computer simulations of fluid dynamics about the Space Shuttle, are discussed. Three visualization techniques applied, post-processing, tracking, and steering, are described, as well as the post-processing software packages used, PLOT3D, SURF (Surface Modeller), GAS (Graphical Animation System), and FAST (Flow Analysis software Toolkit). Using post-processing methods a flow simulation was executed on a supercomputer and, after the simulation was complete, the results were processed for viewing. It is shown that the high-resolution, high-performance three-dimensional workstation combined with specially developed display and animation software provides a good tool for analyzing flow field solutions obtained from supercomputers.
NASA Technical Reports Server (NTRS)
Schutz, Bob E.; Baker, Gregory A.
1997-01-01
The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.
NASA Astrophysics Data System (ADS)
Baker, Gregory Allen
The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
A bioinformatics knowledge discovery in text application for grid computing.
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-06-16
A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
NASA Astrophysics Data System (ADS)
Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.
2012-12-01
Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
Integrating photonics with silicon nanoelectronics for the next generation of systems on a chip.
Atabaki, Amir H; Moazeni, Sajjad; Pavanello, Fabio; Gevorgyan, Hayk; Notaros, Jelena; Alloatti, Luca; Wade, Mark T; Sun, Chen; Kruger, Seth A; Meng, Huaiyu; Al Qubaisi, Kenaish; Wang, Imbert; Zhang, Bohan; Khilo, Anatol; Baiocco, Christopher V; Popović, Miloš A; Stojanović, Vladimir M; Ram, Rajeev J
2018-04-01
Electronic and photonic technologies have transformed our lives-from computing and mobile devices, to information technology and the internet. Our future demands in these fields require innovation in each technology separately, but also depend on our ability to harness their complementary physics through integrated solutions 1,2 . This goal is hindered by the fact that most silicon nanotechnologies-which enable our processors, computer memory, communications chips and image sensors-rely on bulk silicon substrates, a cost-effective solution with an abundant supply chain, but with substantial limitations for the integration of photonic functions. Here we introduce photonics into bulk silicon complementary metal-oxide-semiconductor (CMOS) chips using a layer of polycrystalline silicon deposited on silicon oxide (glass) islands fabricated alongside transistors. We use this single deposited layer to realize optical waveguides and resonators, high-speed optical modulators and sensitive avalanche photodetectors. We integrated this photonic platform with a 65-nanometre-transistor bulk CMOS process technology inside a 300-millimetre-diameter-wafer microelectronics foundry. We then implemented integrated high-speed optical transceivers in this platform that operate at ten gigabits per second, composed of millions of transistors, and arrayed on a single optical bus for wavelength division multiplexing, to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing 3,4 . By decoupling the formation of photonic devices from that of transistors, this integration approach can achieve many of the goals of multi-chip solutions 5 , but with the performance, complexity and scalability of 'systems on a chip' 1,6-8 . As transistors smaller than ten nanometres across become commercially available 9 , and as new nanotechnologies emerge 10,11 , this approach could provide a way to integrate photonics with state-of-the-art nanoelectronics.
Impact of topographic mask models on scanner matching solutions
NASA Astrophysics Data System (ADS)
Tyminski, Jacek K.; Pomplun, Jan; Renwick, Stephen P.
2014-03-01
Of keen interest to the IC industry are advanced computational lithography applications such as Optical Proximity Correction of IC layouts (OPC), scanner matching by optical proximity effect matching (OPEM), and Source Optimization (SO) and Source-Mask Optimization (SMO) used as advanced reticle enhancement techniques. The success of these tasks is strongly dependent on the integrity of the lithographic simulators used in computational lithography (CL) optimizers. Lithographic mask models used by these simulators are key drivers impacting the accuracy of the image predications, and as a consequence, determine the validity of these CL solutions. Much of the CL work involves Kirchhoff mask models, a.k.a. thin masks approximation, simplifying the treatment of the mask near-field images. On the other hand, imaging models for hyper-NA scanner require that the interactions of the illumination fields with the mask topography be rigorously accounted for, by numerically solving Maxwell's Equations. The simulators used to predict the image formation in the hyper-NA scanners must rigorously treat the masks topography and its interaction with the scanner illuminators. Such imaging models come at a high computational cost and pose challenging accuracy vs. compute time tradeoffs. Additional complication comes from the fact that the performance metrics used in computational lithography tasks show highly non-linear response to the optimization parameters. Finally, the number of patterns used for tasks such as OPC, OPEM, SO, or SMO range from tens to hundreds. These requirements determine the complexity and the workload of the lithography optimization tasks. The tools to build rigorous imaging optimizers based on first-principles governing imaging in scanners are available, but the quantifiable benefits they might provide are not very well understood. To quantify the performance of OPE matching solutions, we have compared the results of various imaging optimization trials obtained with Kirchhoff mask models to those obtained with rigorous models involving solutions of Maxwell's Equations. In both sets of trials, we used sets of large numbers of patterns, with specifications representative of CL tasks commonly encountered in hyper-NA imaging. In this report we present OPEM solutions based on various mask models and discuss the models' impact on hyper- NA scanner matching accuracy. We draw conclusions on the accuracy of results obtained with thin mask models vs. the topographic OPEM solutions. We present various examples representative of the scanner image matching for patterns representative of the current generation of IC designs.
FY07 NRL DoD High Performance Computing Modernization Program Annual Reports
2008-09-05
performed. Implicit and explicit solutions methods are used as appropriate. The primary finite element codes used are ABAQUS and ANSYS. User subroutines ...geometric complexities, loading path dependence, rate dependence, and interaction between loading types (electrical, thermal and mechanical). Work is not...are used for specialized material constitutive response. Coupled material responses, such as electrical- thermal for capacitor materials or electrical
Remote access to very large image repositories, a high performance computing perspective
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2005-01-01
The main challenges of using the increasingly large repositories of remote imagery data can be summarized in one word: efficiency. In this paper, a number of concrete problems and the chosen solutions are described, based on the construction of a 5TB global Landsat 7 mosaic.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
Field programmable gate array-assigned complex-valued computation and its limits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernard-Schwarz, Maria, E-mail: maria.bernardschwarz@ni.com; Institute of Applied Physics, TU Wien, Wiedner Hauptstrasse 8, 1040 Wien; Zwick, Wolfgang
We discuss how leveraging Field Programmable Gate Array (FPGA) technology as part of a high performance computing platform reduces latency to meet the demanding real time constraints of a quantum optics simulation. Implementations of complex-valued operations using fixed point numeric on a Virtex-5 FPGA compare favorably to more conventional solutions on a central processing unit. Our investigation explores the performance of multiple fixed point options along with a traditional 64 bits floating point version. With this information, the lowest execution times can be estimated. Relative error is examined to ensure simulation accuracy is maintained.
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Kim, Jeong Chul; Cruz, Dinna; Garzotto, Francesco; Kaushik, Manish; Teixeria, Catarina; Baldwin, Marie; Baldwin, Ian; Nalesso, Federico; Kim, Ji Hyun; Kang, Eungtaek; Kim, Hee Chan; Ronco, Claudio
2013-01-01
Continuous renal replacement therapy (CRRT) is commonly used for critically ill patients with acute kidney injury. During treatment, a slow dialysate flow rate can be applied to enhance diffusive solute removal. However, due to the lack of the rationale of the dialysate flow configuration (countercurrent or concurrent to blood flow), in clinical practice, the connection settings of a hemodiafilter are done depending on nurse preference or at random. In this study, we investigated the effects of flow configurations in a hemodiafilter during continuous venovenous hemodialysis on solute removal and fluid transport using computational fluid dynamic modeling. We solved the momentum equation coupling solute transport to predict quantitative diffusion and convection phenomena in a simplified hemodiafilter model. Computational modeling results showed superior solute removal (clearance of urea: 67.8 vs. 45.1 ml/min) and convection (filtration volume: 29.0 vs. 25.7 ml/min) performances for the countercurrent flow configuration. Countercurrent flow configuration enhances convection and diffusion compared to concurrent flow configuration by increasing filtration volume and equilibrium concentration in the proximal part of a hemodiafilter and backfiltration of pure dialysate in the distal part. In clinical practice, the countercurrent dialysate flow configuration of a hemodiafilter could increase solute removal in CRRT. Nevertheless, while this configuration may become mandatory for high-efficiency treatments, the impact of differences in solute removal observed in slow continuous therapies may be less important. Under these circumstances, if continuous therapies are prescribed, some of the advantages of the concurrent configuration in terms of simpler circuit layout and simpler machine design may overcome the advantages in terms of solute clearance. Different dialysate flow configurations influence solute clearance and change major solute removal mechanisms in the proximal and distal parts of a hemodiafilter. Advantages of each configuration should be balanced against the overall performance of the treatment and its simplicity in terms of treatment delivery and circuit handling procedures. Copyright © 2013 S. Karger AG, Basel.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long term solution with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which lead to more highly integrated system. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, need to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems, such as task distribution, node communication, system verification, are discussed.
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Open solutions to distributed control in ground tracking stations
NASA Technical Reports Server (NTRS)
Heuser, William Randy
1994-01-01
The advent of high speed local area networks has made it possible to interconnect small, powerful computers to function together as a single large computer. Today, distributed computer systems are the new paradigm for large scale computing systems. However, the communications provided by the local area network is only one part of the solution. The services and protocols used by the application programs to communicate across the network are as indispensable as the local area network. And the selection of services and protocols that do not match the system requirements will limit the capabilities, performance, and expansion of the system. Proprietary solutions are available but are usually limited to a select set of equipment. However, there are two solutions based on 'open' standards. The question that must be answered is 'which one is the best one for my job?' This paper examines a model for tracking stations and their requirements for interprocessor communications in the next century. The model and requirements are matched with the model and services provided by the five different software architectures and supporting protocol solutions. Several key services are examined in detail to determine which services and protocols most closely match the requirements for the tracking station environment. The study reveals that the protocols are tailored to the problem domains for which they were originally designed. Further, the study reveals that the process control model is the closest match to the tracking station model.
WATERLOPP V2/64: A highly parallel machine for numerical computation
NASA Astrophysics Data System (ADS)
Ostlund, Neil S.
1985-07-01
Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.
Guidelines for Computing Longitudinal Dynamic Stability Characteristics of a Subsonic Transport
NASA Technical Reports Server (NTRS)
Thompson, Joseph R.; Frank, Neal T.; Murphy, Patrick C.
2010-01-01
A systematic study is presented to guide the selection of a numerical solution strategy for URANS computation of a subsonic transport configuration undergoing simulated forced oscillation about its pitch axis. Forced oscillation is central to the prevalent wind tunnel methodology for quantifying aircraft dynamic stability derivatives from force and moment coefficients, which is the ultimate goal for the computational simulations. Extensive computations are performed that lead in key insights of the critical numerical parameters affecting solution convergence. A preliminary linear harmonic analysis is included to demonstrate the potential of extracting dynamic stability derivatives from computational solutions.
Modified artificial bee colony for the vehicle routing problems with time windows.
Alzaqebah, Malek; Abdullah, Salwani; Jawarneh, Sana
2016-01-01
The natural behaviour of the honeybee has attracted the attention of researchers in recent years and several algorithms have been developed that mimic swarm behaviour to solve optimisation problems. This paper introduces an artificial bee colony (ABC) algorithm for the vehicle routing problem with time windows (VRPTW). A Modified ABC algorithm is proposed to improve the solution quality of the original ABC. The high exploration ability of the ABC slows-down its convergence speed, which may due to the mechanism used by scout bees in replacing abandoned (unimproved) solutions with new ones. In the Modified ABC a list of abandoned solutions is used by the scout bees to memorise the abandoned solutions, then the scout bees select a solution from the list based on roulette wheel selection and replace by a new solution with random routs selected from the best solution. The performance of the Modified ABC is evaluated on Solomon benchmark datasets and compared with the original ABC. The computational results demonstrate that the Modified ABC outperforms the original ABC also produce good solutions when compared with the best-known results in the literature. Computational investigations show that the proposed algorithm is a good and promising approach for the VRPTW.
Feasibility of Virtual Machine and Cloud Computing Technologies for High Performance Computing
2014-05-01
Hat Enterprise Linux SaaS software as a service VM virtual machine vNUMA virtual non-uniform memory access WRF weather research and forecasting...previously mentioned in Chapter I Section B1 of this paper, which is used to run the weather research and forecasting ( WRF ) model in their experiments...against a VMware virtualization solution of WRF . The experiment consisted of running WRF in a standard configuration between the D-VTM and VMware while
Network issues for large mass storage requirements
NASA Technical Reports Server (NTRS)
Perdue, James
1992-01-01
File Servers and Supercomputing environments need high performance networks to balance the I/O requirements seen in today's demanding computing scenarios. UltraNet is one solution which permits both high aggregate transfer rates and high task-to-task transfer rates as demonstrated in actual tests. UltraNet provides this capability as both a Server-to-Server and Server-to-Client access network giving the supercomputing center the following advantages highest performance Transport Level connections (to 40 MBytes/sec effective rates); matches the throughput of the emerging high performance disk technologies, such as RAID, parallel head transfer devices and software striping; supports standard network and file system applications using SOCKET's based application program interface such as FTP, rcp, rdump, etc.; supports access to the Network File System (NFS) and LARGE aggregate bandwidth for large NFS usage; provides access to a distributed, hierarchical data server capability using DISCOS UniTree product; supports file server solutions available from multiple vendors, including Cray, Convex, Alliant, FPS, IBM, and others.
High-performance parallel analysis of coupled problems for aircraft propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Lanteri, S.; Gumaste, U.; Ronaghi, M.
1994-01-01
Applications are described of high-performance parallel, computation for the analysis of complete jet engines, considering its multi-discipline coupled problem. The coupled problem involves interaction of structures with gas dynamics, heat conduction and heat transfer in aircraft engines. The methodology issues addressed include: consistent discrete formulation of coupled problems with emphasis on coupling phenomena; effect of partitioning strategies, augmentation and temporal solution procedures; sensitivity of response to problem parameters; and methods for interfacing multiscale discretizations in different single fields. The computer implementation issues addressed include: parallel treatment of coupled systems; domain decomposition and mesh partitioning strategies; data representation in object-oriented form and mapping to hardware driven representation, and tradeoff studies between partitioning schemes and fully coupled treatment.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
Numerical Simulation of a High-Lift Configuration with Embedded Fluidic Actuators
NASA Technical Reports Server (NTRS)
Vatsa, Veer N.; Casalino, Damiano; Lin, John C.; Appelbaum, Jason
2014-01-01
Numerical simulations have been performed for a vertical tail configuration with deflected rudder. The suction surface of the main element of this configuration is embedded with an array of 32 fluidic actuators that produce oscillating sweeping jets. Such oscillating jets have been found to be very effective for flow control applications in the past. In the current paper, a high-fidelity computational fluid dynamics (CFD) code known as the PowerFLOW(Registered TradeMark) code is used to simulate the entire flow field associated with this configuration, including the flow inside the actuators. The computed results for the surface pressure and integrated forces compare favorably with measured data. In addition, numerical solutions predict the correct trends in forces with active flow control compared to the no control case. Effect of varying yaw and rudder deflection angles are also presented. In addition, computations have been performed at a higher Reynolds number to assess the performance of fluidic actuators at flight conditions.
High-performance software-only H.261 video compression on PC
NASA Astrophysics Data System (ADS)
Kasperovich, Leonid
1996-03-01
This paper describes an implementation of a software H.261 codec for PC, that takes an advantage of the fast computational algorithms for DCT-based video compression, which have been presented by the author at the February's 1995 SPIE/IS&T meeting. The motivation for developing the H.261 prototype system is to demonstrate a feasibility of real time software- only videoconferencing solution to operate across a wide range of network bandwidth, frame rate, and resolution of the input video. As the bandwidths of current network technology will be increased, the higher frame rate and resolution of video to be transmitted is allowed, that requires, in turn, a software codec to be able to compress pictures of CIF (352 X 288) resolution at up to 30 frame/sec. Running on Pentium 133 MHz PC the codec presented is capable to compress video in CIF format at 21 - 23 frame/sec. This result is comparable to the known hardware-based H.261 solutions, but it doesn't require any specific hardware. The methods to achieve high performance, the program optimization technique for Pentium microprocessor along with the performance profile, showing the actual contribution of the different encoding/decoding stages to the overall computational process, are presented.
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56more » virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
East, D. R.; Sexton, J.
This was a collaborative effort between Lawrence Livermore National Security, LLC as manager and operator of Lawrence Livermore National Laboratory (LLNL) and IBM TJ Watson Research Center to research, assess feasibility and develop an implementation plan for a High Performance Computing Innovation Center (HPCIC) in the Livermore Valley Open Campus (LVOC). The ultimate goal of this work was to help advance the State of California and U.S. commercial competitiveness in the arena of High Performance Computing (HPC) by accelerating the adoption of computational science solutions, consistent with recent DOE strategy directives. The desired result of this CRADA was a well-researched,more » carefully analyzed market evaluation that would identify those firms in core sectors of the US economy seeking to adopt or expand their use of HPC to become more competitive globally, and to define how those firms could be helped by the HPCIC with IBM as an integral partner.« less
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case
Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.
Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. To better understand and predict the distribution, intensity and structure of dust storm, a series of dust storm models have been developed, such as Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. It seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node need to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalance task loads and unnecessary communications among computing nodes. Therefore, task allocation method is the key factor, which may impact the feasibility of the paralleling. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with small amount of computing tasks. However, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate performance decreasing for large scale tasks, a K-Means clustering based algorithm is introduced. Instead of dedicating to get optimized solutions, this method can get relatively good feasible solutions within acceptable time. However, it may introduce imbalance communication for nodes or node-isolated subdomains. This research shows both two algorithms have their own strength and weakness for task allocation. A combination of the two algorithms is under study to obtain a better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
System-Level Virtualization for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vallee, Geoffroy R; Naughton, III, Thomas J; Engelmann, Christian
2008-01-01
System-level virtualization has been a research topic since the 70's but regained popularity during the past few years because of the availability of efficient solution such as Xen and the implementation of hardware support in commodity processors (e.g. Intel-VT, AMD-V). However, a majority of system-level virtualization projects is guided by the server consolidation market. As a result, current virtualization solutions appear to not be suitable for high performance computing (HPC) which is typically based on large-scale systems. On another hand there is significant interest in exploiting virtual machines (VMs) within HPC for a number of other reasons. By virtualizing themore » machine, one is able to run a variety of operating systems and environments as needed by the applications. Virtualization allows users to isolate workloads, improving security and reliability. It is also possible to support non-native environments and/or legacy operating environments through virtualization. In addition, it is possible to balance work loads, use migration techniques to relocate applications from failing machines, and isolate fault systems for repair. This document presents the challenges for the implementation of a system-level virtualization solution for HPC. It also presents a brief survey of the different approaches and techniques to address these challenges.« less
Numerical Solutions for a Cylindrical Laser Diffuser Flowfield
1990-06-01
exhaust conditions with minimum losses to optimize performance of the system. Thus, the handling of the system of shock waves to decelerate the flow...requirement for exhaustive experimental work will result in significant savings of both time and resources. As more advanced computers are developed, the...Mach number (ɚ.5) flows. Recent interest in hypersonic engine inlet performance has resulted in an extension of the methodology to high Mach number
A mixed analog/digital chaotic neuro-computer system for quadratic assignment problems.
Horio, Yoshihiko; Ikeguchi, Tohru; Aihara, Kazuyuki
2005-01-01
We construct a mixed analog/digital chaotic neuro-computer prototype system for quadratic assignment problems (QAPs). The QAP is one of the difficult NP-hard problems, and includes several real-world applications. Chaotic neural networks have been used to solve combinatorial optimization problems through chaotic search dynamics, which efficiently searches optimal or near optimal solutions. However, preliminary experiments have shown that, although it obtained good feasible solutions, the Hopfield-type chaotic neuro-computer hardware system could not obtain the optimal solution of the QAP. Therefore, in the present study, we improve the system performance by adopting a solution construction method, which constructs a feasible solution using the analog internal state values of the chaotic neurons at each iteration. In order to include the construction method into our hardware, we install a multi-channel analog-to-digital conversion system to observe the internal states of the chaotic neurons. We show experimentally that a great improvement in the system performance over the original Hopfield-type chaotic neuro-computer is obtained. That is, we obtain the optimal solution for the size-10 QAP in less than 1000 iterations. In addition, we propose a guideline for parameter tuning of the chaotic neuro-computer system according to the observation of the internal states of several chaotic neurons in the network.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Messagemore » Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.« less
Opportunities for nonvolatile memory systems in extreme-scale high-performance computing
Vetter, Jeffrey S.; Mittal, Sparsh
2015-01-12
For extreme-scale high-performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where DRAM main memory systems account for about 30 to 50 percent of a node's overall power consumption. As the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory capacities balanced with increasing computational rates offered by next-generation processors. However, several emerging memory technologies related to nonvolatile memory (NVM) devices are being investigated as an alternative for DRAM. Moving forward, NVM devices could offer solutions for HPC architectures. Researchers are investigating how to integratemore » these emerging technologies into future extreme-scale HPC systems and how to expose these capabilities in the software stack and applications. In addition, current results show several of these strategies could offer high-bandwidth I/O, larger main memory capacities, persistent data structures, and new approaches for application resilience and output postprocessing, such as transaction-based incremental checkpointing and in situ visualization, respectively.« less
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. To treat such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, graphics processing units to multicore CPUs with a fast interconnect, along with effective parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
NASA Astrophysics Data System (ADS)
Chen, Zuojing; Polizzi, Eric
2010-11-01
Effective modeling and numerical spectral-based propagation schemes are proposed for addressing the challenges in time-dependent quantum simulations of systems ranging from atoms, molecules, and nanostructures to emerging nanoelectronic devices. While time-dependent Hamiltonian problems can be formally solved by propagating the solutions along tiny simulation time steps, a direct numerical treatment is often considered too computationally demanding. In this paper, however, we propose to go beyond these limitations by introducing high-performance numerical propagation schemes to compute the solution of the time-ordered evolution operator. In addition to the direct Hamiltonian diagonalizations that can be efficiently performed using the new eigenvalue solver FEAST, we have designed a Gaussian propagation scheme and a basis-transformed propagation scheme (BTPS) which allow to reduce considerably the simulation times needed by time intervals. It is outlined that BTPS offers the best computational efficiency allowing new perspectives in time-dependent simulations. Finally, these numerical schemes are applied to study the ac response of a (5,5) carbon nanotube within a three-dimensional real-space mesh framework.
Galaxy CloudMan: delivering cloud compute clusters
2010-01-01
Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
NASA Technical Reports Server (NTRS)
Crook, Andrew J.; Delaney, Robert A.
1992-01-01
The computer program user's manual for the ADPACAPES (Advanced Ducted Propfan Analysis Code-Average Passage Engine Simulation) program is included. The objective of the computer program is development of a three-dimensional Euler/Navier-Stokes flow analysis for fan section/engine geometries containing multiple blade rows and multiple spanwise flow splitters. An existing procedure developed by Dr. J. J. Adamczyk and associates at the NASA Lewis Research Center was modified to accept multiple spanwise splitter geometries and simulate engine core conditions. The numerical solution is based upon a finite volume technique with a four stage Runge-Kutta time marching procedure. Multiple blade row solutions are based upon the average-passage system of equations. The numerical solutions are performed on an H-type grid system, with meshes meeting the requirement of maintaining a common axisymmetric mesh for each blade row grid. The analysis was run on several geometry configurations ranging from one to five blade rows and from one to four radial flow splitters. The efficiency of the solution procedure was shown to be the same as the original analysis.
ERDC MSRC Resource. High Performance Computing for the Warfighter. Spring 2006
2006-01-01
named Ruby, and the HP/Compaq SC45, named Emerald , continue to add their unique sparkle to the ERDC MSRC computer infrastructure. ERDC invited the...configuration on B-52H purchased additional memory for the login nodes so that this part of the solution process could be done as a preprocessing step. On...application and system services. Of the service nodes, 10 are login nodes and 23 are input/output (I/O) server nodes for the Lustre file system (i.e., the
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the loop unrolling technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk
We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basismore » approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.« less
Nian, Qiong; Callahan, Michael; Saei, Mojib; Look, David; Efstathiadis, Harry; Bailey, John; Cheng, Gary J.
2015-01-01
A new method combining aqueous solution printing with UV Laser crystallization (UVLC) and post annealing is developed to deposit highly transparent and conductive Aluminum doped Zinc Oxide (AZO) films. This technique is able to rapidly produce large area AZO films with better structural and optoelectronic properties than most high vacuum deposition, suggesting a potential large-scale manufacturing technique. The optoelectronic performance improvement attributes to UVLC and forming gas annealing (FMG) induced grain boundary density decrease and electron traps passivation at grain boundaries. The physical model and computational simulation developed in this work could be applied to thermal treatment of many other metal oxide films. PMID:26515670
3D detectors with high space and time resolution
NASA Astrophysics Data System (ADS)
Loi, A.
2018-01-01
For future high luminosity LHC experiments it will be important to develop new detector systems with increased space and time resolution and also better radiation hardness in order to operate in high luminosity environment. A possible technology which could give such performances is 3D silicon detectors. This work explores the possibility of a pixel geometry by designing and simulating different solutions, using Sentaurus Tecnology Computer Aided Design (TCAD) as design and simulation tool, and analysing their performances. A key factor during the selection was the generated electric field and the carrier velocity inside the active area of the pixel.
Evaluation of SuperLU on multicore architectures
NASA Astrophysics Data System (ADS)
Li, X. S.
2008-07-01
The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We included both Pthreads and MPI implementations in this study and found that the Pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
A Cloud-Based Simulation Architecture for Pandemic Influenza Simulation
Eriksson, Henrik; Raciti, Massimiliano; Basile, Maurizio; Cunsolo, Alessandro; Fröberg, Anders; Leifler, Ola; Ekberg, Joakim; Timpka, Toomas
2011-01-01
High-fidelity simulations of pandemic outbreaks are resource consuming. Cluster-based solutions have been suggested for executing such complex computations. We present a cloud-based simulation architecture that utilizes computing resources both locally available and dynamically rented online. The approach uses the Condor framework for job distribution and management of the Amazon Elastic Computing Cloud (EC2) as well as local resources. The architecture has a web-based user interface that allows users to monitor and control simulation execution. In a benchmark test, the best cost-adjusted performance was recorded for the EC2 H-CPU Medium instance, while a field trial showed that the job configuration had significant influence on the execution time and that the network capacity of the master node could become a bottleneck. We conclude that it is possible to develop a scalable simulation environment that uses cloud-based solutions, while providing an easy-to-use graphical user interface. PMID:22195089
A Computational Study of Shear Layer Receptivity
NASA Astrophysics Data System (ADS)
Barone, Matthew; Lele, Sanjiva
2002-11-01
The receptivity of two-dimensional, compressible shear layers to local and external excitation sources is examined using a computational approach. The family of base flows considered consists of a laminar supersonic stream separated from nearly quiescent fluid by a thin, rigid splitter plate with a rounded trailing edge. The linearized Euler and linearized Navier-Stokes equations are solved numerically in the frequency domain. The flow solver is based on a high order finite difference scheme, coupled with an overset mesh technique developed for computational aeroacoustics applications. Solutions are obtained for acoustic plane wave forcing near the most unstable shear layer frequency, and are compared to the existing low frequency theory. An adjoint formulation to the present problem is developed, and adjoint equation calculations are performed using the same numerical methods as for the regular equation sets. Solutions to the adjoint equations are used to shed light on the mechanisms which control the receptivity of finite-width compressible shear layers.
The WorkPlace distributed processing environment
NASA Technical Reports Server (NTRS)
Ames, Troy; Henderson, Scott
1993-01-01
Real time control problems require robust, high performance solutions. Distributed computing can offer high performance through parallelism and robustness through redundancy. Unfortunately, implementing distributed systems with these characteristics places a significant burden on the applications programmers. Goddard Code 522 has developed WorkPlace to alleviate this burden. WorkPlace is a small, portable, embeddable network interface which automates message routing, failure detection, and re-configuration in response to failures in distributed systems. This paper describes the design and use of WorkPlace, and its application in the construction of a distributed blackboard system.
Parallel Climate Data Assimilation PSAS Package
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512node Intel Paragon. The equation solver achieves a sustained 18 Gflops performance. As the results, we achieved an unprecedented 100-fold solution time reduction on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilations.
NASA Astrophysics Data System (ADS)
Huang, Qian
2014-09-01
Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up the performance by utilizing multiple computer resources to process a computational task simultaneously thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and super computers. Today, it has been seen that there is a tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective mean to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services in IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application of which is developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.
Formal Solutions for Polarized Radiative Transfer. II. High-order Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janett, Gioele; Steiner, Oskar; Belluzzi, Luca, E-mail: gioele.janett@irsol.ch
When integrating the radiative transfer equation for polarized light, the necessity of high-order numerical methods is well known. In fact, well-performing high-order formal solvers enable higher accuracy and the use of coarser spatial grids. Aiming to provide a clear comparison between formal solvers, this work presents different high-order numerical schemes and applies the systematic analysis proposed by Janett et al., emphasizing their advantages and drawbacks in terms of order of accuracy, stability, and computational cost.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-01-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible. PMID:25745272
Numerical simulation of air hypersonic flows with equilibrium chemical reactions
NASA Astrophysics Data System (ADS)
Emelyanov, Vladislav; Karpenko, Anton; Volkov, Konstantin
2018-05-01
The finite volume method is applied to solve unsteady three-dimensional compressible Navier-Stokes equations on unstructured meshes. High-temperature gas effects altering the aerodynamics of vehicles are taken into account. Possibilities of the use of graphics processor units (GPUs) for the simulation of hypersonic flows are demonstrated. Solutions of some test cases on GPUs are reported, and a comparison between computational results of equilibrium chemically reacting and perfect air flowfields is performed. Speedup of solution on GPUs with respect to the solution on central processor units (CPUs) is compared. The results obtained provide promising perspective for designing a GPU-based software framework for practical applications.
Point and path performance of light aircraft: A review and analysis
NASA Technical Reports Server (NTRS)
Smetana, F. O.; Summey, D. C.; Johnson, W. D.
1973-01-01
The literature on methods for predicting the performance of light aircraft is reviewed. The methods discussed in the review extend from the classical instantaneous maximum or minimum technique to techniques for generating mathematically optimum flight paths. Classical point performance techniques are shown to be adequate in many cases but their accuracies are compromised by the need to use simple lift, drag, and thrust relations in order to get closed form solutions. Also the investigation of the effect of changes in weight, altitude, configuration, etc. involves many essentially repetitive calculations. Accordingly, computer programs are provided which can fit arbitrary drag polars and power curves with very high precision and which can then use the resulting fits to compute the performance under the assumption that the aircraft is not accelerating.
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
2014-01-01
Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
NASA Technical Reports Server (NTRS)
Luthcke, Scott B.; Zelensky, Nikita P.; Rowlands, David D.; Lemoine, Frank G.; Williams, Teresa A.
2003-01-01
Jason-1, launched on December 7, 2001, is continuing the time series of centimeter level ocean topography observations as the follow-on to the highly successful TOPEX/POSEIDON (T/P) radar altimeter satellite. The precision orbit determination (POD) is a critical component to meeting the ocean topography goals of the mission. Jason-1 is no exception and has set a 1 cm radial orbit accuracy goal, which represents a factor of two improvement over what is currently being achieved for T/P. The challenge to precision orbit determination (POD) is both achieving the 1 cm radial orbit accuracy and evaluating and validating the performance of the 1 cm orbit. Fortunately, Jason-1 POD can rely on four independent tracking data types including near continuous tracking data from the dual frequency codeless BlackJack GPS receiver. In addition, to the enhanced GPS receiver, Jason-1 carries significantly improved SLR and DORIS tracking systems along with the altimeter itself. We demonstrate the 1 cm radial orbit accuracy goal has been achieved using GPS data alone in a reduced dynamic solution. It is also shown that adding SLR data to the GPS-based solutions improves the orbits even further. In order to assess the performance of these orbits it is necessary to process all of the available tracking data (GPS, SLR, DORIS and altimeter crossover differences) as either dependent or independent of the orbit solutions. It was also necessary to compute orbit solutions using various combinations of the four available tracking data in order to independently assess the orbit performance. Towards this end, we have greatly improved orbits determined solely from SLR+DORIS data by applying the reduced dynamic solution strategy. In addition, we have computed reduced dynamic orbits based on SLR, DORIS and crossover data that are a significant improvement over the SLR and DORIS based dynamic solutions. These solutions provide the best performing orbits for independent validation of the GPS-based reduced dynamic orbits.
Enhanced Conformational Sampling of N-Glycans in Solution with Replica State Exchange Metadynamics.
Galvelis, Raimondas; Re, Suyong; Sugita, Yuji
2017-05-09
Molecular dynamics (MD) simulation of a N-glycan in solution is challenging because of high-energy barriers of the glycosidic linkages, functional group rotational barriers, and numerous intra- and intermolecular hydrogen bonds. In this study, we apply different enhanced conformational sampling approaches, namely, metadynamics (MTD), the replica-exchange MD (REMD), and the recently proposed replica state exchange MTD (RSE-MTD), to a N-glycan in solution and compare the conformational sampling efficiencies of the approaches. MTD helps to cross the high-energy barrier along the ω angle by utilizing a bias potential, but it cannot enhance sampling of the other degrees of freedom. REMD ensures moderate-energy barrier crossings by exchanging temperatures between replicas, while it hardly crosses the barriers along ω. In contrast, RSE-MTD succeeds to cross the high-energy barrier along ω as well as to enhance sampling of the other degrees of freedom. We tested two RSE-MTD schemes: in one scheme, 64 replicas were simulated with the bias potential along ω at different temperatures, while simulations of four replicas were performed with the bias potentials for different CVs at 300 K. In both schemes, one unbiased replica at 300 K was included to compute conformational properties of the glycan. The conformational sampling of the former is better than the other enhanced sampling methods, while the latter shows reasonable performance without spending large computational resources. The latter scheme is likely to be useful when a N-glycan-attached protein is simulated.
Shortest path problem on a grid network with unordered intermediate points
NASA Astrophysics Data System (ADS)
Saw, Veekeong; Rahman, Amirah; Eng Ong, Wen
2017-10-01
We consider a shortest path problem with single cost factor on a grid network with unordered intermediate points. A two stage heuristic algorithm is proposed to find a feasible solution path within a reasonable amount of time. To evaluate the performance of the proposed algorithm, computational experiments are performed on grid maps of varying size and number of intermediate points. Preliminary results for the problem are reported. Numerical comparisons against brute forcing show that the proposed algorithm consistently yields solutions that are within 10% of the optimal solution and uses significantly less computation time.
Challenges and opportunities of cloud computing for atmospheric sciences
NASA Astrophysics Data System (ADS)
Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.
2016-04-01
Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand it has began to make its way in scientific research. One of the greatest advantages of cloud computing for scientific research is independence of having access to a large cyberinfrastructure to fund or perform a research project. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we expose results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The monetary cost associated is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
Orion Service Module Reaction Control System Plume Impingement Analysis Using PLIMP/RAMP2
NASA Technical Reports Server (NTRS)
Wang, Xiao-Yen; Lumpkin, Forrest E., III; Gati, Frank; Yuko, James R.; Motil, Brian J.
2009-01-01
The Orion Crew Exploration Vehicle Service Module Reaction Control System engine plume impingement was computed using the plume impingement program (PLIMP). PLIMP uses the plume solution from RAMP2, which is the refined version of the reacting and multiphase program (RAMP) code. The heating rate and pressure (force and moment) on surfaces or components of the Service Module were computed. The RAMP2 solution of the flow field inside the engine and the plume was compared with those computed using GASP, a computational fluid dynamics code, showing reasonable agreement. The computed heating rate and pressure using PLIMP were compared with the Reaction Control System plume model (RPM) solution and the plume impingement dynamics (PIDYN) solution. RPM uses the GASP-based plume solution, whereas PIDYN uses the SCARF plume solution. Three sets of the heating rate and pressure solutions agree well. Further thermal analysis on the avionic ring of the Service Module was performed using MSC Patran/Pthermal. The obtained temperature results showed that thermal protection is necessary because of significant heating from the plume.
MPACT Standard Input User s Manual, Version 2.2.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, Benjamin S.; Downar, Thomas; Fitzgerald, Andrew
The MPACT (Michigan PArallel Charactistics based Transport) code is designed to perform high-fidelity light water reactor (LWR) analysis using whole-core pin-resolved neutron transport calculations on modern parallel-computing hardware. The code consists of several libraries which provide the functionality necessary to solve steady-state eigenvalue problems. Several transport capabilities are available within MPACT including both 2-D and 3-D Method of Characteristics (MOC). A three-dimensional whole core solution based on the 2D-1D solution method provides the capability for full core depletion calculations.
Yu, Lei; Pizio, Benjamin S; Vaden, Timothy D
2012-06-07
Protic ionic liquids (PILs) are promising alternatives to water for swelling Nafion as a fuel cell proton exchange membrane (PEM). PILs can significantly improve the high-temperature performance of a PEM. The proton dissociation and solvation mechanisms in a PIL, which are keys to understanding the proton transportation and conductivity, have not been fully explored. In this paper, we used FTIR, Raman, and electronic spectroscopy with computational simulation techniques to explore the spectroscopic properties of bis(trifluoromethanesulfonyl)imide (HTFSI) solutions in 1-butyl-3-methylimidazolium bis(trifluoromethanesulfonyl)imide (BMITFSI) ionic liquid at concentrations from ∼0.1 to as high as ∼1.0 M. Solution conductivities were measured at room temperature and elevated temperatures up to ∼65 °C. The solution structure and properties depend on the concentration of HTFSI. At lower concentration, around 0.1 M, the HTFSI solution has higher conductivity than pure BMITFSI. However, the conductivity decreases when the concentration increases from 0.1 to 1.0 M. Temperature-dependent conductivities followed the Vogel-Fulcher-Tamman equation at all concentrations. Conductivity and spectroscopy results elucidate the complicated ionization and solvation mechanism of HTFSI in BMITFSI solutions. Raman spectroscopy and density functional theory (DFT) calculations are consistent with the complete ionization of HTFSI to generate solvated H(+) at low concentrations. FTIR, Raman, and electronic spectroscopic results as well as DFT computational simulation indicated that when the concentration is as high as 1.0 M, a significant amount of TFSI(-) is protonated, most likely at the imide nitrogen.
Fast globally optimal segmentation of cells in fluorescence microscopy images.
Bergeest, Jan-Philip; Rohr, Karl
2011-01-01
Accurate and efficient segmentation of cells in fluorescence microscopy images is of central importance for the quantification of protein expression in high-throughput screening applications. We propose a new approach for segmenting cell nuclei which is based on active contours and convex energy functionals. Compared to previous work, our approach determines the global solution. Thus, the approach does not suffer from local minima and the segmentation result does not depend on the initialization. We also suggest a numeric approach for efficiently computing the solution. The performance of our approach has been evaluated using fluorescence microscopy images of different cell types. We have also performed a quantitative comparison with previous segmentation approaches.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less
Tackling some of the most intricate geophysical challenges via high-performance computing
NASA Astrophysics Data System (ADS)
Khosronejad, A.
2016-12-01
Recently, world has been witnessing significant enhancements in computing power of supercomputers. Computer clusters in conjunction with the advanced mathematical algorithms has set the stage for developing and applying powerful numerical tools to tackle some of the most intricate geophysical challenges that today`s engineers face. One such challenge is to understand how turbulent flows, in real-world settings, interact with (a) rigid and/or mobile complex bed bathymetry of waterways and sea-beds in the coastal areas; (b) objects with complex geometry that are fully or partially immersed; and (c) free-surface of waterways and water surface waves in the coastal area. This understanding is especially important because the turbulent flows in real-world environments are often bounded by geometrically complex boundaries, which dynamically deform and give rise to multi-scale and multi-physics transport phenomena, and characterized by multi-lateral interactions among various phases (e.g. air/water/sediment phases). Herein, I present some of the multi-scale and multi-physics geophysical fluid mechanics processes that I have attempted to study using an in-house high-performance computational model, the so-called VFS-Geophysics. More specifically, I will present the simulation results of turbulence/sediment/solute/turbine interactions in real-world settings. Parts of the simulations I present are performed to gain scientific insights into the processes such as sand wave formation (A. Khosronejad, and F. Sotiropoulos, (2014), Numerical simulation of sand waves in a turbulent open channel flow, Journal of Fluid Mechanics, 753:150-216), while others are carried out to predict the effects of climate change and large flood events on societal infrastructures ( A. Khosronejad, et al., (2016), Large eddy simulation of turbulence and solute transport in a forested headwater stream, Journal of Geophysical Research:, doi: 10.1002/2014JF003423).
Solving the Coupled System Improves Computational Efficiency of the Bidomain Equations
Southern, James A.; Plank, Gernot; Vigmond, Edward J.; Whiteley, Jonathan P.
2017-01-01
The bidomain equations are frequently used to model the propagation of cardiac action potentials across cardiac tissue. At the whole organ level the size of the computational mesh required makes their solution a significant computational challenge. As the accuracy of the numerical solution cannot be compromised, efficiency of the solution technique is important to ensure that the results of the simulation can be obtained in a reasonable time whilst still encapsulating the complexities of the system. In an attempt to increase efficiency of the solver, the bidomain equations are often decoupled into one parabolic equation that is computationally very cheap to solve and an elliptic equation that is much more expensive to solve. In this study the performance of this uncoupled solution method is compared with an alternative strategy in which the bidomain equations are solved as a coupled system. This seems counter-intuitive as the alternative method requires the solution of a much larger linear system at each time step. However, in tests on two 3-D rabbit ventricle benchmarks it is shown that the coupled method is up to 80% faster than the conventional uncoupled method — and that parallel performance is better for the larger coupled problem. PMID:19457741
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
THE CLIMATIC AND HYDROLOGIC FACTORS AFFECTING THE REDISTRIBUTION OF SR-90
leaching solution present and the chemical and cation exchange properties of the soil solution ; a mathematical model of movement was established...manual for using high speed computers to compute the factors of the daily water balance was prepared; the influence of the soil solution in
Signal and image processing algorithm performance in a virtual and elastic computing environment
NASA Astrophysics Data System (ADS)
Bennett, Kelly W.; Robertson, James
2013-05-01
The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.
NASA Technical Reports Server (NTRS)
Venkatachari, Balaji Shankar; Streett, Craig L.; Chang, Chau-Lyan; Friedlander, David J.; Wang, Xiao-Yen; Chang, Sin-Chung
2016-01-01
Despite decades of development of unstructured mesh methods, high-fidelity time-accurate simulations are still predominantly carried out on structured, or unstructured hexahedral meshes by using high-order finite-difference, weighted essentially non-oscillatory (WENO), or hybrid schemes formed by their combinations. In this work, the space-time conservation element solution element (CESE) method is used to simulate several flow problems including supersonic jet/shock interaction and its impact on launch vehicle acoustics, and direct numerical simulations of turbulent flows using tetrahedral meshes. This paper provides a status report for the continuing development of the space-time conservation element solution element (CESE) numerical and software framework under the Revolutionary Computational Aerosciences (RCA) project. Solution accuracy and large-scale parallel performance of the numerical framework is assessed with the goal of providing a viable paradigm for future high-fidelity flow physics simulations.
Event-by-Event Simulations of Early Gluon Fields in High Energy Nuclear Collisions
NASA Astrophysics Data System (ADS)
Nickel, Matthew; Rose, Steven; Fries, Rainer
2017-09-01
Collisions of heavy ions are carried out at ultra relativistic speeds at the Relativistic Heavy Ion Collider and the Large Hadron Collider to create Quark Gluon Plasma. The earliest stages of such collisions are dominated by the dynamics of classical gluon fields. The McLerran-Venugopalan (MV) model of color glass condensate provides a model for this process. Previous research has provided an analytic solution for event averaged observables in the MV model. Using the High Performance Research Computing Center (HPRC) at Texas A&M, we have developed a C++ code to explicitly calculate the initial gluon fields and energy momentum tensor event by event using the analytic recursive solution. The code has been tested against previously known analytic results up to fourth order. We have also have been able to test the convergence of the recursive solution at high orders in time and studied the time evolution of color glass condensate.
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, R.; Imbriale, W.; Liewer, P.; Lyons, J.; Manshadi, F.; Patterson, J.
1987-01-01
The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms utilized would be different thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube with 128 megabytes of memory for a 32-node configuration with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was anlyzed for the computational speedup attained by the parallel architecture.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choi, Jun-Ho; Lim, Sohee; Chon, Bonghwan
The vibrational frequency, frequency fluctuation dynamics, and transition dipole moment of the O—D stretch mode of HDO molecule in aqueous solutions are strongly dependent on its local electrostatic environment and hydrogen-bond network structure. Therefore, the time-resolved vibrational spectroscopy the O—D stretch mode has been particularly used to investigate specific ion effects on water structure. Despite prolonged efforts to understand the interplay of O—D vibrational dynamics with local water hydrogen-bond network and ion aggregate structures in high salt solutions, still there exists a gap between theory and experiment due to a lack of quantitative model for accurately describing O—D stretch frequencymore » in high salt solutions. To fill this gap, we have performed numerical simulations of Raman scattering and IR absorption spectra of the O—D stretch mode of HDO in highly concentrated NaCl and KSCN solutions and compared them with experimental results. Carrying out extensive quantum chemistry calculations on not only water clusters but also ion-water clusters, we first developed a distributed vibrational solvatochromic charge model for the O—D stretch mode in aqueous salt solutions. Furthermore, the non-Condon effect on the vibrational transition dipole moment of the O—D stretch mode was fully taken into consideration with the charge response kernel that is non-local polarizability density. From the fluctuating O—D stretch mode frequencies and transition dipole vectors obtained from the molecular dynamics simulations, the O—D stretch Raman scattering and IR absorption spectra of HDO in salt solutions could be calculated. The polarization effect on the transition dipole vector of the O—D stretch mode is shown to be important and the asymmetric line shapes of the O—D stretch Raman scattering and IR absorption spectra of HDO especially in highly concentrated NaCl and KSCN solutions are in quantitative agreement with experimental results. We anticipate that this computational approach will be of critical use in interpreting linear and nonlinear vibrational spectroscopies of HDO molecule that is considered as an excellent local probe for monitoring local electrostatic and hydrogen-bonding environment in not just salt but also other confined and crowded solutions.« less
NASA Astrophysics Data System (ADS)
McClure, J. E.; Prins, J. F.; Miller, C. T.
2014-07-01
Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed using a GPU. The heterogeneous solution is demonstrated to provide speedup of 2.6 × as compared to multi-core CPU solution and 1.8 × compared to GPU solution due to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.
Methodology for CFD Design Analysis of National Launch System Nozzle Manifold
NASA Technical Reports Server (NTRS)
Haire, Scot L.
1993-01-01
The current design environment dictates that high technology CFD (Computational Fluid Dynamics) analysis produce quality results in a timely manner if it is to be integrated into the design process. The design methodology outlined describes the CFD analysis of an NLS (National Launch System) nozzle film cooling manifold. The objective of the analysis was to obtain a qualitative estimate for the flow distribution within the manifold. A complex, 3D, multiple zone, structured grid was generated from a 3D CAD file of the geometry. A Euler solution was computed with a fully implicit compressible flow solver. Post processing consisted of full 3D color graphics and mass averaged performance. The result was a qualitative CFD solution that provided the design team with relevant information concerning the flow distribution in and performance characteristics of the film cooling manifold within an effective time frame. Also, this design methodology was the foundation for a quick turnaround CFD analysis of the next iteration in the manifold design.
A High Performance COTS Based Computer Architecture
NASA Astrophysics Data System (ADS)
Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland
2014-08-01
Using Commercial Off The Shelf (COTS) electronic components for space applications is a long standing idea. Indeed the difference in processing performance and energy efficiency between radiation hardened components and COTS components is so important that COTS components are very attractive for use in mass and power constrained systems. However using COTS components in space is not straightforward as one must account with the effects of the space environment on the COTS components behavior. In the frame of the ESA funded activity called High Performance COTS Based Computer, Airbus Defense and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS based architecture for high performance processing. The rest of the paper is organized as follows: in a first section we will start by recapitulating the interests and constraints of using COTS components for space applications; then we will briefly describe existing fault mitigation architectures and present our solution for fault mitigation based on a component called the SmartIO; in the last part of the paper we will describe the prototyping activities executed during the HiP CBC project.
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
An Application Development Platform for Neuromorphic Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dean, Mark; Chan, Jason; Daffron, Christopher
2016-01-01
Dynamic Adaptive Neural Network Arrays (DANNAs) are neuromorphic computing systems developed as a hardware based approach to the implementation of neural networks. They feature highly adaptive and programmable structural elements, which model arti cial neural networks with spiking behavior. We design them to solve problems using evolutionary optimization. In this paper, we highlight the current hardware and software implementations of DANNA, including their features, functionalities and performance. We then describe the development of an Application Development Platform (ADP) to support efficient application implementation and testing of DANNA based solutions. We conclude with future directions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, Bryan Scott; MacQuigg, Michael Robert; Wysong, Andrew Russell
In this document, the code MCNP is validated with ENDF/B-VII.1 cross section data under the purview of ANSI/ANS-8.24-2007, for use with uranium systems. MCNP is a computer code based on Monte Carlo transport methods. While MCNP has wide reading capability in nuclear transport simulation, this validation is limited to the functionality related to neutron transport and calculation of criticality parameters such as k eff.
BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation
Kiefer, Christina; Fehlmann, Tobias; Backes, Christina
2017-01-01
Abstract Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02–95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee. PMID:28472498
A Comparative Study of High and Low Fidelity Fan Models for Turbofan Engine System Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Afjeh, Abdollah A.
1991-01-01
In this paper, a heterogeneous propulsion system simulation method is presented. The method is based on the formulation of a cycle model of a gas turbine engine. The model includes the nonlinear characteristics of the engine components via use of empirical data. The potential to simulate the entire engine operation on a computer without the aid of data is demonstrated by numerically generating "performance maps" for a fan component using two flow models of varying fidelity. The suitability of the fan models were evaluated by comparing the computed performance with experimental data. A discussion of the potential benefits and/or difficulties in connecting simulations solutions of differing fidelity is given.
NASA Technical Reports Server (NTRS)
Tam, Christopher; Krothapalli, A
1993-01-01
The research program for the first year of this project (see the original research proposal) consists of developing an explicit marching scheme for solving the parabolized stability equations (PSE). Performing mathematical analysis of the computational algorithm including numerical stability analysis and the determination of the proper boundary conditions needed at the boundary of the computation domain are implicit in the task. Before one can solve the parabolized stability equations for high-speed mixing layers, the mean flow must first be found. In the past, instability analysis of high-speed mixing layer has mostly been performed on mean flow profiles calculated by the boundary layer equations. In carrying out this project, it is believed that the boundary layer equations might not give an accurate enough nonparallel, nonlinear mean flow needed for parabolized stability analysis. A more accurate mean flow can, however, be found by solving the parabolized Navier-Stokes equations. The advantage of the parabolized Navier-Stokes equations is that its accuracy is consistent with the PSE method. Furthermore, the method of solution is similar. Hence, the major part of the effort of the work of this year has been devoted to the development of an explicit numerical marching scheme for the solution of the Parabolized Navier-Stokes equation as applied to the high-seed mixing layer problem.
Grigoryeva, Lyudmila; Henriques, Julie; Larger, Laurent; Ortega, Juan-Pablo
2014-07-01
Reservoir computing is a recently introduced machine learning paradigm that has already shown excellent performances in the processing of empirical data. We study a particular kind of reservoir computers called time-delay reservoirs that are constructed out of the sampling of the solution of a time-delay differential equation and show their good performance in the forecasting of the conditional covariances associated to multivariate discrete-time nonlinear stochastic processes of VEC-GARCH type as well as in the prediction of factual daily market realized volatilities computed with intraday quotes, using as training input daily log-return series of moderate size. We tackle some problems associated to the lack of task-universality for individually operating reservoirs and propose a solution based on the use of parallel arrays of time-delay reservoirs. Copyright © 2014 Elsevier Ltd. All rights reserved.
Final Report. Institute for Ultralscale Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Kwan-Liu; Galli, Giulia; Gygi, Francois
The SciDAC Institute for Ultrascale Visualization brought together leading experts from visualization, high-performance computing, and science application areas to make advanced visualization solutions for SciDAC scientists and the broader community. Over the five-year project, the Institute introduced many new enabling visualization techniques, which have significantly enhanced scientists’ ability to validate their simulations, interpret their data, and communicate with others about their work and findings. This Institute project involved a large number of junior and student researchers, who received the opportunities to work on some of the most challenging science applications and gain access to the most powerful high-performance computing facilitiesmore » in the world. They were readily trained and prepared for facing the greater challenges presented by extreme-scale computing. The Institute’s outreach efforts, through publications, workshops and tutorials, successfully disseminated the new knowledge and technologies to the SciDAC and the broader scientific communities. The scientific findings and experience of the Institute team helped plan the SciDAC3 program.« less
Design principles for radiation-resistant solid solutions
NASA Astrophysics Data System (ADS)
Schuler, Thomas; Trinkle, Dallas R.; Bellon, Pascal; Averback, Robert
2017-05-01
We develop a multiscale approach to quantify the increase in the recombined fraction of point defects under irradiation resulting from dilute solute additions to a solid solution. This methodology provides design principles for radiation-resistant materials. Using an existing database of solute diffusivities, we identify Sb as one of the most efficient solutes for this purpose in a Cu matrix. We perform density-functional-theory calculations to obtain binding and migration energies of Sb atoms, vacancies, and self-interstitial atoms in various configurations. The computed data informs the self-consistent mean-field formalism to calculate transport coefficients, allowing us to make quantitative predictions of the recombined fraction of point defects as a function of temperature and irradiation rate using homogeneous rate equations. We identify two different mechanisms according to which solutes lead to an increase in the recombined fraction of point defects; at low temperature, solutes slow down vacancies (kinetic effect), while at high temperature, solutes stabilize vacancies in the solid solution (thermodynamic effect). Extension to other metallic matrices and solutes are discussed.
Cloud archiving and data mining of High-Resolution Rapid Refresh forecast model output
NASA Astrophysics Data System (ADS)
Blaylock, Brian K.; Horel, John D.; Liston, Samuel T.
2017-12-01
Weather-related research often requires synthesizing vast amounts of data that need archival solutions that are both economical and viable during and past the lifetime of the project. Public cloud computing services (e.g., from Amazon, Microsoft, or Google) or private clouds managed by research institutions are providing object data storage systems potentially appropriate for long-term archives of such large geophysical data sets. We illustrate the use of a private cloud object store developed by the Center for High Performance Computing (CHPC) at the University of Utah. Since early 2015, we have been archiving thousands of two-dimensional gridded fields (each one containing over 1.9 million values over the contiguous United States) from the High-Resolution Rapid Refresh (HRRR) data assimilation and forecast modeling system. The archive is being used for retrospective analyses of meteorological conditions during high-impact weather events, assessing the accuracy of the HRRR forecasts, and providing initial and boundary conditions for research simulations. The archive is accessible interactively and through automated download procedures for researchers at other institutions that can be tailored by the user to extract individual two-dimensional grids from within the highly compressed files. Characteristics of the CHPC object storage system are summarized relative to network file system storage or tape storage solutions. The CHPC storage system is proving to be a scalable, reliable, extensible, affordable, and usable archive solution for our research.
ERIC Educational Resources Information Center
Kim, Hye Jeong; Pedersen, Susan
2010-01-01
Recently, the importance of ill-structured problem-solving in real-world contexts has become a focus of educational research. Particularly, the hypothesis-development process has been examined as one of the keys to developing a high-quality solution in a problem context. The authors of this study examined predictive relations between young…
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon
2012-01-01
We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Learning-based image preprocessing for robust computer-aided detection
NASA Astrophysics Data System (ADS)
Raghupathi, Laks; Devarakota, Pandu R.; Wolf, Matthias
2013-03-01
Recent studies have shown that low dose computed tomography (LDCT) can be an effective screening tool to reduce lung cancer mortality. Computer-aided detection (CAD) would be a beneficial second reader for radiologists in such cases. Studies demonstrate that while iterative reconstructions (IR) improve LDCT diagnostic quality, it however degrades CAD performance significantly (increased false positives) when applied directly. For improving CAD performance, solutions such as retraining with newer data or applying a standard preprocessing technique may not be suffice due to high prevalence of CT scanners and non-uniform acquisition protocols. Here, we present a learning-based framework that can adaptively transform a wide variety of input data to boost an existing CAD performance. This not only enhances their robustness but also their applicability in clinical workflows. Our solution consists of applying a suitable pre-processing filter automatically on the given image based on its characteristics. This requires the preparation of ground truth (GT) of choosing an appropriate filter resulting in improved CAD performance. Accordingly, we propose an efficient consolidation process with a novel metric. Using key anatomical landmarks, we then derive consistent feature descriptors for the classification scheme that then uses a priority mechanism to automatically choose an optimal preprocessing filter. We demonstrate CAD prototype∗ performance improvement using hospital-scale datasets acquired from North America, Europe and Asia. Though we demonstrated our results for a lung nodule CAD, this scheme is straightforward to extend to other post-processing tools dedicated to other organs and modalities.
Aerodynamic optimization by simultaneously updating flow variables and design parameters
NASA Technical Reports Server (NTRS)
Rizk, M. H.
1990-01-01
The application of conventional optimization schemes to aerodynamic design problems leads to inner-outer iterative procedures that are very costly. An alternative approach is presented based on the idea of updating the flow variable iterative solutions and the design parameter iterative solutions simultaneously. Two schemes based on this idea are applied to problems of correcting wind tunnel wall interference and optimizing advanced propeller designs. The first of these schemes is applicable to a limited class of two-design-parameter problems with an equality constraint. It requires the computation of a single flow solution. The second scheme is suitable for application to general aerodynamic problems. It requires the computation of several flow solutions in parallel. In both schemes, the design parameters are updated as the iterative flow solutions evolve. Computations are performed to test the schemes' efficiency, accuracy, and sensitivity to variations in the computational parameters.
NASA Technical Reports Server (NTRS)
Wang, R.; Demerdash, N. A.
1992-01-01
The combined magnetic vector potential - magnetic scalar potential method of computation of 3D magnetic fields by finite elements, introduced in a companion paper, in combination with state modeling in the abc-frame of reference, are used for global 3D magnetic field analysis and machine performance computation under rated load and overload condition in an example 14.3 kVA modified Lundell alternator. The results vividly demonstrate the 3D nature of the magnetic field in such machines, and show how this model can be used as an excellent tool for computation of flux density distributions, armature current and voltage waveform profiles and harmonic contents, as well as computation of torque profiles and ripples. Use of the model in gaining insight into locations of regions in the magnetic circuit with heavy degrees of saturation is demonstrated. Experimental results which correlate well with the simulations of the load case are given.
NASA Astrophysics Data System (ADS)
Crowell, Andrew Rippetoe
This dissertation describes model reduction techniques for the computation of aerodynamic heat flux and pressure loads for multi-disciplinary analysis of hypersonic vehicles. NASA and the Department of Defense have expressed renewed interest in the development of responsive, reusable hypersonic cruise vehicles capable of sustained high-speed flight and access to space. However, an extensive set of technical challenges have obstructed the development of such vehicles. These technical challenges are partially due to both the inability to accurately test scaled vehicles in wind tunnels and to the time intensive nature of high-fidelity computational modeling, particularly for the fluid using Computational Fluid Dynamics (CFD). The aim of this dissertation is to develop efficient and accurate models for the aerodynamic heat flux and pressure loads to replace the need for computationally expensive, high-fidelity CFD during coupled analysis. Furthermore, aerodynamic heating and pressure loads are systematically evaluated for a number of different operating conditions, including: simple two-dimensional flow over flat surfaces up to three-dimensional flows over deformed surfaces with shock-shock interaction and shock-boundary layer interaction. An additional focus of this dissertation is on the implementation and computation of results using the developed aerodynamic heating and pressure models in complex fluid-thermal-structural simulations. Model reduction is achieved using a two-pronged approach. One prong focuses on developing analytical corrections to isothermal, steady-state CFD flow solutions in order to capture flow effects associated with transient spatially-varying surface temperatures and surface pressures (e.g., surface deformation, surface vibration, shock impingements, etc.). The second prong is focused on minimizing the computational expense of computing the steady-state CFD solutions by developing an efficient surrogate CFD model. The developed two-pronged approach is found to exhibit balanced performance in terms of accuracy and computational expense, relative to several existing approaches. This approach enables CFD-based loads to be implemented into long duration fluid-thermal-structural simulations.
NASA Astrophysics Data System (ADS)
Traverso, A.; Lopez Torres, E.; Fantacci, M. E.; Cerello, P.
2017-05-01
Lung cancer is one of the most lethal types of cancer, because its early diagnosis is not good enough. In fact, the detection of pulmonary nodule, potential lung cancers, in Computed Tomography scans is a very challenging and time-consuming task for radiologists. To support radiologists, researchers have developed Computer-Aided Diagnosis (CAD) systems for the automated detection of pulmonary nodules in chest Computed Tomography scans. Despite the high level of technological developments and the proved benefits on the overall detection performance, the usage of Computer-Aided Diagnosis in clinical practice is far from being a common procedure. In this paper we investigate the causes underlying this discrepancy and present a solution to tackle it: the M5L WEB- and Cloud-based on-demand Computer-Aided Diagnosis. In addition, we prove how the combination of traditional imaging processing techniques with state-of-art advanced classification algorithms allows to build a system whose performance could be much larger than any Computer-Aided Diagnosis developed so far. This outcome opens the possibility to use the CAD as clinical decision support for radiologists.
A pertinent approach to solve nonlinear fuzzy integro-differential equations.
Narayanamoorthy, S; Sathiyapriya, S P
2016-01-01
Fuzzy integro-differential equations is one of the important parts of fuzzy analysis theory that holds theoretical as well as applicable values in analytical dynamics and so an appropriate computational algorithm to solve them is in essence. In this article, we use parametric forms of fuzzy numbers and suggest an applicable approach for solving nonlinear fuzzy integro-differential equations using homotopy perturbation method. A clear and detailed description of the proposed method is provided. Our main objective is to illustrate that the construction of appropriate convex homotopy in a proper way leads to highly accurate solutions with less computational work. The efficiency of the approximation technique is expressed via stability and convergence analysis so as to guarantee the efficiency and performance of the methodology. Numerical examples are demonstrated to verify the convergence and it reveals the validity of the presented numerical technique. Numerical results are tabulated and examined by comparing the obtained approximate solutions with the known exact solutions. Graphical representations of the exact and acquired approximate fuzzy solutions clarify the accuracy of the approach.
Adiabatic Quantum Computing via the Rydberg Blockade
NASA Astrophysics Data System (ADS)
Keating, Tyler; Goyal, Krittika; Deutsch, Ivan
2012-06-01
We study an architecture for implementing adiabatic quantum computation with trapped neutral atoms. Ground state atoms are dressed by laser fields in a manner conditional on the Rydberg blockade mechanism, thereby providing the requisite entangling interactions. As a benchmark we study the performance of a Quadratic Unconstrained Binary Optimization (QUBO) problem whose solution is found in the ground state spin configuration of an Ising-like model. We model a realistic architecture, including the effects of magnetic level structure, with qubits encoded into the clock states of ^133Cs, effective B-fields implemented through microwaves and light shifts, and atom-atom coupling achieved by excitation to a high-lying Rydberg level. Including the fundamental effects of photon scattering we find a high fidelity for the two-qubit implementation.
Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo
2014-01-01
The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need of performing large numbers of in silico analysis to study the behavior of biological systems in different conditions, which necessitate a computing power that usually overtakes the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performances up to a 181× speedup compared to the corresponding sequential simulations. PMID:25025072
The use of iohexol as oral contrast for computed tomography of the abdomen and pelvis.
Horton, Karen M; Fishman, Elliot K; Gayler, Bob
2008-01-01
Positive oral contrast agents (high-osmolar iodinated solutions [high-osmolar contrast medium] or barium sulfate suspensions) are used routinely for abdominal computed tomography. However, these agents are not ideal. Patients complain about the taste and, sometimes, refuse to drink the required quantity. Nausea, vomiting, and diarrhea are frequent. In certain clinical indications, either barium suspensions or high-osmolar contrast mediums may be contraindicated. This technical note describes the potential advantages of using low-osmolar iodinated solutions as an oral contrast agent for computed tomography.
Fan, Ming; Kuwahara, Hiroyuki; Wang, Xiaolei; Wang, Suojin; Gao, Xin
2015-11-01
Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to the level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.; Ovall, J.; Holst, M.
2014-12-01
We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model that includes significant seafloor bathymetry variations and a heterogeneous subsurface.
The Reduced Basis Method in Geosciences: Practical examples for numerical forward simulations
NASA Astrophysics Data System (ADS)
Degen, D.; Veroy, K.; Wellmann, F.
2017-12-01
Due to the highly heterogeneous character of the earth's subsurface, the complex coupling of thermal, hydrological, mechanical, and chemical processes, and the limited accessibility we have to face high-dimensional problems associated with high uncertainties in geosciences. Performing the obviously necessary uncertainty quantifications with a reasonable number of parameters is often not possible due to the high-dimensional character of the problem. Therefore, we are presenting the reduced basis (RB) method, being a model order reduction (MOR) technique, that constructs low-order approximations to, for instance, the finite element (FE) space. We use the RB method to address this computationally challenging simulations because this method significantly reduces the degrees of freedom. The RB method is decomposed into an offline and online stage, allowing to make the expensive pre-computations beforehand to get real-time results during field campaigns. Generally, the RB approach is most beneficial in the many-query and real-time context.We will illustrate the advantages of the RB method for the field of geosciences through two examples of numerical forward simulations.The first example is a geothermal conduction problem demonstrating the implementation of the RB method for a steady-state case. The second examples, a Darcy flow problem, shows the benefits for transient scenarios. In both cases, a quality evaluation of the approximations is given. Additionally, the runtimes for both the FE and the RB simulations are compared. We will emphasize the advantages of this method for repetitive simulations by showing the speed-up for the RB solution in contrast to the FE solution. Finally, we will demonstrate how the used implementation is usable in high-performance computing (HPC) infrastructures and evaluate its performance for such infrastructures. Hence, we will especially point out its scalability, yielding in an optimal usage on HPC infrastructures and normal working stations.
A Lightweight, High-performance I/O Management Package for Data-intensive Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jun Wang
2007-07-17
File storage systems are playing an increasingly important role in high-performance computing as the performance gap between CPU and disk increases. It could take a long time to develop an entire system from scratch. Solutions will have to be built as extensions to existing systems. If new portable, customized software components are plugged into these systems, better sustained high I/O performance and higher scalability will be achieved, and the development cycle of next-generation of parallel file systems will be shortened. The overall research objective of this ECPI development plan aims to develop a lightweight, customized, high-performance I/O management package namedmore » LightI/O to extend and leverage current parallel file systems used by DOE. During this period, We have developed a novel component in LightI/O and prototype them into PVFS2, and evaluate the resultant prototype—extended PVFS2 system on data-intensive applications. The preliminary results indicate the extended PVFS2 delivers better performance and reliability to users. A strong collaborative effort between the PI at the University of Nebraska Lincoln and the DOE collaborators—Drs Rob Ross and Rajeev Thakur at Argonne National Laboratory who are leading the PVFS2 group makes the project more promising.« less
Application of Open Source Technologies for Oceanographic Data Analysis
NASA Astrophysics Data System (ADS)
Huang, T.; Gangl, M.; Quach, N. T.; Wilson, B. D.; Chang, G.; Armstrong, E. M.; Chin, T. M.; Greguska, F.
2015-12-01
NEXUS is a data-intensive analysis solution developed with a new approach for handling science data that enables large-scale data analysis by leveraging open source technologies such as Apache Cassandra, Apache Spark, Apache Solr, and Webification. NEXUS has been selected to provide on-the-fly time-series and histogram generation for the Soil Moisture Active Passive (SMAP) mission for Level 2 and Level 3 Active, Passive, and Active Passive products. It also provides an on-the-fly data subsetting capability. NEXUS is designed to scale horizontally, enabling it to handle massive amounts of data in parallel. It takes a new approach on managing time and geo-referenced array data by dividing data artifacts into chunks and stores them in an industry-standard, horizontally scaled NoSQL database. This approach enables the development of scalable data analysis services that can infuse and leverage the elastic computing infrastructure of the Cloud. It is equipped with a high-performance geospatial and indexed data search solution, coupled with a high-performance data Webification solution free from file I/O bottlenecks, as well as a high-performance, in-memory data analysis engine. In this talk, we will focus on the recently funded AIST 2014 project by using NEXUS as the core for oceanographic anomaly detection service and web portal. We call it, OceanXtremes
Acharya, Sayantan; Nandi, Manoj K; Mandal, Arkajit; Sarkar, Sucharita; Bhattacharyya, Sarika Maitra
2015-08-27
We study the diffusion of small solute particles through solvent by keeping the solute-solvent interaction repulsive and varying the solvent properties. The study involves computer simulations, development of a new model to describe diffusion of small solutes in a solvent, and also mode coupling theory (MCT) calculations. In a viscous solvent, a small solute diffuses via coupling to the solvent hydrodynamic modes and also through the transient cages formed by the solvent. The model developed can estimate the independent contributions from these two different channels of diffusion. Although the solute diffusion in all the systems shows an amplification, the degree of it increases with solvent viscosity. The model correctly predicts that when the solvent viscosity is high, the solute primarily diffuses by exploiting the solvent cages. In such a scenario the MCT diffusion performed for a static solvent provides a correct estimation of the cage diffusion.
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997
Accelerating scientific computations with mixed precision algorithms
NASA Astrophysics Data System (ADS)
Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire
2009-12-01
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented. Program summaryProgram title: ITER-REF Catalogue identifier: AECO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 7211 No. of bytes in distributed program, including test data, etc.: 41 862 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: desktop, server Operating system: Unix/Linux RAM: 512 Mbytes Classification: 4.8 External routines: BLAS (optional) Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Manavski, Svetlin A; Valle, Giorgio
2008-01-01
Background Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphic cards, to develop high performance solutions for sequence alignment. Results In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. Conclusions The results show that graphic cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches. PMID:18387198
Determination of elastic stresses in gas-turbine disks
NASA Technical Reports Server (NTRS)
Manson, S S
1947-01-01
A method is presented for the calculation of elastic stresses in symmetrical disks typical of those of a high-temperature gas turbine. The method is essentially a finite-difference solution of the equilibrium and compatibility equations for elastic stresses in a symmetrical disk. Account can be taken of point-to-point variations in disk thickness, in temperature, in elastic modulus, in coefficient of thermal expansion, in material density, and in Poisson's ratio. No numerical integration or trial-and-error procedures are involved and the computations can be performed in rapid and routine fashion by nontechnical computers with little engineering supervision. Checks on problems for which exact mathematical solutions are known indicate that the method yields results of high accuracy. Illustrative examples are presented to show the manner of treating solid disks, disks with central holes, and disks constructed either of a single material or two or more welded materials. The effect of shrink fitting is taken into account by a very simple device.
Design Trade-off Between Performance and Fault-Tolerance of Space Onboard Computers
NASA Astrophysics Data System (ADS)
Gorbunov, M. S.; Antonov, A. A.
2017-01-01
It is well known that there is a trade-off between performance and power consumption in onboard computers. The fault-tolerance is another important factor affecting performance, chip area and power consumption. Involving special SRAM cells and error-correcting codes is often too expensive with relation to the performance needed. We discuss the possibility of finding the optimal solutions for modern onboard computer for scientific apparatus focusing on multi-level cache memory design.
High-Performance Computing and Visualization | Energy Systems Integration
Facility | NREL High-Performance Computing and Visualization High-Performance Computing and Visualization High-performance computing (HPC) and visualization at NREL propel technology innovation as a . Capabilities High-Performance Computing NREL is home to Peregrine-the largest high-performance computing system
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan R.
2015-05-01
New radar applications need to perform complex algorithms and process large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression for real-time transceiver optimization are presented, they are based on a System-on-Chip architecture for Xilinx devices. This study also evaluates the performance of dedicated coprocessor as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through the high performance AXI buses, to perform floating-point operations, control the processing blocks, and communicate with external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band tested together with a low-cost channel emulator for different types of waveforms.
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems
Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.
2012-01-01
As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955
Ultrascale Visualization of Climate Data
NASA Technical Reports Server (NTRS)
Williams, Dean N.; Bremer, Timo; Doutriaux, Charles; Patchett, John; Williams, Sean; Shipman, Galen; Miller, Ross; Pugmire, David R.; Smith, Brian; Steed, Chad;
2013-01-01
Fueled by exponential increases in the computational and storage capabilities of high-performance computing platforms, climate simulations are evolving toward higher numerical fidelity, complexity, volume, and dimensionality. These technological breakthroughs are coming at a time of exponential growth in climate data, with estimates of hundreds of exabytes by 2020. To meet the challenges and exploit the opportunities that such explosive growth affords, a consortium of four national laboratories, two universities, a government agency, and two private companies formed to explore the next wave in climate science. Working in close collaboration with domain experts, the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) project aims to provide high-level solutions to a variety of climate data analysis and visualization problems.
Contributions to HiLiftPW-3 Using Structured, Overset Grid Methods
NASA Technical Reports Server (NTRS)
Coder, James G.; Pulliam, Thomas H.; Jensen, James C.
2018-01-01
The High-Lift Common Research Model (HL-CRM) and the JAXA Standard Model (JSM) were analyzed computationally using both the OVERFLOW and LAVA codes for the third AIAA High-Lift Prediction Workshop. Geometry descriptions and the test cases simulated are described. With the HL-CRM, the effects of surface smoothness during grid projection and the effect of partially sealing a flap gap were studied. Grid refinement studies were performed at two angles of attack using both codes. For the JSM, simulations were performed with and without the nacelle/pylon. Without the nacelle/pylon, evidence of multiple solutions was observed when a quadratic constitutive relation is used in the turbulence modeling; however, using time-accurate simulation seemed to alleviate this issue. With the nacelle/pylon, no evidence of multiple solutions was observed. Laminar-turbulent transition modeling was applied to both JSM configuration, and had an overall favorable impact on the lift predictions.
NASA Technical Reports Server (NTRS)
Shishir, Pandya; Chaderjian, Neal; Ahmad, Jsaim; Kwak, Dochan (Technical Monitor)
2001-01-01
Flow simulations using the time-dependent Navier-Stokes equations remain a challenge for several reasons. Principal among them are the difficulty to accurately model complex flows, and the time needed to perform the computations. A parametric study of such complex problems is not considered practical due to the large cost associated with computing many time-dependent solutions. The computation time for each solution must be reduced in order to make a parametric study possible. With successful reduction of computation time, the issue of accuracy, and appropriateness of turbulence models will become more tractable.
NASA Astrophysics Data System (ADS)
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
Plume-Free Stream Interaction Heating Effects During Orion Crew Module Reentry
NASA Technical Reports Server (NTRS)
Marichalar, J.; Lumpkin, F.; Boyles, K.
2012-01-01
During reentry of the Orion Crew Module (CM), vehicle attitude control will be performed by firing reaction control system (RCS) thrusters. Simulation of RCS plumes and their interaction with the oncoming flow has been difficult for the analysis community due to the large scarf angles of the RCS thrusters and the unsteady nature of the Orion capsule backshell environments. The model for the aerothermal database has thus relied on wind tunnel test data to capture the heating effects of thruster plume interactions with the freestream. These data are only valid for the continuum flow regime of the reentry trajectory. A Direct Simulation Monte Carlo (DSMC) analysis was performed to study the vehicle heating effects that result from the RCS thruster plume interaction with the oncoming freestream flow at high altitudes during Orion CM reentry. The study was performed with the DSMC Analysis Code (DAC). The inflow boundary conditions for the jets were obtained from Data Parallel Line Relaxation (DPLR) computational fluid dynamics (CFD) solutions. Simulations were performed for the roll, yaw, pitch-up and pitch-down jets at altitudes of 105 km, 125 km and 160 km as well as vacuum conditions. For comparison purposes (see Figure 1), the freestream conditions were based on previous DAC simulations performed without active RCS to populate the aerodynamic database for the Orion CM. Other inputs to the analysis included a constant Orbital reentry velocity of 7.5 km/s and angle of attack of 160 degrees. The results of the study showed that the interaction effects decrease quickly with increasing altitude. Also, jets with highly scarfed nozzles cause more severe heating compared to the nozzles with lower scarf angles. The difficulty of performing these simulations was based on the maximum number density and the ratio of number densities between the freestream and the plume for each simulation. The lowest altitude solutions required a substantial amount of computational resources (up to 1800 processors) to simulate approximately 2 billion molecules for the refined (adapted) solutions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Unat, Didem; Dubey, Anshu; Hoefler, Torsten
The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity andmore » performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. Furthermore, this paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.« less
Barrier-breaking performance for industrial problems on the CRAY C916
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graffunder, S.K.
1993-12-31
Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for themore » mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.« less
NASA Technical Reports Server (NTRS)
Kwak, Dochan; Kiris, C.; Smith, Charles A. (Technical Monitor)
1998-01-01
Performance of the two commonly used numerical procedures, one based on artificial compressibility method and the other pressure projection method, are compared. These formulations are selected primarily because they are designed for three-dimensional applications. The computational procedures are compared by obtaining steady state solutions of a wake vortex and unsteady solutions of a curved duct flow. For steady computations, artificial compressibility was very efficient in terms of computing time and robustness. For an unsteady flow which requires small physical time step, pressure projection method was found to be computationally more efficient than an artificial compressibility method. This comparison is intended to give some basis for selecting a method or a flow solution code for large three-dimensional applications where computing resources become a critical issue.
Soft control of scanning probe microscope with high flexibility.
Liu, Zhenghui; Guo, Yuzheng; Zhang, Zhaohui; Zhu, Xing
2007-01-01
Most commercial scanning probe microscopes have multiple embedded digital microprocessors and utilize complex software for system control, which is not easily obtained or modified by researchers wishing to perform novel and special applications. In this paper, we present a simple and flexible control solution that just depends on software running on a single-processor personal computer with real-time Linux operating system to carry out all the control tasks including negative feedback, tip moving, data processing and user interface. In this way, we fully exploit the potential of a personal computer in calculating and programming, enabling us to manipulate the scanning probe as required without any special digital control circuits and related technical know-how. This solution has been successfully applied to a homemade ultrahigh vacuum scanning tunneling microscope and a multiprobe scanning tunneling microscope.
NASA Astrophysics Data System (ADS)
Tyson, Eric J.; Buckley, James; Franklin, Mark A.; Chamberlain, Roger D.
2008-10-01
The imaging atmospheric Cherenkov technique for high-energy gamma-ray astronomy is emerging as an important new technique for studying the high energy universe. Current experiments have data rates of ≈20TB/year and duty cycles of about 10%. In the future, more sensitive experiments may produce up to 1000 TB/year. The data analysis task for these experiments requires keeping up with this data rate in close to real-time. Such data analysis is a classic example of a streaming application with very high performance requirements. This class of application often benefits greatly from the use of non-traditional approaches for computation including using special purpose hardware (FPGAs and ASICs), or sophisticated parallel processing techniques. However, designing, debugging, and deploying to these architectures is difficult and thus they are not widely used by the astrophysics community. This paper presents the Auto-Pipe design toolset that has been developed to address many of the difficulties in taking advantage of complex streaming computer architectures for such applications. Auto-Pipe incorporates a high-level coordination language, functional and performance simulation tools, and the ability to deploy applications to sophisticated architectures. Using the Auto-Pipe toolset, we have implemented the front-end portion of an imaging Cherenkov data analysis application, suitable for real-time or offline analysis. The application operates on data from the VERITAS experiment, and shows how Auto-Pipe can greatly ease performance optimization and application deployment of a wide variety of platforms. We demonstrate a performance improvement over a traditional software approach of 32x using an FPGA solution and 3.6x using a multiprocessor based solution.
Rapid Disaster Damage Estimation
NASA Astrophysics Data System (ADS)
Vu, T. T.
2012-07-01
The experiences from recent disaster events showed that detailed information derived from high-resolution satellite images could accommodate the requirements from damage analysts and disaster management practitioners. Richer information contained in such high-resolution images, however, increases the complexity of image analysis. As a result, few image analysis solutions can be practically used under time pressure in the context of post-disaster and emergency responses. To fill the gap in employment of remote sensing in disaster response, this research develops a rapid high-resolution satellite mapping solution built upon a dual-scale contextual framework to support damage estimation after a catastrophe. The target objects are building (or building blocks) and their condition. On the coarse processing level, statistical region merging deployed to group pixels into a number of coarse clusters. Based on majority rule of vegetation index, water and shadow index, it is possible to eliminate the irrelevant clusters. The remaining clusters likely consist of building structures and others. On the fine processing level details, within each considering clusters, smaller objects are formed using morphological analysis. Numerous indicators including spectral, textural and shape indices are computed to be used in a rule-based object classification. Computation time of raster-based analysis highly depends on the image size or number of processed pixels in order words. Breaking into 2 level processing helps to reduce the processed number of pixels and the redundancy of processing irrelevant information. In addition, it allows a data- and tasks- based parallel implementation. The performance is demonstrated with QuickBird images captured a disaster-affected area of Phanga, Thailand by the 2004 Indian Ocean tsunami are used for demonstration of the performance. The developed solution will be implemented in different platforms as well as a web processing service for operational uses.
NASA Astrophysics Data System (ADS)
Petit, C.; Le Louarn, M.; Fusco, T.; Madec, P.-Y.
2011-09-01
Various tomographic control solutions have been proposed during the last decades to ensure efficient or even optimal closed-loop correction to tomographic Adaptive Optics (AO) concepts such as Laser Tomographic AO (LTAO), Multi-Conjugate AO (MCAO). The optimal solution, based on Linear Quadratic Gaussian (LQG) approach, as well as suboptimal but efficient solutions such as Pseudo-Open Loop Control (POLC) require multiple Matrix Vector Multiplications (MVM). Disregarding their respective performance, these efficient control solutions thus exhibit strong increase of on-line complexity and their implementation may become difficult in demanding cases. Among them, two cases are of particular interest. First, the system Real-Time Computer architecture and implementation is derived from past or present solutions and does not support multiple MVM. This is the case of the AO Facility which RTC architecture is derived from the SPARTA platform and inherits its simple MVM architecture, which does not fit with LTAO control solutions for instance. Second, considering future systems such as Extremely Large Telescopes, the number of degrees of freedom is twenty to one hundred times bigger than present systems. In these conditions, tomographic control solutions can hardly be used in their standard form and optimized implementation shall be considered. Single MVM tomographic control solutions represent a potential solution, and straightforward solutions such as Virtual Deformable Mirrors have been already proposed for LTAO but with tuning issues. We investigate in this paper the possibility to derive from tomographic control solutions, such as POLC or LQG, simplified control solutions ensuring simple MVM architecture and that could be thus implemented on nowadays systems or future complex systems. We theoretically derive various solutions and analyze their respective performance on various systems thanks to numerical simulation. We discuss the optimization of their performance and stability issues with respect to classic control solutions. We finally discuss off-line computation and implementation constraints.
NASA Technical Reports Server (NTRS)
Bown, R. L.; Christofferson, A.; Lardas, M.; Flanders, H.
1980-01-01
A lambda matrix solution technique is being developed to perform an open loop frequency analysis of a high order dynamic system. The procedure evaluates the right and left latent vectors corresponding to the respective latent roots. The latent vectors are used to evaluate the partial fraction expansion formulation required to compute the flexible body open loop feedback gains for the Space Shuttle Digital Ascent Flight Control System. The algorithm is in the final stages of development and will be used to insure that the feedback gains meet the design specification.
Optimization and large scale computation of an entropy-based moment closure
NASA Astrophysics Data System (ADS)
Kristopher Garrett, C.; Hauck, Cory; Hill, Judith
2015-12-01
We present computational advances and results in the implementation of an entropy-based moment closure, MN, in the context of linear kinetic equations, with an emphasis on heterogeneous and large-scale computing platforms. Entropy-based closures are known in several cases to yield more accurate results than closures based on standard spectral approximations, such as PN, but the computational cost is generally much higher and often prohibitive. Several optimizations are introduced to improve the performance of entropy-based algorithms over previous implementations. These optimizations include the use of GPU acceleration and the exploitation of the mathematical properties of spherical harmonics, which are used as test functions in the moment formulation. To test the emerging high-performance computing paradigm of communication bound simulations, we present timing results at the largest computational scales currently available. These results show, in particular, load balancing issues in scaling the MN algorithm that do not appear for the PN algorithm. We also observe that in weak scaling tests, the ratio in time to solution of MN to PN decreases.
Optimization and large scale computation of an entropy-based moment closure
Hauck, Cory D.; Hill, Judith C.; Garrett, C. Kristopher
2015-09-10
We present computational advances and results in the implementation of an entropy-based moment closure, M N, in the context of linear kinetic equations, with an emphasis on heterogeneous and large-scale computing platforms. Entropy-based closures are known in several cases to yield more accurate results than closures based on standard spectral approximations, such as P N, but the computational cost is generally much higher and often prohibitive. Several optimizations are introduced to improve the performance of entropy-based algorithms over previous implementations. These optimizations include the use of GPU acceleration and the exploitation of the mathematical properties of spherical harmonics, which aremore » used as test functions in the moment formulation. To test the emerging high-performance computing paradigm of communication bound simulations, we present timing results at the largest computational scales currently available. Lastly, these results show, in particular, load balancing issues in scaling the M N algorithm that do not appear for the P N algorithm. We also observe that in weak scaling tests, the ratio in time to solution of M N to P N decreases.« less
Steady and Unsteady Nozzle Simulations Using the Conservation Element and Solution Element Method
NASA Technical Reports Server (NTRS)
Friedlander, David Joshua; Wang, Xiao-Yen J.
2014-01-01
This paper presents results from computational fluid dynamic (CFD) simulations of a three-stream plug nozzle. Time-accurate, Euler, quasi-1D and 2D-axisymmetric simulations were performed as part of an effort to provide a CFD-based approach to modeling nozzle dynamics. The CFD code used for the simulations is based on the space-time Conservation Element and Solution Element (CESE) method. Steady-state results were validated using the Wind-US code and a code utilizing the MacCormack method while the unsteady results were partially validated via an aeroacoustic benchmark problem. The CESE steady-state flow field solutions showed excellent agreement with solutions derived from the other methods and codes while preliminary unsteady results for the three-stream plug nozzle are also shown. Additionally, a study was performed to explore the sensitivity of gross thrust computations to the control surface definition. The results showed that most of the sensitivity while computing the gross thrust is attributed to the control surface stencil resolution and choice of stencil end points and not to the control surface definition itself.Finally, comparisons between the quasi-1D and 2D-axisymetric solutions were performed in order to gain insight on whether a quasi-1D solution can capture the steady and unsteady nozzle phenomena without the cost of a 2D-axisymmetric simulation. Initial results show that while the quasi-1D solutions are similar to the 2D-axisymmetric solutions, the inability of the quasi-1D simulations to predict two dimensional phenomena limits its accuracy.
Arctic Boreal Vulnerability Experiment (ABoVE) Science Cloud
NASA Astrophysics Data System (ADS)
Duffy, D.; Schnase, J. L.; McInerney, M.; Webster, W. P.; Sinno, S.; Thompson, J. H.; Griffith, P. C.; Hoy, E.; Carroll, M.
2014-12-01
The effects of climate change are being revealed at alarming rates in the Arctic and Boreal regions of the planet. NASA's Terrestrial Ecology Program has launched a major field campaign to study these effects over the next 5 to 8 years. The Arctic Boreal Vulnerability Experiment (ABoVE) will challenge scientists to take measurements in the field, study remote observations, and even run models to better understand the impacts of a rapidly changing climate for areas of Alaska and western Canada. The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center (GSFC) has partnered with the Terrestrial Ecology Program to create a science cloud designed for this field campaign - the ABoVE Science Cloud. The cloud combines traditional high performance computing with emerging technologies to create an environment specifically designed for large-scale climate analytics. The ABoVE Science Cloud utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data intensive, science applications. At the center of the architecture is a large object storage environment, much like a traditional high-performance file system, that supports data proximal processing using technologies like MapReduce on a Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high performance compute resources with many processing cores and large memory coupled to the storage through an InfiniBand network. Virtual systems can be tailored to a specific scientist and provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems. In this talk, we will present the architectural components of the science cloud and examples of how it is being used to meet the needs of the ABoVE campaign. In our experience, the science cloud approach significantly lowers the barriers and risks to organizations that require high performance computing solutions and provides the NCCS with the agility required to meet our customers' rapidly increasing and evolving requirements.
Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus
2016-05-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
Multivariant function model generation
NASA Technical Reports Server (NTRS)
1974-01-01
The development of computer programs applicable to space vehicle guidance was conducted. The subjects discussed are as follows: (1) determination of optimum reentry trajectories, (2) development of equations for performance of trajectory computation, (3) vehicle control for fuel optimization, (4) development of equations for performance trajectory computations, (5) applications and solution of Hamilton-Jacobi equation, and (6) stresses in dome shaped shells with discontinuities at the apex.
A Computing Method for Sound Propagation Through a Nonuniform Jet Stream
NASA Technical Reports Server (NTRS)
Padula, S. L.; Liu, C. H.
1974-01-01
Understanding the principles of jet noise propagation is an essential ingredient of systematic noise reduction research. High speed computer methods offer a unique potential for dealing with complex real life physical systems whereas analytical solutions are restricted to sophisticated idealized models. The classical formulation of sound propagation through a jet flow was found to be inadequate for computer solutions and a more suitable approach was needed. Previous investigations selected the phase and amplitude of the acoustic pressure as dependent variables requiring the solution of a system of nonlinear algebraic equations. The nonlinearities complicated both the analysis and the computation. A reformulation of the convective wave equation in terms of a new set of dependent variables is developed with a special emphasis on its suitability for numerical solutions on fast computers. The technique is very attractive because the resulting equations are linear in nonwaving variables. The computer solution to such a linear system of algebraic equations may be obtained by well-defined and direct means which are conservative of computer time and storage space. Typical examples are illustrated and computational results are compared with available numerical and experimental data.
Flow visualization of CFD using graphics workstations
NASA Technical Reports Server (NTRS)
Lasinski, Thomas; Buning, Pieter; Choi, Diana; Rogers, Stuart; Bancroft, Gordon
1987-01-01
High performance graphics workstations are used to visualize the fluid flow dynamics obtained from supercomputer solutions of computational fluid dynamic programs. The visualizations can be done independently on the workstation or while the workstation is connected to the supercomputer in a distributed computing mode. In the distributed mode, the supercomputer interactively performs the computationally intensive graphics rendering tasks while the workstation performs the viewing tasks. A major advantage of the workstations is that the viewers can interactively change their viewing position while watching the dynamics of the flow fields. An overview of the computer hardware and software required to create these displays is presented. For complex scenes the workstation cannot create the displays fast enough for good motion analysis. For these cases, the animation sequences are recorded on video tape or 16 mm film a frame at a time and played back at the desired speed. The additional software and hardware required to create these video tapes or 16 mm movies are also described. Photographs illustrating current visualization techniques are discussed. Examples of the use of the workstations for flow visualization through animation are available on video tape.
NASA Astrophysics Data System (ADS)
Mei, Kai; Kopp, Felix K.; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Kirschke, Jan S.; Noël, Peter B.; Baum, Thomas
2017-03-01
The trabecular bone microstructure is a key to the early diagnosis and advanced therapy monitoring of osteoporosis. Regularly measuring bone microstructure with conventional multi-detector computer tomography (MDCT) would expose patients with a relatively high radiation dose. One possible solution to reduce exposure to patients is sampling fewer projection angles. This approach can be supported by advanced reconstruction algorithms, with their ability to achieve better image quality under reduced projection angles or high levels of noise. In this work, we investigated the performance of iterative reconstruction from sparse sampled projection data on trabecular bone microstructure in in-vivo MDCT scans of human spines. The computed MDCT images were evaluated by calculating bone microstructure parameters. We demonstrated that bone microstructure parameters were still computationally distinguishable when half or less of the radiation dose was employed.
A high-order spatial filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-04-01
A high-order spatial filter is developed for the spectral-element-method dynamical core on the cubed-sphere grid which employs the Gauss-Lobatto Lagrange interpolating polynomials (GLLIP) as orthogonal basis functions. The filter equation is the high-order Helmholtz equation which corresponds to the implicit time-differencing of a diffusion equation employing the high-order Laplacian. The Laplacian operator is discretized within a cell which is a building block of the cubed sphere grid and consists of the Gauss-Lobatto grid. When discretizing a high-order Laplacian, due to the requirement of C0 continuity along the cell boundaries the grid-points in neighboring cells should be used for the target cell: The number of neighboring cells is nearly quadratically proportional to the filter order. Discrete Helmholtz equation yields a huge-sized and highly sparse matrix equation whose size is N*N with N the number of total grid points on the globe. The number of nonzero entries is also almost in quadratic proportion to the filter order. Filtering is accomplished by solving the huge-matrix equation. While requiring a significant computing time, the solution of global matrix provides the filtered field free of discontinuity along the cell boundaries. To achieve the computational efficiency and the accuracy at the same time, the solution of the matrix equation was obtained by only accounting for the finite number of adjacent cells. This is called as a local-domain filter. It was shown that to remove the numerical noise near the grid-scale, inclusion of 5*5 cells for the local-domain filter was found sufficient, giving the same accuracy as that obtained by global domain solution while reducing the computing time to a considerably lower level. The high-order filter was evaluated using the standard test cases including the baroclinic instability of the zonal flow. Results indicated that the filter performs better on the removal of grid-scale numerical noises than the explicit high-order viscosity. It was also presented that the filter can be easily implemented on the distributed-memory parallel computers with a desirable scalability.
A highly parallel multigrid-like method for the solution of the Euler equations
NASA Technical Reports Server (NTRS)
Tuminaro, Ray S.
1989-01-01
We consider a highly parallel multigrid-like method for the solution of the two dimensional steady Euler equations. The new method, introduced as filtering multigrid, is similar to a standard multigrid scheme in that convergence on the finest grid is accelerated by iterations on coarser grids. In the filtering method, however, additional fine grid subproblems are processed concurrently with coarse grid computations to further accelerate convergence. These additional problems are obtained by splitting the residual into a smooth and an oscillatory component. The smooth component is then used to form a coarse grid problem (similar to standard multigrid) while the oscillatory component is used for a fine grid subproblem. The primary advantage in the filtering approach is that fewer iterations are required and that most of the additional work per iteration can be performed in parallel with the standard coarse grid computations. We generalize the filtering algorithm to a version suitable for nonlinear problems. We emphasize that this generalization is conceptually straight-forward and relatively easy to implement. In particular, no explicit linearization (e.g., formation of Jacobians) needs to be performed (similar to the FAS multigrid approach). We illustrate the nonlinear version by applying it to the Euler equations, and presenting numerical results. Finally, a performance evaluation is made based on execution time models and convergence information obtained from numerical experiments.
High-performance parallel analysis of coupled problems for aircraft propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.
1994-01-01
This research program deals with the application of high-performance computing methods for the analysis of complete jet engines. We have entitled this program by applying the two dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition, and solution capabilities were successfully tested. We then focused attention on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion that results from these structural displacements. This is treated by a new arbitrary Lagrangian-Eulerian (ALE) technique that models the fluid mesh motion as that of a fictitious mass-spring network. New partitioned analysis procedures to treat this coupled three-component problem are developed. These procedures involved delayed corrections and subcycling. Preliminary results on the stability, accuracy, and MPP computational efficiency are reported.
Evolutionary fuzzy modeling human diagnostic decisions.
Peña-Reyes, Carlos Andrés
2004-05-01
Fuzzy CoCo is a methodology, combining fuzzy logic and evolutionary computation, for constructing systems able to accurately predict the outcome of a human decision-making process, while providing an understandable explanation of the underlying reasoning. Fuzzy logic provides a formal framework for constructing systems exhibiting both good numeric performance (accuracy) and linguistic representation (interpretability). However, fuzzy modeling--meaning the construction of fuzzy systems--is an arduous task, demanding the identification of many parameters. To solve it, we use evolutionary computation techniques (specifically cooperative coevolution), which are widely used to search for adequate solutions in complex spaces. We have successfully applied the algorithm to model the decision processes involved in two breast cancer diagnostic problems, the WBCD problem and the Catalonia mammography interpretation problem, obtaining systems both of high performance and high interpretability. For the Catalonia problem, an evolved system was embedded within a Web-based tool-called COBRA-for aiding radiologists in mammography interpretation.
Porsa, Sina; Lin, Yi-Chung; Pandy, Marcus G
2016-08-01
The aim of this study was to compare the computational performances of two direct methods for solving large-scale, nonlinear, optimal control problems in human movement. Direct shooting and direct collocation were implemented on an 8-segment, 48-muscle model of the body (24 muscles on each side) to compute the optimal control solution for maximum-height jumping. Both algorithms were executed on a freely-available musculoskeletal modeling platform called OpenSim. Direct collocation converged to essentially the same optimal solution up to 249 times faster than direct shooting when the same initial guess was assumed (3.4 h of CPU time for direct collocation vs. 35.3 days for direct shooting). The model predictions were in good agreement with the time histories of joint angles, ground reaction forces and muscle activation patterns measured for subjects jumping to their maximum achievable heights. Both methods converged to essentially the same solution when started from the same initial guess, but computation time was sensitive to the initial guess assumed. Direct collocation demonstrates exceptional computational performance and is well suited to performing predictive simulations of movement using large-scale musculoskeletal models.
Application of NASA General-Purpose Solver to Large-Scale Computations in Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Storaasli, Olaf O.
2004-01-01
Of several iterative and direct equation solvers evaluated previously for computations in aeroacoustics, the most promising was the NASA-developed General-Purpose Solver (winner of NASA's 1999 software of the year award). This paper presents detailed, single-processor statistics of the performance of this solver, which has been tailored and optimized for large-scale aeroacoustic computations. The statistics, compiled using an SGI ORIGIN 2000 computer with 12 Gb available memory (RAM) and eight available processors, are the central processing unit time, RAM requirements, and solution error. The equation solver is capable of solving 10 thousand complex unknowns in as little as 0.01 sec using 0.02 Gb RAM, and 8.4 million complex unknowns in slightly less than 3 hours using all 12 Gb. This latter solution is the largest aeroacoustics problem solved to date with this technique. The study was unable to detect any noticeable error in the solution, since noise levels predicted from these solution vectors are in excellent agreement with the noise levels computed from the exact solution. The equation solver provides a means for obtaining numerical solutions to aeroacoustics problems in three dimensions.
An efficient implementation of a high-order filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-03-01
A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O (Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: The filter equation is applied to multiple cells, and then the central cell was only used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
NASA Astrophysics Data System (ADS)
Chembuly, V. V. M. J. Satish; Voruganti, Hari Kumar
2018-04-01
Hyper redundant manipulators have a large number of degrees of freedom (DOF) than the required to perform a given task. Additional DOF of manipulators provide the flexibility to work in highly cluttered environment and in constrained workspaces. Inverse kinematics (IK) of hyper-redundant manipulators is complicated due to large number of DOF and these manipulators have multiple IK solutions. The redundancy gives a choice of selecting best solution out of multiple solutions based on certain criteria such as obstacle avoidance, singularity avoidance, joint limit avoidance and joint torque minimization. This paper focuses on IK solution and redundancy resolution of hyper-redundant manipulator using classical optimization approach. Joint positions are computed by optimizing various criteria for a serial hyper redundant manipulators while traversing different paths in the workspace. Several cases are addressed using this scheme to obtain the inverse kinematic solution while optimizing the criteria like obstacle avoidance, joint limit avoidance.
NASA Astrophysics Data System (ADS)
Jerez-Hanckes, Carlos; Pérez-Arancibia, Carlos; Turc, Catalin
2017-12-01
We present Nyström discretizations of multitrace/singletrace formulations and non-overlapping Domain Decomposition Methods (DDM) for the solution of Helmholtz transmission problems for bounded composite scatterers with piecewise constant material properties. We investigate the performance of DDM with both classical Robin and optimized transmission boundary conditions. The optimized transmission boundary conditions incorporate square root Fourier multiplier approximations of Dirichlet to Neumann operators. While the multitrace/singletrace formulations as well as the DDM that use classical Robin transmission conditions are not particularly well suited for Krylov subspace iterative solutions of high-contrast high-frequency Helmholtz transmission problems, we provide ample numerical evidence that DDM with optimized transmission conditions constitute efficient computational alternatives for these type of applications. In the case of large numbers of subdomains with different material properties, we show that the associated DDM linear system can be efficiently solved via hierarchical Schur complements elimination.
A non-local computational boundary condition for duct acoustics
NASA Technical Reports Server (NTRS)
Zorumski, William E.; Watson, Willie R.; Hodge, Steve L.
1994-01-01
A non-local boundary condition is formulated for acoustic waves in ducts without flow. The ducts are two dimensional with constant area, but with variable impedance wall lining. Extension of the formulation to three dimensional and variable area ducts is straightforward in principle, but requires significantly more computation. The boundary condition simulates a nonreflecting wave field in an infinite duct. It is implemented by a constant matrix operator which is applied at the boundary of the computational domain. An efficient computational solution scheme is developed which allows calculations for high frequencies and long duct lengths. This computational solution utilizes the boundary condition to limit the computational space while preserving the radiation boundary condition. The boundary condition is tested for several sources. It is demonstrated that the boundary condition can be applied close to the sound sources, rendering the computational domain small. Computational solutions with the new non-local boundary condition are shown to be consistent with the known solutions for nonreflecting wavefields in an infinite uniform duct.
GREEN SUPERCOMPUTING IN A DESKTOP BOX
DOE Office of Scientific and Technical Information (OSTI.GOV)
HSU, CHUNG-HSING; FENG, WU-CHUN; CHING, AVERY
2007-01-17
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicatedmore » computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, they present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than their reference SMP platform.« less
Intelligent fuzzy approach for fast fractal image compression
NASA Astrophysics Data System (ADS)
Nodehi, Ali; Sulong, Ghazali; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah; Rehman, Amjad; Saba, Tanzila
2014-12-01
Fractal image compression (FIC) is recognized as a NP-hard problem, and it suffers from a high number of mean square error (MSE) computations. In this paper, a two-phase algorithm was proposed to reduce the MSE computation of FIC. In the first phase, based on edge property, range and domains are arranged. In the second one, imperialist competitive algorithm (ICA) is used according to the classified blocks. For maintaining the quality of the retrieved image and accelerating algorithm operation, we divided the solutions into two groups: developed countries and undeveloped countries. Simulations were carried out to evaluate the performance of the developed approach. Promising results thus achieved exhibit performance better than genetic algorithm (GA)-based and Full-search algorithms in terms of decreasing the number of MSE computations. The number of MSE computations was reduced by the proposed algorithm for 463 times faster compared to the Full-search algorithm, although the retrieved image quality did not have a considerable change.
High performance hybrid functional Petri net simulations of biological pathway models on CUDA.
Chalkidis, Georgios; Nagasaki, Masao; Miyano, Satoru
2011-01-01
Hybrid functional Petri nets are a wide-spread tool for representing and simulating biological models. Due to their potential of providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times, thus raising the demand for performance accelerations by efficient and inexpensive parallel computation solutions. Recent developments in the field of general-purpose computation on graphics processing units (GPGPU) enabled the scientific community to port a variety of compute intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto compute unified device architecture (CUDA) enabled GPUs. GPU accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating the cell boundary formation by Delta-Notch signaling on a CUDA enabled GPU results in a speedup of approximately 7x for a model containing 1,600 cells.
Affordable and accurate large-scale hybrid-functional calculations on GPU-accelerated supercomputers
NASA Astrophysics Data System (ADS)
Ratcliff, Laura E.; Degomme, A.; Flores-Livas, José A.; Goedecker, Stefan; Genovese, Luigi
2018-03-01
Performing high accuracy hybrid functional calculations for condensed matter systems containing a large number of atoms is at present computationally very demanding or even out of reach if high quality basis sets are used. We present a highly optimized multiple graphics processing unit implementation of the exact exchange operator which allows one to perform fast hybrid functional density-functional theory (DFT) calculations with systematic basis sets without additional approximations for up to a thousand atoms. With this method hybrid DFT calculations of high quality become accessible on state-of-the-art supercomputers within a time-to-solution that is of the same order of magnitude as traditional semilocal-GGA functionals. The method is implemented in a portable open-source library.
Green's function methods in heavy ion shielding
NASA Technical Reports Server (NTRS)
Wilson, John W.; Costen, Robert C.; Shinn, Judy L.; Badavi, Francis F.
1993-01-01
An analytic solution to the heavy ion transport in terms of Green's function is used to generate a highly efficient computer code for space applications. The efficiency of the computer code is accomplished by a nonperturbative technique extending Green's function over the solution domain. The computer code can also be applied to accelerator boundary conditions to allow code validation in laboratory experiments.
Applied Computational Fluid Dynamics at NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Holst, Terry L.; Kwak, Dochan (Technical Monitor)
1994-01-01
The field of Computational Fluid Dynamics (CFD) has advanced to the point where it can now be used for many applications in fluid mechanics research and aerospace vehicle design. A few applications being explored at NASA Ames Research Center will be presented and discussed. The examples presented will range in speed from hypersonic to low speed incompressible flow applications. Most of the results will be from numerical solutions of the Navier-Stokes or Euler equations in three space dimensions for general geometry applications. Computational results will be used to highlight the presentation as appropriate. Advances in computational facilities including those associated with NASA's CAS (Computational Aerosciences) Project of the Federal HPCC (High Performance Computing and Communications) Program will be discussed. Finally, opportunities for future research will be presented and discussed. All material will be taken from non-sensitive, previously-published and widely-disseminated work.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
Squid - a simple bioinformatics grid.
Carvalho, Paulo C; Glória, Rafael V; de Miranda, Antonio B; Degrave, Wim M
2005-08-03
BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" instalation containing a pre-configured example.
NASA Astrophysics Data System (ADS)
Marzari, Nicola
The last 30 years have seen the steady and exhilarating development of powerful quantum-simulation engines for extended systems, dedicated to the solution of the Kohn-Sham equations of density-functional theory, often augmented by density-functional perturbation theory, many-body perturbation theory, time-dependent density-functional theory, dynamical mean-field theory, and quantum Monte Carlo. Their implementation on massively parallel architectures, now leveraging also GPUs and accelerators, has started a massive effort in the prediction from first principles of many or of complex materials properties, leading the way to the exascale through the combination of HPC (high-performance computing) and HTC (high-throughput computing). Challenges and opportunities abound: complementing hardware and software investments and design; developing the materials' informatics infrastructure needed to encode knowledge into complex protocols and workflows of calculations; managing and curating data; resisting the complacency that we have already reached the predictive accuracy needed for materials design, or a robust level of verification of the different quantum engines. In this talk I will provide an overview of these challenges, with the ultimate prize being the computational understanding, prediction, and design of properties and performance for novel or complex materials and devices.
PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Frederick, J. M.
2016-12-01
In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification ensures whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems which cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively-parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), which describes four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information. Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
Zhang, Hanming; Wang, Linyuan; Yan, Bin; Li, Lei; Cai, Ailong; Hu, Guoen
2016-01-01
Total generalized variation (TGV)-based computed tomography (CT) image reconstruction, which utilizes high-order image derivatives, is superior to total variation-based methods in terms of the preservation of edge information and the suppression of unfavorable staircase effects. However, conventional TGV regularization employs l1-based form, which is not the most direct method for maximizing sparsity prior. In this study, we propose a total generalized p-variation (TGpV) regularization model to improve the sparsity exploitation of TGV and offer efficient solutions to few-view CT image reconstruction problems. To solve the nonconvex optimization problem of the TGpV minimization model, we then present an efficient iterative algorithm based on the alternating minimization of augmented Lagrangian function. All of the resulting subproblems decoupled by variable splitting admit explicit solutions by applying alternating minimization method and generalized p-shrinkage mapping. In addition, approximate solutions that can be easily performed and quickly calculated through fast Fourier transform are derived using the proximal point method to reduce the cost of inner subproblems. The accuracy and efficiency of the simulated and real data are qualitatively and quantitatively evaluated to validate the efficiency and feasibility of the proposed method. Overall, the proposed method exhibits reasonable performance and outperforms the original TGV-based method when applied to few-view problems.
Evaluation of a Multicore-Optimized Implementation for Tomographic Reconstruction
Agulleiro, Jose-Ignacio; Fernández, José Jesús
2012-01-01
Tomography allows elucidation of the three-dimensional structure of an object from a set of projection images. In life sciences, electron microscope tomography is providing invaluable information about the cell structure at a resolution of a few nanometres. Here, large images are required to combine wide fields of view with high resolution requirements. The computational complexity of the algorithms along with the large image size then turns tomographic reconstruction into a computationally demanding problem. Traditionally, high-performance computing techniques have been applied to cope with such demands on supercomputers, distributed systems and computer clusters. In the last few years, the trend has turned towards graphics processing units (GPUs). Here we present a detailed description and a thorough evaluation of an alternative approach that relies on exploitation of the power available in modern multicore computers. The combination of single-core code optimization, vector processing, multithreading and efficient disk I/O operations succeeds in providing fast tomographic reconstructions on standard computers. The approach turns out to be competitive with the fastest GPU-based solutions thus far. PMID:23139768
Zheng, Xiujuan; Wei, Wentao; Huang, Qiu; Song, Shaoli; Wan, Jieqing; Huang, Gang
2017-01-01
The objective and quantitative analysis of longitudinal single photon emission computed tomography (SPECT) images are significant for the treatment monitoring of brain disorders. Therefore, a computer aided analysis (CAA) method is introduced to extract a change-rate map (CRM) as a parametric image for quantifying the changes of regional cerebral blood flow (rCBF) in longitudinal SPECT brain images. The performances of the CAA-CRM approach in treatment monitoring are evaluated by the computer simulations and clinical applications. The results of computer simulations show that the derived CRMs have high similarities with their ground truths when the lesion size is larger than system spatial resolution and the change rate is higher than 20%. In clinical applications, the CAA-CRM approach is used to assess the treatment of 50 patients with brain ischemia. The results demonstrate that CAA-CRM approach has a 93.4% accuracy of recovered region's localization. Moreover, the quantitative indexes of recovered regions derived from CRM are all significantly different among the groups and highly correlated with the experienced clinical diagnosis. In conclusion, the proposed CAA-CRM approach provides a convenient solution to generate a parametric image and derive the quantitative indexes from the longitudinal SPECT brain images for treatment monitoring.
NASA Astrophysics Data System (ADS)
Rosi, Giuseppe; Scala, Ilaria; Nguyen, Vu-Hieu; Naili, Salah
2017-06-01
This article is about ultrasonic wave propagation in microstructured porous media. The classic Biot's model is enriched using a strain gradient approach to be able to capture high-order effects when the wavelength approaches the characteristic size of the microstructure. In order to reproduce actual transmission/reflection experiments performed on poroelastic samples, and to validate the choice of the model, the computation of the time domain response is necessary, as it allows for a direct comparison with experimental results. For obtaining the time response, we use two strategies: on the one hand we compute the closed form solution by using the Laplace and Fourier transforms techniques; on the other hand we used a finite element method. The results are presented for a transmission/reflection test performed on a poroelastic sample immersed in water. The effects introduced by the strain gradient terms are visible in the time response and in agreement with experimental observations. The results can be exploited in characterization of mechanical properties of poroelastic media by enhancing the reliability of quantitative ultrasound techniques.
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
Parallel computing in enterprise modeling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priorimore » ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.« less
NASA Technical Reports Server (NTRS)
Palmer, Grant; Venkatapathy, Ethiraj
1993-01-01
Three solution algorithms, explicit underrelaxation, point implicit, and lower upper symmetric Gauss-Seidel (LUSGS), are used to compute nonequilibrium flow around the Apollo 4 return capsule at 62 km altitude. By varying the Mach number, the efficiency and robustness of the solution algorithms were tested for different levels of chemical stiffness. The performance of the solution algorithms degraded as the Mach number and stiffness of the flow increased. At Mach 15, 23, and 30, the LUSGS method produces an eight order of magnitude drop in the L2 norm of the energy residual in 1/3 to 1/2 the Cray C-90 computer time as compared to the point implicit and explicit under-relaxation methods. The explicit under-relaxation algorithm experienced convergence difficulties at Mach 23 and above. At Mach 40 the performance of the LUSGS algorithm deteriorates to the point it is out-performed by the point implicit method. The effects of the viscous terms are investigated. Grid dependency questions are explored.
Analytics of crystal growth in space
NASA Technical Reports Server (NTRS)
Wilcox, W. R.; Chang, C. E.; Shlichta, P. J.; Chen, P. S.; Kim, C. K.
1974-01-01
Two crystal growth processes considered for spacelab experiments were studied to anticipate and understand phenomena not ordinarily encountered on earth. Computer calculations were performed on transport processes in floating zone melting and on growth of a crystal from solution in a spacecraft environment. Experiments intended to simulate solution growth at micro accelerations were performed.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Petrick, David J.; Day, John H. (Technical Monitor)
2001-01-01
Spacecraft telemetry rates have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image processing application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms and re-configurable computing hardware technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processing (DSP). It has been shown in [1] and [2] that this configuration can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft. However, since this technology is still maturing, intensive pre-hardware steps are necessary to achieve the benefits of hardware implementation. This paper describes these steps for the GOES-8 application, a software project developed using Interactive Data Language (IDL) (Trademark of Research Systems, Inc.) on a Workstation/UNIX platform. The solution involves converting the application to a PC/Windows/RC platform, selected mainly by the availability of low cost, adaptable high-speed RC hardware. In order for the hybrid system to run, the IDL software was modified to account for platform differences. It was interesting to examine the gains and losses in performance on the new platform, as well as unexpected observations before implementing hardware. After substantial pre-hardware optimization steps, the necessity of hardware implementation for bottleneck code in the PC environment became evident and solvable beginning with the methodology described in [1], [2], and implementing a novel methodology for this specific application [6]. The PC-RC interface bandwidth problem for the class of applications with moderate input-output data rates but large intermediate multi-thread data streams has been addressed and mitigated. This opens a new class of satellite image processing applications for bottleneck problems solution using RC technologies. The issue of a science algorithm level of abstraction necessary for RC hardware implementation is also described. Selected Matlab functions already implemented in hardware were investigated for their direct applicability to the GOES-8 application with the intent to create a library of Matlab and IDL RC functions for ongoing work. A complete class of spacecraft image processing applications using embedded re-configurable computing technology to meet real-time requirements, including performance results and comparison with the existing system, is described in this paper.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, Ajit; Khalil, Mohammad; Pettit, Chris
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
NASA Astrophysics Data System (ADS)
Fecher, T.; Pail, R.; Gruber, T.
2017-05-01
GOCO05c is a gravity field model computed as a combined solution of a satellite-only model and a global data set of gravity anomalies. It is resolved up to degree and order 720. It is the first model applying regionally varying weighting. Since this causes strong correlations among all gravity field parameters, the resulting full normal equation system with a size of 2 TB had to be solved rigorously by applying high-performance computing. GOCO05c is the first combined gravity field model independent of EGM2008 that contains GOCE data of the whole mission period. The performance of GOCO05c is externally validated by GNSS-levelling comparisons, orbit tests, and computation of the mean dynamic topography, achieving at least the quality of existing high-resolution models. Results show that the additional GOCE information is highly beneficial in insufficiently observed areas, and that due to the weighting scheme of individual data the spectral and spatial consistency of the model is significantly improved. Due to usage of fill-in data in specific regions, the model cannot be used for physical interpretations in these regions.
NASA Astrophysics Data System (ADS)
Ramasahayam, Veda Krishna Vyas; Diwakar, Anant; Bodi, Kowsik
2017-11-01
To study the flow of high temperature air in vibrational and chemical equilibrium, accurate models for thermodynamic state and transport phenomena are required. In the present work, the performance of a state equation model and two mixing rules for determining equilibrium air thermodynamic and transport properties are compared with that of curve fits. The thermodynamic state model considers 11 species which computes flow chemistry by an iterative process and the mixing rules considered for viscosity are Wilke and Armaly-Sutton. The curve fits of Srinivasan, which are based on Grabau type transition functions, are chosen for comparison. A two-dimensional Navier-Stokes solver is developed to simulate high enthalpy flows with numerical fluxes computed by AUSM+-up. The accuracy of state equation model and curve fits for thermodynamic properties is determined using hypersonic inviscid flow over a circular cylinder. The performance of mixing rules and curve fits for viscosity are compared using hypersonic laminar boundary layer prediction on a flat plate. It is observed that steady state solutions from state equation model and curve fits match with each other. Though curve fits are significantly faster the state equation model is more general and can be adapted to any flow composition.
Aeroelastic optimization methodology for viscous and turbulent flows
NASA Astrophysics Data System (ADS)
Barcelos Junior, Manuel Nascimento Dias
2007-12-01
In recent years, the development of faster computers and parallel processing allowed the application of high-fidelity analysis methods to the aeroelastic design of aircraft. However, these methods are restricted to the final design verification, mainly due to the computational cost involved in iterative design processes. Therefore, this work is concerned with the creation of a robust and efficient aeroelastic optimization methodology for inviscid, viscous and turbulent flows by using high-fidelity analysis and sensitivity analysis techniques. Most of the research in aeroelastic optimization, for practical reasons, treat the aeroelastic system as a quasi-static inviscid problem. In this work, as a first step toward the creation of a more complete aeroelastic optimization methodology for realistic problems, an analytical sensitivity computation technique was developed and tested for quasi-static aeroelastic viscous and turbulent flow configurations. Viscous and turbulent effects are included by using an averaged discretization of the Navier-Stokes equations, coupled with an eddy viscosity turbulence model. For quasi-static aeroelastic problems, the traditional staggered solution strategy has unsatisfactory performance when applied to cases where there is a strong fluid-structure coupling. Consequently, this work also proposes a solution methodology for aeroelastic and sensitivity analyses of quasi-static problems, which is based on the fixed point of an iterative nonlinear block Gauss-Seidel scheme. The methodology can also be interpreted as the solution of the Schur complement of the aeroelastic and sensitivity analyses linearized systems of equations. The methodologies developed in this work are tested and verified by using realistic aeroelastic systems.
Acuña, Daniel E; Parada, Víctor
2010-07-29
Humans need to solve computationally intractable problems such as visual search, categorization, and simultaneous learning and acting, yet an increasing body of evidence suggests that their solutions to instantiations of these problems are near optimal. Computational complexity advances an explanation to this apparent paradox: (1) only a small portion of instances of such problems are actually hard, and (2) successful heuristics exploit structural properties of the typical instance to selectively improve parts that are likely to be sub-optimal. We hypothesize that these two ideas largely account for the good performance of humans on computationally hard problems. We tested part of this hypothesis by studying the solutions of 28 participants to 28 instances of the Euclidean Traveling Salesman Problem (TSP). Participants were provided feedback on the cost of their solutions and were allowed unlimited solution attempts (trials). We found a significant improvement between the first and last trials and that solutions are significantly different from random tours that follow the convex hull and do not have self-crossings. More importantly, we found that participants modified their current better solutions in such a way that edges belonging to the optimal solution ("good" edges) were significantly more likely to stay than other edges ("bad" edges), a hallmark of structural exploitation. We found, however, that more trials harmed the participants' ability to tell good from bad edges, suggesting that after too many trials the participants "ran out of ideas." In sum, we provide the first demonstration of significant performance improvement on the TSP under repetition and feedback and evidence that human problem-solving may exploit the structure of hard problems paralleling behavior of state-of-the-art heuristics.
Acuña, Daniel E.; Parada, Víctor
2010-01-01
Humans need to solve computationally intractable problems such as visual search, categorization, and simultaneous learning and acting, yet an increasing body of evidence suggests that their solutions to instantiations of these problems are near optimal. Computational complexity advances an explanation to this apparent paradox: (1) only a small portion of instances of such problems are actually hard, and (2) successful heuristics exploit structural properties of the typical instance to selectively improve parts that are likely to be sub-optimal. We hypothesize that these two ideas largely account for the good performance of humans on computationally hard problems. We tested part of this hypothesis by studying the solutions of 28 participants to 28 instances of the Euclidean Traveling Salesman Problem (TSP). Participants were provided feedback on the cost of their solutions and were allowed unlimited solution attempts (trials). We found a significant improvement between the first and last trials and that solutions are significantly different from random tours that follow the convex hull and do not have self-crossings. More importantly, we found that participants modified their current better solutions in such a way that edges belonging to the optimal solution (“good” edges) were significantly more likely to stay than other edges (“bad” edges), a hallmark of structural exploitation. We found, however, that more trials harmed the participants' ability to tell good from bad edges, suggesting that after too many trials the participants “ran out of ideas.” In sum, we provide the first demonstration of significant performance improvement on the TSP under repetition and feedback and evidence that human problem-solving may exploit the structure of hard problems paralleling behavior of state-of-the-art heuristics. PMID:20686597
High-Performance Computing Data Center | Energy Systems Integration
Facility | NREL High-Performance Computing Data Center High-Performance Computing Data Center The Energy Systems Integration Facility's High-Performance Computing Data Center is home to Peregrine -the largest high-performance computing system in the world exclusively dedicated to advancing
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalashnikova, Irina
2012-05-01
A numerical study aimed to evaluate different preconditioners within the Trilinos Ifpack and ML packages for the Quantum Computer Aided Design (QCAD) non-linear Poisson problem implemented within the Albany code base and posed on the Ottawa Flat 270 design geometry is performed. This study led to some new development of Albany that allows the user to select an ML preconditioner with Zoltan repartitioning based on nodal coordinates, which is summarized. Convergence of the numerical solutions computed within the QCAD computational suite with successive mesh refinement is examined in two metrics, the mean value of the solution (an L{sup 1} norm)more » and the field integral of the solution (L{sup 2} norm).« less
NASA Astrophysics Data System (ADS)
Vasilkin, Andrey
2018-03-01
The more designing solutions at the search stage for design for high-rise buildings can be synthesized by the engineer, the more likely that the final adopted version will be the most efficient and economical. However, in modern market conditions, taking into account the complexity and responsibility of high-rise buildings the designer does not have the necessary time to develop, analyze and compare any significant number of options. To solve this problem, it is expedient to use the high potential of computer-aided designing. To implement automated search for design solutions, it is proposed to develop the computing facilities, the application of which will significantly increase the productivity of the designer and reduce the complexity of designing. Methods of structural and parametric optimization have been adopted as the basis of the computing facilities. Their efficiency in the synthesis of design solutions is shown, also the schemes, that illustrate and explain the introduction of structural optimization in the traditional design of steel frames, are constructed. To solve the problem of synthesis and comparison of design solutions for steel frames, it is proposed to develop the computing facilities that significantly reduces the complexity of search designing and based on the use of methods of structural and parametric optimization.
NASA Astrophysics Data System (ADS)
Buddala, Raviteja; Mahapatra, Siba Sankar
2017-11-01
Flexible flow shop (or a hybrid flow shop) scheduling problem is an extension of classical flow shop scheduling problem. In a simple flow shop configuration, a job having `g' operations is performed on `g' operation centres (stages) with each stage having only one machine. If any stage contains more than one machine for providing alternate processing facility, then the problem becomes a flexible flow shop problem (FFSP). FFSP which contains all the complexities involved in a simple flow shop and parallel machine scheduling problems is a well-known NP-hard (Non-deterministic polynomial time) problem. Owing to high computational complexity involved in solving these problems, it is not always possible to obtain an optimal solution in a reasonable computation time. To obtain near-optimal solutions in a reasonable computation time, a large variety of meta-heuristics have been proposed in the past. However, tuning algorithm-specific parameters for solving FFSP is rather tricky and time consuming. To address this limitation, teaching-learning-based optimization (TLBO) and JAYA algorithm are chosen for the study because these are not only recent meta-heuristics but they do not require tuning of algorithm-specific parameters. Although these algorithms seem to be elegant, they lose solution diversity after few iterations and get trapped at the local optima. To alleviate such drawback, a new local search procedure is proposed in this paper to improve the solution quality. Further, mutation strategy (inspired from genetic algorithm) is incorporated in the basic algorithm to maintain solution diversity in the population. Computational experiments have been conducted on standard benchmark problems to calculate makespan and computational time. It is found that the rate of convergence of TLBO is superior to JAYA. From the results, it is found that TLBO and JAYA outperform many algorithms reported in the literature and can be treated as efficient methods for solving the FFSP.
Computational reacting gas dynamics
NASA Technical Reports Server (NTRS)
Lam, S. H.
1993-01-01
In the study of high speed flows at high altitudes, such as that encountered by re-entry spacecrafts, the interaction of chemical reactions and other non-equilibrium processes in the flow field with the gas dynamics is crucial. Generally speaking, problems of this level of complexity must resort to numerical methods for solutions, using sophisticated computational fluid dynamics (CFD) codes. The difficulties introduced by reacting gas dynamics can be classified into three distinct headings: (1) the usually inadequate knowledge of the reaction rate coefficients in the non-equilibrium reaction system; (2) the vastly larger number of unknowns involved in the computation and the expected stiffness of the equations; and (3) the interpretation of the detailed reacting CFD numerical results. The research performed accepts the premise that reacting flows of practical interest in the future will in general be too complex or 'untractable' for traditional analytical developments. The power of modern computers must be exploited. However, instead of focusing solely on the construction of numerical solutions of full-model equations, attention is also directed to the 'derivation' of the simplified model from the given full-model. In other words, the present research aims to utilize computations to do tasks which have traditionally been done by skilled theoreticians: to reduce an originally complex full-model system into an approximate but otherwise equivalent simplified model system. The tacit assumption is that once the appropriate simplified model is derived, the interpretation of the detailed numerical reacting CFD numerical results will become much easier. The approach of the research is called computational singular perturbation (CSP).
Fireworks Algorithm with Enhanced Fireworks Interaction.
Zhang, Bei; Zheng, Yu-Jun; Zhang, Min-Xia; Chen, Sheng-Yong
2017-01-01
As a relatively new metaheuristic in swarm intelligence, fireworks algorithm (FWA) has exhibited promising performance on a wide range of optimization problems. This paper aims to improve FWA by enhancing fireworks interaction in three aspects: 1) Developing a new Gaussian mutation operator to make sparks learn from more exemplars; 2) Integrating the regular explosion operator of FWA with the migration operator of biogeography-based optimization (BBO) to increase information sharing; 3) Adopting a new population selection strategy that enables high-quality solutions to have high probabilities of entering the next generation without incurring high computational cost. The combination of the three strategies can significantly enhance fireworks interaction and thus improve solution diversity and suppress premature convergence. Numerical experiments on the CEC 2015 single-objective optimization test problems show the effectiveness of the proposed algorithm. The application to a high-speed train scheduling problem also demonstrates its feasibility in real-world optimization problems.
Hybrid, experimental and computational, investigation of mechanical components
NASA Astrophysics Data System (ADS)
Furlong, Cosme; Pryputniewicz, Ryszard J.
1996-07-01
Computational and experimental methodologies have unique features for the analysis and solution of a wide variety of engineering problems. Computations provide results that depend on selection of input parameters such as geometry, material constants, and boundary conditions which, for correct modeling purposes, have to be appropriately chosen. In addition, it is relatively easy to modify the input parameters in order to computationally investigate different conditions. Experiments provide solutions which characterize the actual behavior of the object of interest subjected to specific operating conditions. However, it is impractical to experimentally perform parametric investigations. This paper discusses the use of a hybrid, computational and experimental, approach for study and optimization of mechanical components. Computational techniques are used for modeling the behavior of the object of interest while it is experimentally tested using noninvasive optical techniques. Comparisons are performed through a fringe predictor program used to facilitate the correlation between both techniques. In addition, experimentally obtained quantitative information, such as displacements and shape, can be applied in the computational model in order to improve this correlation. The result is a validated computational model that can be used for performing quantitative analyses and structural optimization. Practical application of the hybrid approach is illustrated with a representative example which demonstrates the viability of the approach as an engineering tool for structural analysis and optimization.
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Time-optimal Aircraft Pursuit-evasion with a Weapon Envelope Constraint
NASA Technical Reports Server (NTRS)
Menon, P. K. A.
1990-01-01
The optimal pursuit-evasion problem between two aircraft including a realistic weapon envelope is analyzed using differential game theory. Six order nonlinear point mass vehicle models are employed and the inclusion of an arbitrary weapon envelope geometry is allowed. The performance index is a linear combination of flight time and the square of the vehicle acceleration. Closed form solution to this high-order differential game is then obtained using feedback linearization. The solution is in the form of a feedback guidance law together with a quartic polynomial for time-to-go. Due to its modest computational requirements, this nonlinear guidance law is useful for on-board real-time implementation.
ATLAS and LHC computing on CRAY
NASA Astrophysics Data System (ADS)
Sciacca, F. G.; Haug, S.; ATLAS Collaboration
2017-10-01
Access and exploitation of large scale computing resources, such as those offered by general purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments in order to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems do not only offer very efficient hardware, cooling and highly competent operators, but also have large backfill potentials due to size and multidisciplinary usage and potential gains due to economy at scale. Technical solutions, performance, expected return and future plans are discussed.
High-Throughput Computing on High-Performance Platforms: A Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oleynik, D; Panitkin, S; Matteo, Turilli
The computing systems used by LHC experiments has historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resource. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan -- a DOE leadership facility in conjunction with traditional distributed high- throughput computing to reach sustained production scales of approximately 52M core-hours a years. The three main contributions of this paper are: (i)more » a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.« less
A real-time spike sorting method based on the embedded GPU.
Zelan Yang; Kedi Xu; Xiang Tian; Shaomin Zhang; Xiaoxiang Zheng
2017-07-01
Microelectrode arrays with hundreds of channels have been widely used to acquire neuron population signals in neuroscience studies. Online spike sorting is becoming one of the most important challenges for high-throughput neural signal acquisition systems. Graphic processing unit (GPU) with high parallel computing capability might provide an alternative solution for increasing real-time computational demands on spike sorting. This study reported a method of real-time spike sorting through computing unified device architecture (CUDA) which was implemented on an embedded GPU (NVIDIA JETSON Tegra K1, TK1). The sorting approach is based on the principal component analysis (PCA) and K-means. By analyzing the parallelism of each process, the method was further optimized in the thread memory model of GPU. Our results showed that the GPU-based classifier on TK1 is 37.92 times faster than the MATLAB-based classifier on PC while their accuracies were the same with each other. The high-performance computing features of embedded GPU demonstrated in our studies suggested that the embedded GPU provide a promising platform for the real-time neural signal processing.
Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus
2016-01-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922
NASA Astrophysics Data System (ADS)
Bocharov, A. N.; Bityurin, V. A.; Golovin, N. N.; Evstigneev, N. M.; Petrovskiy, V. P.; Ryabkov, O. I.; Teplyakov, I. O.; Shustov, A. A.; Solomonov, Yu S.; Fortov, V. E.
2016-11-01
In this paper, an approach to solve conjugate heat- and mass-transfer problems is considered to be applied to hypersonic vehicle surface of arbitrary shape. The approach under developing should satisfy the following demands. (i) The surface of the body of interest may have arbitrary geometrical shape. (ii) The shape of the body can change during calculation. (iii) The flight characteristics may vary in a wide range, specifically flight altitude, free-stream Mach number, angle-of-attack, etc. (iv) The approach should be realized with using the high-performance-computing (HPC) technologies. The approach is based on coupled solution of 3D unsteady hypersonic flow equations and 3D unsteady heat conductance problem for the thick wall. Iterative process is applied to account for ablation of wall material and, consequently, mass injection from the surface and changes in the surface shape. While iterations, unstructured computational grids both in the flow region and within the wall interior are adapted to the current geometry and flow conditions. The flow computations are done on HPC platform and are most time-consuming part of the whole problem, while heat conductance problem can be solved on many kinds of computers.
Enhanced intelligent water drops algorithm for multi-depot vehicle routing problem
Akutsah, Francis; Olusanya, Micheal O.; Adewumi, Aderemi O.
2018-01-01
The intelligent water drop algorithm is a swarm-based metaheuristic algorithm, inspired by the characteristics of water drops in the river and the environmental changes resulting from the action of the flowing river. Since its appearance as an alternative stochastic optimization method, the algorithm has found applications in solving a wide range of combinatorial and functional optimization problems. This paper presents an improved intelligent water drop algorithm for solving multi-depot vehicle routing problems. A simulated annealing algorithm was introduced into the proposed algorithm as a local search metaheuristic to prevent the intelligent water drop algorithm from getting trapped into local minima and also improve its solution quality. In addition, some of the potential problematic issues associated with using simulated annealing that include high computational runtime and exponential calculation of the probability of acceptance criteria, are investigated. The exponential calculation of the probability of acceptance criteria for the simulated annealing based techniques is computationally expensive. Therefore, in order to maximize the performance of the intelligent water drop algorithm using simulated annealing, a better way of calculating the probability of acceptance criteria is considered. The performance of the proposed hybrid algorithm is evaluated by using 33 standard test problems, with the results obtained compared with the solutions offered by four well-known techniques from the subject literature. Experimental results and statistical tests show that the new method possesses outstanding performance in terms of solution quality and runtime consumed. In addition, the proposed algorithm is suitable for solving large-scale problems. PMID:29554662
Enhanced intelligent water drops algorithm for multi-depot vehicle routing problem.
Ezugwu, Absalom E; Akutsah, Francis; Olusanya, Micheal O; Adewumi, Aderemi O
2018-01-01
The intelligent water drop algorithm is a swarm-based metaheuristic algorithm, inspired by the characteristics of water drops in the river and the environmental changes resulting from the action of the flowing river. Since its appearance as an alternative stochastic optimization method, the algorithm has found applications in solving a wide range of combinatorial and functional optimization problems. This paper presents an improved intelligent water drop algorithm for solving multi-depot vehicle routing problems. A simulated annealing algorithm was introduced into the proposed algorithm as a local search metaheuristic to prevent the intelligent water drop algorithm from getting trapped into local minima and also improve its solution quality. In addition, some of the potential problematic issues associated with using simulated annealing that include high computational runtime and exponential calculation of the probability of acceptance criteria, are investigated. The exponential calculation of the probability of acceptance criteria for the simulated annealing based techniques is computationally expensive. Therefore, in order to maximize the performance of the intelligent water drop algorithm using simulated annealing, a better way of calculating the probability of acceptance criteria is considered. The performance of the proposed hybrid algorithm is evaluated by using 33 standard test problems, with the results obtained compared with the solutions offered by four well-known techniques from the subject literature. Experimental results and statistical tests show that the new method possesses outstanding performance in terms of solution quality and runtime consumed. In addition, the proposed algorithm is suitable for solving large-scale problems.
An adaptive mesh-moving and refinement procedure for one-dimensional conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Flaherty, Joseph E.; Arney, David C.
1993-01-01
We examine the performance of an adaptive mesh-moving and /or local mesh refinement procedure for the finite difference solution of one-dimensional hyperbolic systems of conservation laws. Adaptive motion of a base mesh is designed to isolate spatially distinct phenomena, and recursive local refinement of the time step and cells of the stationary or moving base mesh is performed in regions where a refinement indicator exceeds a prescribed tolerance. These adaptive procedures are incorporated into a computer code that includes a MacCormack finite difference scheme wih Davis' artificial viscosity model and a discretization error estimate based on Richardson's extrapolation. Experiments are conducted on three problems in order to qualify the advantages of adaptive techniques relative to uniform mesh computations and the relative benefits of mesh moving and refinement. Key results indicate that local mesh refinement, with and without mesh moving, can provide reliable solutions at much lower computational cost than possible on uniform meshes; that mesh motion can be used to improve the results of uniform mesh solutions for a modest computational effort; that the cost of managing the tree data structure associated with refinement is small; and that a combination of mesh motion and refinement reliably produces solutions for the least cost per unit accuracy.
Mota, J.P.B.; Esteves, I.A.A.C.; Rostam-Abadi, M.
2004-01-01
A computational fluid dynamics (CFD) software package has been coupled with the dynamic process simulator of an adsorption storage tank for methane fuelled vehicles. The two solvers run as independent processes and handle non-overlapping portions of the computational domain. The codes exchange data on the boundary interface of the two domains to ensure continuity of the solution and of its gradient. A software interface was developed to dynamically suspend and activate each process as necessary, and be responsible for data exchange and process synchronization. This hybrid computational tool has been successfully employed to accurately simulate the discharge of a new tank design and evaluate its performance. The case study presented here shows that CFD and process simulation are highly complementary computational tools, and that there are clear benefits to be gained from a close integration of the two. ?? 2004 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Veltri, Pierangelo
The use of computer based solutions for data management in biology and clinical science has contributed to improve life-quality and also to gather research results in shorter time. Indeed, new algorithms and high performance computation have been using in proteomics and genomics studies for curing chronic diseases (e.g., drug designing) as well as supporting clinicians both in diagnosis (e.g., images-based diagnosis) and patient curing (e.g., computer based information analysis on information gathered from patient). In this paper we survey on examples of computer based techniques applied in both biology and clinical contexts. The reported applications are also results of experiences in real case applications at University Medical School of Catanzaro and also part of experiences of the National project Staywell SH 2.0 involving many research centers and companies aiming to study and improve citizen wellness.
Development of high-performance low-reflection rugged resistive touch screens for military displays
NASA Astrophysics Data System (ADS)
Wang, Raymond; Wang, Minshine; Thomas, John; Wang, Lawrence; Chang, Victor
2010-04-01
Just as iPhones with sophisticated touch interfaces have revolutionised the human interface for the ubiquitous cell phone, the Military is rapidly adopting touch-screens as a primary interface to their computers and vehicle systems. This paper describes the development of a true military touch interface solution from an existing industrial design. We will report on successful development of 10.4" and 15.4" high performance rugged resistive touch panels using IAD sputter coating. Low reflectance (specular < 1% and diffuse < 0.07%) was achieved with high impact, dust, and chemical resistant surface finishes. These touch panels were qualified over a wide operational temperature range, -51°C to +80°C specifically for military and rugged industrial applications.
Feng, Yen-Yi; Wu, I-Chin; Chen, Tzu-Li
2017-03-01
The number of emergency cases or emergency room visits rapidly increases annually, thus leading to an imbalance in supply and demand and to the long-term overcrowding of hospital emergency departments (EDs). However, current solutions to increase medical resources and improve the handling of patient needs are either impractical or infeasible in the Taiwanese environment. Therefore, EDs must optimize resource allocation given limited medical resources to minimize the average length of stay of patients and medical resource waste costs. This study constructs a multi-objective mathematical model for medical resource allocation in EDs in accordance with emergency flow or procedure. The proposed mathematical model is complex and difficult to solve because its performance value is stochastic; furthermore, the model considers both objectives simultaneously. Thus, this study develops a multi-objective simulation optimization algorithm by integrating a non-dominated sorting genetic algorithm II (NSGA II) with multi-objective computing budget allocation (MOCBA) to address the challenges of multi-objective medical resource allocation. NSGA II is used to investigate plausible solutions for medical resource allocation, and MOCBA identifies effective sets of feasible Pareto (non-dominated) medical resource allocation solutions in addition to effectively allocating simulation or computation budgets. The discrete event simulation model of ED flow is inspired by a Taiwan hospital case and is constructed to estimate the expected performance values of each medical allocation solution as obtained through NSGA II. Finally, computational experiments are performed to verify the effectiveness and performance of the integrated NSGA II and MOCBA method, as well as to derive non-dominated medical resource allocation solutions from the algorithms.
On the long-term stability of terrestrial reference frame solutions based on Kalman filtering
NASA Astrophysics Data System (ADS)
Soja, Benedikt; Gross, Richard S.; Abbondanza, Claudio; Chin, Toshio M.; Heflin, Michael B.; Parker, Jay W.; Wu, Xiaoping; Nilsson, Tobias; Glaser, Susanne; Balidakis, Kyriakos; Heinkelmann, Robert; Schuh, Harald
2018-06-01
The Global Geodetic Observing System requirement for the long-term stability of the International Terrestrial Reference Frame is 0.1 mm/year, motivated by rigorous sea level studies. Furthermore, high-quality station velocities are of great importance for the prediction of future station coordinates, which are fundamental for several geodetic applications. In this study, we investigate the performance of predictions from very long baseline interferometry (VLBI) terrestrial reference frames (TRFs) based on Kalman filtering. The predictions are computed by extrapolating the deterministic part of the coordinate model. As observational data, we used over 4000 VLBI sessions between 1980 and the middle of 2016. In order to study the predictions, we computed VLBI TRF solutions only from the data until the end of 2013. The period of 2014 until 2016.5 was used to validate the predictions of the TRF solutions against the measured VLBI station coordinates. To assess the quality, we computed average WRMS values from the coordinate differences as well as from estimated Helmert transformation parameters, in particular, the scale. We found that the results significantly depend on the level of process noise used in the filter. While larger values of process noise allow the TRF station coordinates to more closely follow the input data (decrease in WRMS of about 45%), the TRF predictions exhibit larger deviations from the VLBI station coordinates after 2014 (WRMS increase of about 15%). On the other hand, lower levels of process noise improve the predictions, making them more similar to those of solutions without process noise. Furthermore, our investigations show that additionally estimating annual signals in the coordinates does not significantly impact the results. Finally, we computed TRF solutions mimicking a potential real-time TRF and found significant improvements over the other investigated solutions, all of which rely on extrapolating the coordinate model for their predictions, with WRMS reductions of almost 50%.
Method and system for benchmarking computers
Gustafson, John L.
1993-09-14
A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Real-time computer treatment of THz passive device images with the high image quality
NASA Astrophysics Data System (ADS)
Trofimov, Vyacheslav A.; Trofimov, Vladislav V.
2012-06-01
We demonstrate real-time computer code improving significantly the quality of images captured by the passive THz imaging system. The code is not only designed for a THz passive device: it can be applied to any kind of such devices and active THz imaging systems as well. We applied our code for computer processing of images captured by four passive THz imaging devices manufactured by different companies. It should be stressed that computer processing of images produced by different companies requires using the different spatial filters usually. The performance of current version of the computer code is greater than one image per second for a THz image having more than 5000 pixels and 24 bit number representation. Processing of THz single image produces about 20 images simultaneously corresponding to various spatial filters. The computer code allows increasing the number of pixels for processed images without noticeable reduction of image quality. The performance of the computer code can be increased many times using parallel algorithms for processing the image. We develop original spatial filters which allow one to see objects with sizes less than 2 cm. The imagery is produced by passive THz imaging devices which captured the images of objects hidden under opaque clothes. For images with high noise we develop an approach which results in suppression of the noise after using the computer processing and we obtain the good quality image. With the aim of illustrating the efficiency of the developed approach we demonstrate the detection of the liquid explosive, ordinary explosive, knife, pistol, metal plate, CD, ceramics, chocolate and other objects hidden under opaque clothes. The results demonstrate the high efficiency of our approach for the detection of hidden objects and they are a very promising solution for the security problem.
Trends in data locality abstractions for HPC systems
Unat, Didem; Dubey, Anshu; Hoefler, Torsten; ...
2017-05-10
The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity andmore » performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. Furthermore, this paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.« less
Modal ring method for the scattering of sound
NASA Technical Reports Server (NTRS)
Baumeister, Kenneth J.; Kreider, Kevin L.
1993-01-01
The modal element method for acoustic scattering can be simplified when the scattering body is rigid. In this simplified method, called the modal ring method, the scattering body is represented by a ring of triangular finite elements forming the outer surface. The acoustic pressure is calculated at the element nodes. The pressure in the infinite computational region surrounding the body is represented analytically by an eigenfunction expansion. The two solution forms are coupled by the continuity of pressure and velocity on the body surface. The modal ring method effectively reduces the two-dimensional scattering problem to a one-dimensional problem capable of handling very high frequency scattering. In contrast to the boundary element method or the method of moments, which perform a similar reduction in problem dimension, the model line method has the added advantage of having a highly banded solution matrix requiring considerably less computer storage. The method shows excellent agreement with analytic results for scattering from rigid circular cylinders over a wide frequency range (1 is equal to or less than ka is less than or equal to 100) in the near and far fields.
NASA Astrophysics Data System (ADS)
Hara, Tatsuhiko
2004-08-01
We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation with the maximum angular degree 16. We find significant low velocities under south Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long wavelength patterns of Vs and Q in the transition zone such as high Vs and high Q under the western Pacific.
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems
Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-01-01
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems. PMID:29439442
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems.
Ma, Xingpo; Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-02-10
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data are processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems.
Exploiting opportunistic resources for ATLAS with ARC CE and the Event Service
NASA Astrophysics Data System (ADS)
Cameron, D.; Filipčič, A.; Guan, W.; Tsulaia, V.; Walker, R.; Wenaus, T.;
2017-10-01
With ever-greater computing needs and fixed budgets, big scientific experiments are turning to opportunistic resources as a means to add much-needed extra computing power. These resources can be very different in design from those that comprise the Grid computing of most experiments, therefore exploiting them requires a change in strategy for the experiment. They may be highly restrictive in what can be run or in connections to the outside world, or tolerate opportunistic usage only on condition that tasks may be terminated without warning. The Advanced Resource Connector Computing Element (ARC CE) with its nonintrusive architecture is designed to integrate resources such as High Performance Computing (HPC) systems into a computing Grid. The ATLAS experiment developed the ATLAS Event Service (AES) primarily to address the issue of jobs that can be terminated at any point when opportunistic computing capacity is needed by someone else. This paper describes the integration of these two systems in order to exploit opportunistic resources for ATLAS in a restrictive environment. In addition to the technical details, results from deployment of this solution in the SuperMUC HPC centre in Munich are shown.
High-Performance Secure Database Access Technologies for HEP Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matthew Vranicar; John Weicher
2006-04-17
The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysismore » capabilities a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, who states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications.” There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems’ security models, the allowed passwords often can not even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory’s (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project’s current relation database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.« less
Reduced rank regression via adaptive nuclear norm penalization
Chen, Kun; Dong, Hongbo; Chan, Kung-Sik
2014-01-01
Summary We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. The rank consistency of and prediction/estimation performance bounds for the estimator are established for a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy. PMID:25045172
NASA Astrophysics Data System (ADS)
Foo, Kam Keong
A two-dimensional dual-mode scramjet flowpath is developed and evaluated using the ANSYS Fluent density-based flow solver with various computational grids. Results are obtained for fuel-off, fuel-on non-reacting, and fuel-on reacting cases at different equivalence ratios. A one-step global chemical kinetics hydrogen-air model is used in conjunction with the eddy-dissipation model. Coarse, medium and fine computational grids are used to evaluate grid sensitivity and to investigate a lack of grid independence. Different grid adaptation strategies are performed on the coarse grid in an attempt to emulate the solutions obtained from the finer grids. The goal of this study is to investigate the feasibility of using various mesh adaptation criteria to significantly decrease computational efforts for high-speed reacting flows.
High-Performance Computing Systems and Operations | Computational Science |
NREL Systems and Operations High-Performance Computing Systems and Operations NREL operates high-performance computing (HPC) systems dedicated to advancing energy efficiency and renewable energy technologies. Capabilities NREL's HPC capabilities include: High-Performance Computing Systems We operate
NASA Technical Reports Server (NTRS)
1985-01-01
Slides are reproduced that describe the importance of having high performance number crunching and graphics capability. They also indicate the types of research and development underway at Ames Research Center to ensure that, in the near term, Ames is a smart buyer and user, and in the long-term that Ames knows the best possible solutions for number crunching and graphics needs. The drivers for this research are real computational physics applications of interest to Ames and NASA. They are concerned with how to map the applications, and how to maximize the physics learned from the results of the calculations. The computer graphics activities are aimed at getting maximum information from the three-dimensional calculations by using the real time manipulation of three-dimensional data on the Silicon Graphics workstation. Work is underway on new algorithms that will permit the display of experimental results that are sparse and random, the same way that the dense and regular computed results are displayed.
Pre-Hardware Optimization of Spacecraft Image Processing Algorithms and Hardware Implementation
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Petrick, David J.; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Day, John H. (Technical Monitor)
2002-01-01
Spacecraft telemetry rates and telemetry product complexity have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image data processing and color picture generation application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The proposed solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms, and reconfigurable computing hardware (RC) technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processors (DSP). It has been shown that this approach can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft.
A controlled genetic algorithm by fuzzy logic and belief functions for job-shop scheduling.
Hajri, S; Liouane, N; Hammadi, S; Borne, P
2000-01-01
Most scheduling problems are highly complex combinatorial problems. However, stochastic methods such as genetic algorithm yield good solutions. In this paper, we present a controlled genetic algorithm (CGA) based on fuzzy logic and belief functions to solve job-shop scheduling problems. For better performance, we propose an efficient representational scheme, heuristic rules for creating the initial population, and a new methodology for mixing and computing genetic operator probabilities.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
Storage media pipelining: Making good use of fine-grained media
NASA Technical Reports Server (NTRS)
Vanmeter, Rodney
1993-01-01
This paper proposes a new high-performance paradigm for accessing removable media such as tapes and especially magneto-optical disks. In high-performance computing the striping of data across multiple devices is a common means of improving data transfer rates. Striping has been used very successfully for fixed magnetic disks improving overall system reliability as well as throughput. It has also been proposed as a solution for providing improved bandwidth for tape and magneto-optical subsystems. However, striping of removable media has shortcomings, particularly in the areas of latency to data and restricted system configurations, and is suitable primarily for very large I/Os. We propose that for fine-grained media, an alternative access method, media pipelining, may be used to provide high bandwidth for large requests while retaining the flexibility to support concurrent small requests and different system configurations. Its principal drawback is high buffering requirements in the host computer or file server. This paper discusses the possible organization of such a system including the hardware conditions under which it may be effective, and the flexibility of configuration. Its expected performance is discussed under varying workloads including large single I/O's and numerous smaller ones. Finally, a specific system incorporating a high-transfer-rate magneto-optical disk drive and autochanger is discussed.
Molina-García, Angel; Campelo, José Carlos; Blanc, Sara; Serrano, Juan José; García-Sánchez, Tania; Bueso, María C.
2015-01-01
This paper proposes and assesses an integrated solution to monitor and diagnose photovoltaic (PV) solar modules based on a decentralized wireless sensor acquisition system. Both DC electrical variables and environmental data are collected at PV module level using low-cost and high-energy efficiency node sensors. Data is real-time processed locally and compared with expected PV module performances obtained by a PV module model based on symmetrized-shifted Gompertz functions (as previously developed and assessed by the authors). Sensor nodes send data to a centralized sink-computing module using a multi-hop wireless sensor network architecture. Such integration thus provides extensive analysis of PV installations, and avoids off-line tests or post-processing processes. In comparison with previous approaches, this solution is enhanced with a low-cost system and non-critical performance constraints, and it is suitable for extensive deployment in PV power plants. Moreover, it is easily implemented in existing PV installations, since no additional wiring is required. The system has been implemented and assessed in a Spanish PV power plant connected to the grid. Results and estimations of PV module performances are also included in the paper. PMID:26230694
Molina-García, Angel; Campelo, José Carlos; Blanc, Sara; Serrano, Juan José; García-Sánchez, Tania; Bueso, María C
2015-07-29
This paper proposes and assesses an integrated solution to monitor and diagnose photovoltaic (PV) solar modules based on a decentralized wireless sensor acquisition system. Both DC electrical variables and environmental data are collected at PV module level using low-cost and high-energy efficiency node sensors. Data is real-time processed locally and compared with expected PV module performances obtained by a PV module model based on symmetrized-shifted Gompertz functions (as previously developed and assessed by the authors). Sensor nodes send data to a centralized sink-computing module using a multi-hop wireless sensor network architecture. Such integration thus provides extensive analysis of PV installations, and avoids off-line tests or post-processing processes. In comparison with previous approaches, this solution is enhanced with a low-cost system and non-critical performance constraints, and it is suitable for extensive deployment in PV power plants. Moreover, it is easily implemented in existing PV installations, since no additional wiring is required. The system has been implemented and assessed in a Spanish PV power plant connected to the grid. Results and estimations of PV module performances are also included in the paper.
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models
Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.
2016-01-01
Computational techniques for solving systems of equations arising in gastric electrophysiology have not been studied for efficient solution process. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and mesh resolution required to capture geometric and functional features necessitates efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased workload considerably. Preconditioning based on block Jacobi and algebraic multigrid solvers were found to have the best overall solution times and least iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
Development of numerical techniques for simulation of magnetogasdynamics and hypersonic chemistry
NASA Astrophysics Data System (ADS)
Damevin, Henri-Marie
Magnetogasdynamics, the science concerned with the mutual interaction between electromagnetic field and flow of electrically conducting gas, offers promising advances in flow control and propulsion of future hypersonic vehicles. Numerical simulations are essential for understanding phenomena, and for research and development. The current dissertation is devoted to the development and validation of numerical algorithms for the solution of multidimensional magnetogasdynamic equations and the simulation of hypersonic high-temperature effects. Governing equations are derived, based on classical magnetogasdynamic assumptions. Two sets of equations are considered, namely the full equations and equations in the low magnetic Reynolds number approximation. Equations are expressed in a suitable formulation for discretization by finite differences in a computational space. For the full equations, Gauss law for magnetism is enforced using Powell's methodology. The time integration method is a four-stage modified Runge-Kutta scheme, amended with a Total Variation Diminishing model in a postprocessing stage. The eigensystem, required for the Total Variation Diminishing scheme, is derived in generalized three-dimensional coordinate system. For the simulation of hypersonic high-temperature effects, two chemical models are utilized, namely a nonequilibrium model and an equilibrium model. A loosely coupled approach is implemented to communicate between the magnetogasdynamic equations and the chemical models. The nonequilibrium model is a one-temperature, five-species, seventeen-reaction model solved by an implicit flux-vector splitting scheme. The chemical equilibrium model computes thermodynamics properties using curve fit procedures. Selected results are provided, which explore the different features of the numerical algorithms. The shock-capturing properties are validated for shock-tube simulations using numerical solutions reported in the literature. The computations of superfast flows over corners and in convergent channels demonstrate the performances of the algorithm in multiple dimensions. The implementation of diffusion terms is validated by solving the magnetic Rayleigh problem and Hartmann problem, for which analytical solutions are available. Prediction of blunt-body type flow are investigated and compared with numerical solutions reported in the literature. The effectiveness of the chemical models for hypersonic flow over blunt body is examined in various flow conditions. It is shown that the proposed schemes perform well in a variety of test cases, though some limitations have been identified.
High resolution flow field prediction for tail rotor aeroacoustics
NASA Technical Reports Server (NTRS)
Quackenbush, Todd R.; Bliss, Donald B.
1989-01-01
The prediction of tail rotor noise due to the impingement of the main rotor wake poses a significant challenge to current analysis methods in rotorcraft aeroacoustics. This paper describes the development of a new treatment of the tail rotor aerodynamic environment that permits highly accurate resolution of the incident flow field with modest computational effort relative to alternative models. The new approach incorporates an advanced full-span free wake model of the main rotor in a scheme which reconstructs high-resolution flow solutions from preliminary, computationally inexpensive simulations with coarse resolution. The heart of the approach is a novel method for using local velocity correction terms to capture the steep velocity gradients characteristic of the vortex-dominated incident flow. Sample calculations have been undertaken to examine the principal types of interactions between the tail rotor and the main rotor wake and to examine the performance of the new method. The results of these sample problems confirm the success of this approach in capturing the high-resolution flows necessary for analysis of rotor-wake/rotor interactions with dramatically reduced computational cost. Computations of radiated sound are also carried out that explore the role of various portions of the main rotor wake in generating tail rotor noise.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kersaudy, Pierric, E-mail: pierric.kersaudy@orange.com; Whist Lab, 38 avenue du Général Leclerc, 92130 Issy-les-Moulineaux; ESYCOM, Université Paris-Est Marne-la-Vallée, 5 boulevard Descartes, 77700 Marne-la-Vallée
2015-04-01
In numerical dosimetry, the recent advances in high performance computing led to a strong reduction of the required computational time to assess the specific absorption rate (SAR) characterizing the human exposure to electromagnetic waves. However, this procedure remains time-consuming and a single simulation can request several hours. As a consequence, the influence of uncertain input parameters on the SAR cannot be analyzed using crude Monte Carlo simulation. The solution presented here to perform such an analysis is surrogate modeling. This paper proposes a novel approach to build such a surrogate model from a design of experiments. Considering a sparse representationmore » of the polynomial chaos expansions using least-angle regression as a selection algorithm to retain the most influential polynomials, this paper proposes to use the selected polynomials as regression functions for the universal Kriging model. The leave-one-out cross validation is used to select the optimal number of polynomials in the deterministic part of the Kriging model. The proposed approach, called LARS-Kriging-PC modeling, is applied to three benchmark examples and then to a full-scale metamodeling problem involving the exposure of a numerical fetus model to a femtocell device. The performances of the LARS-Kriging-PC are compared to an ordinary Kriging model and to a classical sparse polynomial chaos expansion. The LARS-Kriging-PC appears to have better performances than the two other approaches. A significant accuracy improvement is observed compared to the ordinary Kriging or to the sparse polynomial chaos depending on the studied case. This approach seems to be an optimal solution between the two other classical approaches. A global sensitivity analysis is finally performed on the LARS-Kriging-PC model of the fetus exposure problem.« less
Outsourcing Security Services for Low Performance Portable Devices
NASA Astrophysics Data System (ADS)
Szentgyörgyi, Attila; Korn, András
The number of portable devices using wireless network technologies is on the rise. Some of these devices are incapable of, or at a disadvantage at using secure Internet services, because secure communication often requires comparatively high computing capacity. In this paper, we propose a solution which can be used to offer secure network services for low performance portable devices without severely degrading data transmission rates. We also show that using our approach these devices can utilize some secure network services which were so far unavailable to them due to a lack of software support. In order to back up our claims, we present performance measurement results obtained in a test network.
Robotic tape library system level testing at NSA: Present and planned
NASA Technical Reports Server (NTRS)
Shields, Michael F.
1994-01-01
In the present of declining Defense budgets, increased pressure has been placed on the DOD to utilize Commercial Off the Shelf (COTS) solutions to incrementally solve a wide variety of our computer processing requirements. With the rapid growth in processing power, significant expansion of high performance networking, and the increased complexity of applications data sets, the requirement for high performance, large capacity, reliable and secure, and most of all affordable robotic tape storage libraries has greatly increased. Additionally, the migration to a heterogeneous, distributed computing environment has further complicated the problem. With today's open system compute servers approaching yesterday's supercomputer capabilities, the need for affordable, reliable secure Mass Storage Systems (MSS) has taken on an ever increasing importance to our processing center's ability to satisfy operational mission requirements. To that end, NSA has established an in-house capability to acquire, test, and evaluate COTS products. Its goal is to qualify a set of COTS MSS libraries, thereby achieving a modicum of standardization for robotic tape libraries which can satisfy our low, medium, and high performance file and volume serving requirements. In addition, NSA has established relations with other Government Agencies to complete this in-house effort and to maximize our research, testing, and evaluation work. While the preponderance of the effort is focused at the high end of the storage ladder, considerable effort will be extended this year and next at the server class or mid range storage systems.
Post-Optimality Analysis In Aerospace Vehicle Design
NASA Technical Reports Server (NTRS)
Braun, Robert D.; Kroo, Ilan M.; Gage, Peter J.
1993-01-01
This analysis pertains to the applicability of optimal sensitivity information to aerospace vehicle design. An optimal sensitivity (or post-optimality) analysis refers to computations performed once the initial optimization problem is solved. These computations may be used to characterize the design space about the present solution and infer changes in this solution as a result of constraint or parameter variations, without reoptimizing the entire system. The present analysis demonstrates that post-optimality information generated through first-order computations can be used to accurately predict the effect of constraint and parameter perturbations on the optimal solution. This assessment is based on the solution of an aircraft design problem in which the post-optimality estimates are shown to be within a few percent of the true solution over the practical range of constraint and parameter variations. Through solution of a reusable, single-stage-to-orbit, launch vehicle design problem, this optimal sensitivity information is also shown to improve the efficiency of the design process, For a hierarchically decomposed problem, this computational efficiency is realized by estimating the main-problem objective gradient through optimal sep&ivity calculations, By reducing the need for finite differentiation of a re-optimized subproblem, a significant decrease in the number of objective function evaluations required to reach the optimal solution is obtained.
Grid adaption using Chimera composite overlapping meshes
NASA Technical Reports Server (NTRS)
Kao, Kai-Hsiung; Liou, Meng-Sing; Chow, Chuen-Yen
1993-01-01
The objective of this paper is to perform grid adaptation using composite over-lapping meshes in regions of large gradient to capture the salient features accurately during computation. The Chimera grid scheme, a multiple overset mesh technique, is used in combination with a Navier-Stokes solver. The numerical solution is first converged to a steady state based on an initial coarse mesh. Solution-adaptive enhancement is then performed by using a secondary fine grid system which oversets on top of the base grid in the high-gradient region, but without requiring the mesh boundaries to join in any special way. Communications through boundary interfaces between those separated grids are carried out using tri-linear interpolation. Applications to the Euler equations for shock reflections and to a shock wave/boundary layer interaction problem are tested. With the present method, the salient features are well resolved.
Grid adaptation using chimera composite overlapping meshes
NASA Technical Reports Server (NTRS)
Kao, Kai-Hsiung; Liou, Meng-Sing; Chow, Chuen-Yen
1994-01-01
The objective of this paper is to perform grid adaptation using composite overlapping meshes in regions of large gradient to accurately capture the salient features during computation. The chimera grid scheme, a multiple overset mesh technique, is used in combination with a Navier-Stokes solver. The numerical solution is first converged to a steady state based on an initial coarse mesh. Solution-adaptive enhancement is then performed by using a secondary fine grid system which oversets on top of the base grid in the high-gradient region, but without requiring the mesh boundaries to join in any special way. Communications through boundary interfaces between those separated grids are carried out using trilinear interpolation. Application to the Euler equations for shock reflections and to shock wave/boundary layer interaction problem are tested. With the present method, the salient features are well-resolved.
Grid adaptation using Chimera composite overlapping meshes
NASA Technical Reports Server (NTRS)
Kao, Kai-Hsiung; Liou, Meng-Sing; Chow, Chuen-Yen
1993-01-01
The objective of this paper is to perform grid adaptation using composite over-lapping meshes in regions of large gradient to capture the salient features accurately during computation. The Chimera grid scheme, a multiple overset mesh technique, is used in combination with a Navier-Stokes solver. The numerical solution is first converged to a steady state based on an initial coarse mesh. Solution-adaptive enhancement is then performed by using a secondary fine grid system which oversets on top of the base grid in the high-gradient region, but without requiring the mesh boundaries to join in any special way. Communications through boundary interfaces between those separated grids are carried out using tri-linear interpolation. Applications to the Euler equations for shock reflections and to a shock wave/boundary layer interaction problem are tested. With the present method, the salient features are well resolved.
Interface modeling in incompressible media using level sets in Escript
NASA Astrophysics Data System (ADS)
Gross, L.; Bourgouin, L.; Hale, A. J.; Mühlhaus, H.-B.
2007-08-01
We use a finite element (FEM) formulation of the level set method to model geological fluid flow problems involving interface propagation. Interface problems are ubiquitous in geophysics. Here we focus on a Rayleigh-Taylor instability, namely mantel plumes evolution, and the growth of lava domes. Both problems require the accurate description of the propagation of an interface between heavy and light materials (plume) or between high viscous lava and low viscous air (lava dome), respectively. The implementation of the models is based on Escript which is a Python module for the solution of partial differential equations (PDEs) using spatial discretization techniques such as FEM. It is designed to describe numerical models in the language of PDEs while using computational components implemented in C and C++ to achieve high performance for time-intensive, numerical calculations. A critical step in the solution geological flow problems is the solution of the velocity-pressure problem. We describe how the Escript module can be used for a high-level implementation of an efficient variant of the well-known Uzawa scheme. We begin with a brief outline of the Escript modules and then present illustrations of its usage for the numerical solutions of the problems mentioned above.
NASA Astrophysics Data System (ADS)
Blakely, Christopher D.
This dissertation thesis has three main goals: (1) To explore the anatomy of meshless collocation approximation methods that have recently gained attention in the numerical analysis community; (2) Numerically demonstrate why the meshless collocation method should clearly become an attractive alternative to standard finite-element methods due to the simplicity of its implementation and its high-order convergence properties; (3) Propose a meshless collocation method for large scale computational geophysical fluid dynamics models. We provide numerical verification and validation of the meshless collocation scheme applied to the rotational shallow-water equations on the sphere and demonstrate computationally that the proposed model can compete with existing high performance methods for approximating the shallow-water equations such as the SEAM (spectral-element atmospheric model) developed at NCAR. A detailed analysis of the parallel implementation of the model, along with the introduction of parallel algorithmic routines for the high-performance simulation of the model will be given. We analyze the programming and computational aspects of the model using Fortran 90 and the message passing interface (mpi) library along with software and hardware specifications and performance tests. Details from many aspects of the implementation in regards to performance, optimization, and stabilization will be given. In order to verify the mathematical correctness of the algorithms presented and to validate the performance of the meshless collocation shallow-water model, we conclude the thesis with numerical experiments on some standardized test cases for the shallow-water equations on the sphere using the proposed method.
Review of Enabling Technologies to Facilitate Secure Compute Customization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aderholdt, Ferrol; Caldwell, Blake A; Hicks, Susan Elaine
High performance computing environments are often used for a wide variety of workloads ranging from simulation, data transformation and analysis, and complex workflows to name just a few. These systems may process data for a variety of users, often requiring strong separation between job allocations. There are many challenges to establishing these secure enclaves within the shared infrastructure of high-performance computing (HPC) environments. The isolation mechanisms in the system software are the basic building blocks for enabling secure compute enclaves. There are a variety of approaches and the focus of this report is to review the different virtualization technologies thatmore » facilitate the creation of secure compute enclaves. The report reviews current operating system (OS) protection mechanisms and modern virtualization technologies to better understand the performance/isolation properties. We also examine the feasibility of running ``virtualized'' computing resources as non-privileged users, and providing controlled administrative permissions for standard users running within a virtualized context. Our examination includes technologies such as Linux containers (LXC [32], Docker [15]) and full virtualization (KVM [26], Xen [5]). We categorize these different approaches to virtualization into two broad groups: OS-level virtualization and system-level virtualization. The OS-level virtualization uses containers to allow a single OS kernel to be partitioned to create Virtual Environments (VE), e.g., LXC. The resources within the host's kernel are only virtualized in the sense of separate namespaces. In contrast, system-level virtualization uses hypervisors to manage multiple OS kernels and virtualize the physical resources (hardware) to create Virtual Machines (VM), e.g., Xen, KVM. This terminology of VE and VM, detailed in Section 2, is used throughout the report to distinguish between the two different approaches to providing virtualized execution environments. As part of our technology review we analyzed several current virtualization solutions to assess their vulnerabilities. This included a review of common vulnerabilities and exposures (CVEs) for Xen, KVM, LXC and Docker to gauge their susceptibility to different attacks. The complete details are provided in Section 5 on page 33. Based on this review we concluded that system-level virtualization solutions have many more vulnerabilities than OS level virtualization solutions. As such, security mechanisms like sVirt (Section 3.3) should be considered when using system-level virtualization solutions in order to protect the host against exploits. The majority of vulnerabilities related to KVM, LXC, and Docker are in specific regions of the system. Therefore, future "zero day attacks" are likely to be in the same regions, which suggests that protecting these areas can simplify the protection of the host and maintain the isolation between users. The evaluations of virtualization technologies done thus far are discussed in Section 4. This includes experiments with 'user' namespaces in VEs, which provides the ability to isolate user privileges and allow a user to run with different UIDs within the container while mapping them to non-privileged UIDs in the host. We have identified Linux namespaces as a promising mechanism to isolate shared resources, while maintaining good performance. In Section 4.1 we describe our tests with LXC as a non-root user and leveraging namespaces to control UID/GID mappings and support controlled sharing of parallel file-systems. We highlight several of these namespace capabilities in Section 6.2.3. The other evaluations that were performed during this initial phase of work provide baseline performance data for comparing VEs and VMs to purely native execution. In Section 4.2 we performed tests using the High-Performance Computing Conjugate Gradient (HPCCG) benchmark to establish baseline performance for a scientific application when run on the Native (host) machine in contrast with execution under Docker and KVM. Our tests verified prior studies showing roughly 2-4% overheads in application execution time & MFlops when running in hypervisor-base environments (VMs) as compared to near native performance with VEs. For more details, see Figures 4.5 (page 28), 4.6 (page 28), and 4.7 (page 29). Additionally, in Section 4.3 we include network measurements for TCP bandwidth performance over the 10GigE interface in our testbed. The Native and Docker based tests achieved >= ~9Gbits/sec, while the KVM configuration only achieved 2.5Gbits/sec (Table 4.6 on page 32). This may be a configuration issue with our KVM installation, and is a point for further testing as we refine the network settings in the testbed. The initial network tests were done using a bridged networking configuration. The report outline is as follows: - Section 1 introduces the report and clarifies the scope of the proj...« less
Breaking the computational barriers of pairwise genome comparison.
Torreno, Oscar; Trelles, Oswaldo
2015-08-11
Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, being comparable or faster than the current state-of-the-art methods. We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
Development and numerical analysis of low specific speed mixed-flow pump
NASA Astrophysics Data System (ADS)
Li, H. F.; Huo, Y. W.; Pan, Z. B.; Zhou, W. C.; He, M. H.
2012-11-01
With the development of the city, the market of the mixed flow pump with large flux and high head is prospect. The KSB Shanghai Pump Co., LTD decided to develop low speed specific speed mixed flow pump to meet the market requirements. Based on the centrifugal pump and axial flow pump model, aiming at the characteristics of large flux and high head, a new type of guide vane mixed flow pump was designed. The computational fluid dynamics method was adopted to analyze the internal flow of the new type model and predict its performances. The time-averaged Navier-Stokes equations were closed by SST k-ω turbulent model to adapt internal flow of guide vane with larger curvatures. The multi-reference frame(MRF) method was used to deal with the coupling of rotating impeller and static guide vane, and the SIMPLEC method was adopted to achieve the coupling solution of velocity and pressure. The computational results shows that there is great flow impact on the head of vanes at different working conditions, and there is great flow separation at the tailing of the guide vanes at different working conditions, and all will affect the performance of pump. Based on the computational results, optimizations were carried out to decrease the impact on the head of vanes and flow separation at the tailing of the guide vanes. The optimized model was simulated and its performance was predicted. The computational results show that the impact on the head of vanes and the separation at the tailing of the guide vanes disappeared. The high efficiency of the optimized pump is wide, and it fit the original design destination. The newly designed mixed flow pump is now in modeling and its experimental performance will be getting soon.
NASA Astrophysics Data System (ADS)
Xu, Li; Weng, Peifen
2014-02-01
An improved fifth-order weighted essentially non-oscillatory (WENO-Z) scheme combined with the moving overset grid technique has been developed to compute unsteady compressible viscous flows on the helicopter rotor in forward flight. In order to enforce periodic rotation and pitching of the rotor and relative motion between rotor blades, the moving overset grid technique is extended, where a special judgement standard is presented near the odd surface of the blade grid during search donor cells by using the Inverse Map method. The WENO-Z scheme is adopted for reconstructing left and right state values with the Roe Riemann solver updating the inviscid fluxes and compared with the monotone upwind scheme for scalar conservation laws (MUSCL) and the classical WENO scheme. Since the WENO schemes require a six point stencil to build the fifth-order flux, the method of three layers of fringes for hole boundaries and artificial external boundaries is proposed to carry out flow information exchange between chimera grids. The time advance on the unsteady solution is performed by the full implicit dual time stepping method with Newton type LU-SGS subiteration, where the solutions of pseudo steady computation are as the initial fields of the unsteady flow computation. Numerical results on non-variable pitch rotor and periodic variable pitch rotor in forward flight reveal that the approach can effectively capture vortex wake with low dissipation and reach periodic solutions very soon.
Comparison of Nonequilibrium Solution Algorithms Applied to Chemically Stiff Hypersonic Flows
NASA Technical Reports Server (NTRS)
Palmer, Grant; Venkatapathy, Ethiraj
1995-01-01
Three solution algorithms, explicit under-relaxation, point implicit, and lower-upper symmetric Gauss-Seidel, are used to compute nonequilibrium flow around the Apollo 4 return capsule at the 62-km altitude point in its descent trajectory. By varying the Mach number, the efficiency and robustness of the solution algorithms were tested for different levels of chemical stiffness.The performance of the solution algorithms degraded as the Mach number and stiffness of the flow increased. At Mach 15 and 30, the lower-upper symmetric Gauss-Seidel method produces an eight order of magnitude drop in the energy residual in one-third to one-half the Cray C-90 computer time as compared to the point implicit and explicit under-relaxation methods. The explicit under-relaxation algorithm experienced convergence difficulties at Mach 30 and above. At Mach 40 the performance of the lower-upper symmetric Gauss-Seidel algorithm deteriorates to the point that it is out performed by the point implicit method. The effects of the viscous terms are investigated. Grid dependency questions are explored.
Zhang, Hanming; Wang, Linyuan; Yan, Bin; Li, Lei; Cai, Ailong; Hu, Guoen
2016-01-01
Total generalized variation (TGV)-based computed tomography (CT) image reconstruction, which utilizes high-order image derivatives, is superior to total variation-based methods in terms of the preservation of edge information and the suppression of unfavorable staircase effects. However, conventional TGV regularization employs l1-based form, which is not the most direct method for maximizing sparsity prior. In this study, we propose a total generalized p-variation (TGpV) regularization model to improve the sparsity exploitation of TGV and offer efficient solutions to few-view CT image reconstruction problems. To solve the nonconvex optimization problem of the TGpV minimization model, we then present an efficient iterative algorithm based on the alternating minimization of augmented Lagrangian function. All of the resulting subproblems decoupled by variable splitting admit explicit solutions by applying alternating minimization method and generalized p-shrinkage mapping. In addition, approximate solutions that can be easily performed and quickly calculated through fast Fourier transform are derived using the proximal point method to reduce the cost of inner subproblems. The accuracy and efficiency of the simulated and real data are qualitatively and quantitatively evaluated to validate the efficiency and feasibility of the proposed method. Overall, the proposed method exhibits reasonable performance and outperforms the original TGV-based method when applied to few-view problems. PMID:26901410
High-Performance Computing User Facility | Computational Science | NREL
User Facility High-Performance Computing User Facility The High-Performance Computing User Facility technologies. Photo of the Peregrine supercomputer The High Performance Computing (HPC) User Facility provides Gyrfalcon Mass Storage System. Access Our HPC User Facility Learn more about these systems and how to access
User's guide to the NOZL3D and NOZLIC computer programs
NASA Technical Reports Server (NTRS)
Thomas, P. D.
1980-01-01
Complete FORTRAN listings and running instructions are given for a set of computer programs that perform an implicit numerical solution to the unsteady Navier-Stokes equations to predict the flow characteristics and performance of nonaxisymmetric nozzles. The set includes the NOZL3D program, which performs the flow computations; the NOZLIC program, which sets up the flow field initial conditions for general nozzle configurations, and also generates the computational grid for simple two dimensional and axisymmetric configurations; and the RGRIDD program, which generates the computational grid for complicated three dimensional configurations. The programs are designed specifically for the NASA-Langley CYBER 175 computer, and employ auxiliary disk files for primary data storage. Input instructions and computed results are given for four test cases that include two dimensional, three dimensional, and axisymmetric configurations.
Orientation of doubly rotated quartz plates.
Sherman, J R
1989-01-01
A derivation from classical spherical trigonometry of equations to compute the orientation of doubly-rotated quartz blanks from Bragg X-ray data is discussed. These are usually derived by compact and efficient vector methods, which are reviewed briefly. They are solved by generating a quadratic equation with numerical coefficients. Two methods exist for performing the computation from measurements against two planes: a direct solution by a quadratic equation and a process of convergent iteration. Both have a spurious solution. Measurement against three lattice planes yields a set of three linear equations the solution of which is an unambiguous result.
High Performance Computing Meets Energy Efficiency - Continuum Magazine |
NREL High Performance Computing Meets Energy Efficiency High Performance Computing Meets Energy turbines. Simulation by Patrick J. Moriarty and Matthew J. Churchfield, NREL The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data
Theory and computation of optimal low- and medium-thrust transfers
NASA Technical Reports Server (NTRS)
Chuang, C.-H.
1994-01-01
This report presents two numerical methods considered for the computation of fuel-optimal, low-thrust orbit transfers in large numbers of burns. The origins of these methods are observations made with the extremal solutions of transfers in small numbers of burns; there seems to exist a trend such that the longer the time allowed to perform an optimal transfer the less fuel that is used. These longer transfers are obviously of interest since they require a motor of low thrust; however, we also find a trend that the longer the time allowed to perform the optimal transfer the more burns are required to satisfy optimality. Unfortunately, this usually increases the difficulty of computation. Both of the methods described use small-numbered burn solutions to determine solutions in large numbers of burns. One method is a homotopy method that corrects for problems that arise when a solution requires a new burn or coast arc for optimality. The other method is to simply patch together long transfers from smaller ones. An orbit correction problem is solved to develop this method. This method may also lead to a good guidance law for transfer orbits with long transfer times.
Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke
2015-01-01
Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842
Strategies for reducing large fMRI data sets for independent component analysis.
Wang, Ze; Wang, Jiongjiong; Calhoun, Vince; Rao, Hengyi; Detre, John A; Childress, Anna R
2006-06-01
In independent component analysis (ICA), principal component analysis (PCA) is generally used to reduce the raw data to a few principal components (PCs) through eigenvector decomposition (EVD) on the data covariance matrix. Although this works for spatial ICA (sICA) on moderately sized fMRI data, it is intractable for temporal ICA (tICA), since typical fMRI data have a high spatial dimension, resulting in an unmanageable data covariance matrix. To solve this problem, two practical data reduction methods are presented in this paper. The first solution is to calculate the PCs of tICA from the PCs of sICA. This approach works well for moderately sized fMRI data; however, it is highly computationally intensive, even intractable, when the number of scans increases. The second solution proposed is to perform PCA decomposition via a cascade recursive least squared (CRLS) network, which provides a uniform data reduction solution for both sICA and tICA. Without the need to calculate the covariance matrix, CRLS extracts PCs directly from the raw data, and the PC extraction can be terminated after computing an arbitrary number of PCs without the need to estimate the whole set of PCs. Moreover, when the whole data set becomes too large to be loaded into the machine memory, CRLS-PCA can save data retrieval time by reading the data once, while the conventional PCA requires numerous data retrieval steps for both covariance matrix calculation and PC extractions. Real fMRI data were used to evaluate the PC extraction precision, computational expense, and memory usage of the presented methods.
ATLAS computing on Swiss Cloud SWITCHengines
NASA Astrophysics Data System (ADS)
Haug, S.; Sciacca, F. G.; ATLAS Collaboration
2017-10-01
Consolidation towards more computing at flat budgets beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider at CERN in Geneva. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions and the performances used and achieved running simulation tasks for the ATLAS experiment on SWITCHengines. SWITCHengines is a new infrastructure as a service offered to Swiss academia by the National Research and Education Network SWITCH. While solutions and performances are general, financial considerations and policies, on which we also report, are country specific.
Computational structural mechanics for engine structures
NASA Technical Reports Server (NTRS)
Chamis, Christos C.
1988-01-01
The computational structural mechanics (CSM) program at Lewis encompasses the formulation and solution of structural mechanics problems and the development of integrated software systems to computationally simulate the performance, durability, and life of engine structures. It is structured to supplement, complement, and, whenever possible, replace costly experimental efforts. Specific objectives are to investigate unique advantages of parallel and multiprocessing for reformulating and solving structural mechanics and formulating and solving multidisciplinary mechanics and to develop integrated structural system computational simulators for predicting structural performance, evaluating newly developed methods, and identifying and prioritizing improved or missing methods.
Computational structural mechanics for engine structures
NASA Technical Reports Server (NTRS)
Chamis, Christos C.
1989-01-01
The computational structural mechanics (CSM) program at Lewis encompasses the formulation and solution of structural mechanics problems and the development of integrated software systems to computationally simulate the performance, durability, and life of engine structures. It is structured to supplement, complement, and, whenever possible, replace costly experimental efforts. Specific objectives are to investigate unique advantages of parallel and multiprocessing for reformulating and solving structural mechanics and formulating and solving multidisciplinary mechanics and to develop integrated structural system computational simulators for predicting structural performance, evaluating newly developed methods, and identifying and prioritizing improved or missing methods.
AI tools in computer based problem solving
NASA Technical Reports Server (NTRS)
Beane, Arthur J.
1988-01-01
The use of computers to solve value oriented, deterministic, algorithmic problems, has evolved a structured life cycle model of the software process. The symbolic processing techniques used, primarily in research, for solving nondeterministic problems, and those for which an algorithmic solution is unknown, have evolved a different model, much less structured. Traditionally, the two approaches have been used completely independently. With the advent of low cost, high performance 32 bit workstations executing identical software with large minicomputers and mainframes, it became possible to begin to merge both models into a single extended model of computer problem solving. The implementation of such an extended model on a VAX family of micro/mini/mainframe systems is described. Examples in both development and deployment of applications involving a blending of AI and traditional techniques are given.
NASA Astrophysics Data System (ADS)
Holden, Jacob R.
Descending maple seeds generate lift to slow their fall and remain aloft in a blowing wind; have the wings of these seeds evolved to descend as slowly as possible? A unique energy balance equation, experimental data, and computational fluid dynamics simulations have all been developed to explore this question from a turbomachinery perspective. The computational fluid dynamics in this work is the first to be performed in the relative reference frame. Maple seed performance has been analyzed for the first time based on principles of wind turbine analysis. Application of the Betz Limit and one-dimensional momentum theory allowed for empirical and computational power and thrust coefficients to be computed for maple seeds. It has been determined that the investigated species of maple seeds perform near the Betz limit for power conversion and thrust coefficient. The power coefficient for a maple seed is found to be in the range of 48-54% and the thrust coefficient in the range of 66-84%. From Betz theory, the stream tube area expansion of the maple seed is necessary for power extraction. Further investigation of computational solutions and mechanical analysis find three key reasons for high maple seed performance. First, the area expansion is driven by maple seed lift generation changing the fluid momentum and requiring area to increase. Second, radial flow along the seed surface is promoted by a sustained leading edge vortex that centrifuges low momentum fluid outward. Finally, the area expansion is also driven by the spanwise area variation of the maple seed imparting a radial force on the flow. These mechanisms result in a highly effective device for the purpose of seed dispersal. However, the maple seed also provides insight into fundamental questions about how turbines can most effectively change the momentum of moving fluids in order to extract useful power or dissipate kinetic energy.
Reference Solutions for Benchmark Turbulent Flows in Three Dimensions
NASA Technical Reports Server (NTRS)
Diskin, Boris; Thomas, James L.; Pandya, Mohagna J.; Rumsey, Christopher L.
2016-01-01
A grid convergence study is performed to establish benchmark solutions for turbulent flows in three dimensions (3D) in support of turbulence-model verification campaign at the Turbulence Modeling Resource (TMR) website. The three benchmark cases are subsonic flows around a 3D bump and a hemisphere-cylinder configuration and a supersonic internal flow through a square duct. Reference solutions are computed for Reynolds Averaged Navier Stokes equations with the Spalart-Allmaras turbulence model using a linear eddy-viscosity model for the external flows and a nonlinear eddy-viscosity model based on a quadratic constitutive relation for the internal flow. The study involves three widely-used practical computational fluid dynamics codes developed and supported at NASA Langley Research Center: FUN3D, USM3D, and CFL3D. Reference steady-state solutions computed with these three codes on families of consistently refined grids are presented. Grid-to-grid and code-to-code variations are described in detail.
Robot Geometry and the High School Curriculum.
ERIC Educational Resources Information Center
Meyer, Walter
1988-01-01
Description of the field of robotics and its possible use in high school computational geometry classes emphasizes motion planning exercises and computer graphics displays. Eleven geometrical problems based on robotics are presented along with the correct solutions and explanations. (LRW)
Enterprise Cloud Architecture for Chinese Ministry of Railway
NASA Astrophysics Data System (ADS)
Shan, Xumei; Liu, Hefeng
Enterprise like PRC Ministry of Railways (MOR), is facing various challenges ranging from highly distributed computing environment and low legacy system utilization, Cloud Computing is increasingly regarded as one workable solution to address this. This article describes full scale cloud solution with Intel Tashi as virtual machine infrastructure layer, Hadoop HDFS as computing platform, and self developed SaaS interface, gluing virtual machine and HDFS with Xen hypervisor. As a result, on demand computing task application and deployment have been tackled per MOR real working scenarios at the end of article.
Low-complexity R-peak detection for ambulatory fetal monitoring.
Rooijakkers, Michael J; Rabotti, Chiara; Oei, S Guid; Mischi, Massimo
2012-07-01
Non-invasive fetal health monitoring during pregnancy is becoming increasingly important because of the increasing number of high-risk pregnancies. Despite recent advances in signal-processing technology, which have enabled fetal monitoring during pregnancy using abdominal electrocardiogram (ECG) recordings, ubiquitous fetal health monitoring is still unfeasible due to the computational complexity of noise-robust solutions. In this paper, an ECG R-peak detection algorithm for ambulatory R-peak detection is proposed, as part of a fetal ECG detection algorithm. The proposed algorithm is optimized to reduce computational complexity, without reducing the R-peak detection performance compared to the existing R-peak detection schemes. Validation of the algorithm is performed on three manually annotated datasets. With a detection error rate of 0.23%, 1.32% and 9.42% on the MIT/BIH Arrhythmia and in-house maternal and fetal databases, respectively, the detection rate of the proposed algorithm is comparable to the best state-of-the-art algorithms, at a reduced computational complexity.
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework
2012-01-01
Background For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
Floating-Point Modules Targeted for Use with RC Compilation Tools
NASA Technical Reports Server (NTRS)
Sahin, Ibrahin; Gloster, Clay S.
2000-01-01
Reconfigurable Computing (RC) has emerged as a viable computing solution for computationally intensive applications. Several applications have been mapped to RC system and in most cases, they provided the smallest published execution time. Although RC systems offer significant performance advantages over general-purpose processors, they require more application development time than general-purpose processors. This increased development time of RC systems provides the motivation to develop an optimized module library with an assembly language instruction format interface for use with future RC system that will reduce development time significantly. In this paper, we present area/performance metrics for several different types of floating point (FP) modules that can be utilized to develop complex FP applications. These modules are highly pipelined and optimized for both speed and area. Using these modules, and example application, FP matrix multiplication, is also presented. Our results and experiences show, that with these modules, 8-10X speedup over general-purpose processors can be achieved.
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
InPRO: Automated Indoor Construction Progress Monitoring Using Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Hamledari, Hesam
In this research, an envisioned automated intelligent robotic solution for automated indoor data collection and inspection that employs a series of unmanned aerial vehicles (UAV), entitled "InPRO", is presented. InPRO consists of four stages, namely: 1) automated path planning; 2) autonomous UAV-based indoor inspection; 3) automated computer vision-based assessment of progress; and, 4) automated updating of 4D building information models (BIM). The works presented in this thesis address the third stage of InPRO. A series of computer vision-based methods that automate the assessment of construction progress using images captured at indoor sites are introduced. The proposed methods employ computer vision and machine learning techniques to detect the components of under-construction indoor partitions. In particular, framing (studs), insulation, electrical outlets, and different states of drywall sheets (installing, plastering, and painting) are automatically detected using digital images. High accuracy rates, real-time performance, and operation without a priori information are indicators of the methods' promising performance.
Parallel solution of sparse one-dimensional dynamic programming problems
NASA Technical Reports Server (NTRS)
Nicol, David M.
1989-01-01
Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Strbac, V; Pierce, D M; Vander Sloten, J; Famaey, N
2017-12-01
Finite element (FE) simulations are increasingly valuable in assessing and improving the performance of biomedical devices and procedures. Due to high computational demands such simulations may become difficult or even infeasible, especially when considering nearly incompressible and anisotropic material models prevalent in analyses of soft tissues. Implementations of GPGPU-based explicit FEs predominantly cover isotropic materials, e.g. the neo-Hookean model. To elucidate the computational expense of anisotropic materials, we implement the Gasser-Ogden-Holzapfel dispersed, fiber-reinforced model and compare solution times against the neo-Hookean model. Implementations of GPGPU-based explicit FEs conventionally rely on single-point (under) integration. To elucidate the expense of full and selective-reduced integration (more reliable) we implement both and compare corresponding solution times against those generated using underintegration. To better understand the advancement of hardware, we compare results generated using representative Nvidia GPGPUs from three recent generations: Fermi (C2075), Kepler (K20c), and Maxwell (GTX980). We explore scaling by solving the same boundary value problem (an extension-inflation test on a segment of human aorta) with progressively larger FE meshes. Our results demonstrate substantial improvements in simulation speeds relative to two benchmark FE codes (up to 300[Formula: see text] while maintaining accuracy), and thus open many avenues to novel applications in biomechanics and medicine.
Closed-form solution of decomposable stochastic models
NASA Technical Reports Server (NTRS)
Sjogren, Jon A.
1990-01-01
Markov and semi-Markov processes are increasingly being used in the modeling of complex reconfigurable systems (fault tolerant computers). The estimation of the reliability (or some measure of performance) of the system reduces to solving the process for its state probabilities. Such a model may exhibit numerous states and complicated transition distributions, contributing to an expensive and numerically delicate solution procedure. Thus, when a system exhibits a decomposition property, either structurally (autonomous subsystems), or behaviorally (component failure versus reconfiguration), it is desirable to exploit this decomposition in the reliability calculation. In interesting cases there can be failure states which arise from non-failure states of the subsystems. Equations are presented which allow the computation of failure probabilities of the total (combined) model without requiring a complete solution of the combined model. This material is presented within the context of closed-form functional representation of probabilities as utilized in the Symbolic Hierarchical Automated Reliability and Performance Evaluator (SHARPE) tool. The techniques adopted enable one to compute such probability functions for a much wider class of systems at a reduced computational cost. Several examples show how the method is used, especially in enhancing the versatility of the SHARPE tool.
A Unified View of Global Instability of Compressible Flow over Open Cavities
2006-03-28
in terms of number of steps realized by the DNS code per second (S/sec) as the number of processors ( np ) increases. For this comparison the “new...computations). It may clearly be seen that both solutions performed comparably well at low number of processors; however, as np increased, the Myrinet...has subsequently been designed, hard -coded and validated at nu modelling. Design characteristics of the code have been a) high-accuracy, b
NASA Astrophysics Data System (ADS)
Drabik, Timothy J.; Lee, Sing H.
1986-11-01
The intrinsic parallelism characteristics of easily realizable optical SIMD arrays prompt their present consideration in the implementation of highly structured algorithms for the numerical solution of multidimensional partial differential equations and the computation of fast numerical transforms. Attention is given to a system, comprising several spatial light modulators (SLMs), an optical read/write memory, and a functional block, which performs simple, space-invariant shifts on images with sufficient flexibility to implement the fastest known methods for partial differential equations as well as a wide variety of numerical transforms in two or more dimensions. Either fixed or floating-point arithmetic may be used. A performance projection of more than 1 billion floating point operations/sec using SLMs with 1000 x 1000-resolution and operating at 1-MHz frame rates is made.
Grand challenges in mass storage: A systems integrators perspective
NASA Technical Reports Server (NTRS)
Lee, Richard R.; Mintz, Daniel G.
1993-01-01
Within today's much ballyhooed supercomputing environment, with its CFLOPS of CPU power, and Gigabit networks, there exists a major roadblock to computing success; that of Mass Storage. The solution to this mass storage problem is considered to be one of the 'Grand Challenges' facing the computer industry today, as well as long into the future. It has become obvious to us, as well as many others in the industry, that there is no clear single solution in sight. The Systems Integrator today is faced with a myriad of quandaries in approaching this challenge. He must first be innovative in approach, second choose hardware solutions that are volumetric efficient; high in signal bandwidth; available from multiple sources; competitively priced, and have forward growth extendibility. In addition he must also comply with a variety of mandated, and often conflicting software standards (GOSIP, POSIX, IEEE, MSRM 4.0, and others), and finally he must deliver a systems solution with the 'most bang for the buck' in terms of cost vs. performance factors. These quandaries challenge the Systems Integrator to 'push the envelope' in terms of his or her ingenuity and innovation on an almost daily basis. This dynamic is explored further, and an attempt to acquaint the audience with rational approaches to this 'Grand Challenge' is made.
Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.
Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P
2010-01-15
A tetramer quadruplex structure is formed by four parallel strands of DNA/ RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to only pay a cloud computing provider for what is used. Moreover, as well as financial efficiency, cloud computing is an ecologically-friendly technology, it enables efficient data-sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
Computing the Feasible Spaces of Optimal Power Flow Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Molzahn, Daniel K.
The solution to an optimal power flow (OPF) problem provides a minimum cost operating point for an electric power system. The performance of OPF solution techniques strongly depends on the problem’s feasible space. This paper presents an algorithm that is guaranteed to compute the entire feasible spaces of small OPF problems to within a specified discretization tolerance. Specifically, the feasible space is computed by discretizing certain of the OPF problem’s inequality constraints to obtain a set of power flow equations. All solutions to the power flow equations at each discretization point are obtained using the Numerical Polynomial Homotopy Continuation (NPHC)more » algorithm. To improve computational tractability, “bound tightening” and “grid pruning” algorithms use convex relaxations to preclude consideration of many discretization points that are infeasible for the OPF problem. Here, the proposed algorithm is used to generate the feasible spaces of two small test cases.« less
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Computing the Feasible Spaces of Optimal Power Flow Problems
Molzahn, Daniel K.
2017-03-15
The solution to an optimal power flow (OPF) problem provides a minimum cost operating point for an electric power system. The performance of OPF solution techniques strongly depends on the problem’s feasible space. This paper presents an algorithm that is guaranteed to compute the entire feasible spaces of small OPF problems to within a specified discretization tolerance. Specifically, the feasible space is computed by discretizing certain of the OPF problem’s inequality constraints to obtain a set of power flow equations. All solutions to the power flow equations at each discretization point are obtained using the Numerical Polynomial Homotopy Continuation (NPHC)more » algorithm. To improve computational tractability, “bound tightening” and “grid pruning” algorithms use convex relaxations to preclude consideration of many discretization points that are infeasible for the OPF problem. Here, the proposed algorithm is used to generate the feasible spaces of two small test cases.« less
High Resolution Aerospace Applications using the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha
2005-01-01
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver, and a fully-automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages. are industrial-level codes designed for complex geometry and incorpor.ats. CuStomized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAIink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.
Uncertain behaviours of integrated circuits improve computational performance.
Yoshimura, Chihiro; Yamaoka, Masanao; Hayashi, Masato; Okuyama, Takuya; Aoki, Hidetaka; Kawarabayashi, Ken-ichi; Mizuno, Hiroyuki
2015-11-20
Improvements to the performance of conventional computers have mainly been achieved through semiconductor scaling; however, scaling is reaching its limitations. Natural phenomena, such as quantum superposition and stochastic resonance, have been introduced into new computing paradigms to improve performance beyond these limitations. Here, we explain that the uncertain behaviours of devices due to semiconductor scaling can improve the performance of computers. We prototyped an integrated circuit by performing a ground-state search of the Ising model. The bit errors of memory cell devices holding the current state of search occur probabilistically by inserting fluctuations into dynamic device characteristics, which will be actualised in the future to the chip. As a result, we observed more improvements in solution accuracy than that without fluctuations. Although the uncertain behaviours of devices had been intended to be eliminated in conventional devices, we demonstrate that uncertain behaviours has become the key to improving computational performance.
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H. Fatih; Goren, Sezer
2016-01-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed. PMID:27733925
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H Fatih; Goren, Sezer; Aydin, Nizamettin
2016-09-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hussain, Hameed; Malik, Saif Ur Rehman; Hameed, Abdul
An efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects are dedicated to large-scale distributed computing systems that have designed and developed resource allocation mechanisms with a variety of architectures and services. In our study, through analysis, a comprehensive survey for describing resource allocation in various HPCs is reported. The aim of the work is to aggregate under a joint framework, the existing solutions for HPC to provide a thorough analysis and characteristics of the resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role towards the performance improvement ofmore » all the HPCs classifications. Therefore, a comprehensive discussion of widely used resource allocation strategies deployed in HPC environment is required, which is one of the motivations of this survey. Moreover, we have classified the HPC systems into three broad categories, namely: (a) cluster, (b) grid, and (c) cloud systems and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify approaches followed by the implementation of existing resource allocation strategies that are widely presented in the literature.« less
Numerical Simulations For the F-16XL Aircraft Configuration
NASA Technical Reports Server (NTRS)
Elmiligui, Alaa A.; Abdol-Hamid, Khaled; Cavallo, Peter A.; Parlette, Edward B.
2014-01-01
Numerical simulations of flow around the F-16XL are presented as a contribution to the Cranked Arrow Wing Aerodynamic Project International II (CAWAPI-II). The NASA Tetrahedral Unstructured Software System (TetrUSS) is used to perform numerical simulations. This CFD suite, developed and maintained by NASA Langley Research Center, includes an unstructured grid generation program called VGRID, a postprocessor named POSTGRID, and the flow solver USM3D. The CRISP CFD package is utilized to provide error estimates and grid adaption for verification of USM3D results. A subsonic high angle-of-attack case flight condition (FC) 25 is computed and analyzed. Three turbulence models are used in the calculations: the one-equation Spalart-Allmaras (SA), the two-equation shear stress transport (SST) and the ke turbulence models. Computational results, and surface static pressure profiles are presented and compared with flight data. Solution verification is performed using formal grid refinement studies, the solution of Error Transport Equations, and adaptive mesh refinement. The current study shows that the USM3D solver coupled with CRISP CFD can be used in an engineering environment in predicting vortex-flow physics on a complex configuration at flight Reynolds numbers.
NASA Technical Reports Server (NTRS)
Alexandrov, N. M.; Nielsen, E. J.; Lewis, R. M.; Anderson, W. K.
2000-01-01
First-order approximation and model management is a methodology for a systematic use of variable-fidelity models or approximations in optimization. The intent of model management is to attain convergence to high-fidelity solutions with minimal expense in high-fidelity computations. The savings in terms of computationally intensive evaluations depends on the ability of the available lower-fidelity model or a suite of models to predict the improvement trends for the high-fidelity problem, Variable-fidelity models can be represented by data-fitting approximations, variable-resolution models. variable-convergence models. or variable physical fidelity models. The present work considers the use of variable-fidelity physics models. We demonstrate the performance of model management on an aerodynamic optimization of a multi-element airfoil designed to operate in the transonic regime. Reynolds-averaged Navier-Stokes equations represent the high-fidelity model, while the Euler equations represent the low-fidelity model. An unstructured mesh-based analysis code FUN2D evaluates functions and sensitivity derivatives for both models. Model management for the present demonstration problem yields fivefold savings in terms of high-fidelity evaluations compared to optimization done with high-fidelity computations alone.
On the reliability of computed chaotic solutions of non-linear differential equations
NASA Astrophysics Data System (ADS)
Liao, Shijun
2009-08-01
A new concept, namely the critical predictable time Tc, is introduced to give a more precise description of computed chaotic solutions of non-linear differential equations: it is suggested that computed chaotic solutions are unreliable and doubtable when t > Tc. This provides us a strategy to detect reliable solution from a given computed result. In this way, the computational phenomena, such as computational chaos (CC), computational periodicity (CP) and computational prediction uncertainty, which are mainly based on long-term properties of computed time-series, can be completely avoided. Using this concept, the famous conclusion `accurate long-term prediction of chaos is impossible' should be replaced by a more precise conclusion that `accurate prediction of chaos beyond the critical predictable time Tc is impossible'. So, this concept also provides us a timescale to determine whether or not a particular time is long enough for a given non-linear dynamic system. Besides, the influence of data inaccuracy and various numerical schemes on the critical predictable time is investigated in details by using symbolic computation software as a tool. A reliable chaotic solution of Lorenz equation in a rather large interval 0 <= t < 1200 non-dimensional Lorenz time units is obtained for the first time. It is found that the precision of the initial condition and the computed data at each time step, which is mathematically necessary to get such a reliable chaotic solution in such a long time, is so high that it is physically impossible due to the Heisenberg uncertainty principle in quantum physics. This, however, provides us a so-called `precision paradox of chaos', which suggests that the prediction uncertainty of chaos is physically unavoidable, and that even the macroscopical phenomena might be essentially stochastic and thus could be described by probability more economically.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
1994-01-01
Strong interactions can occur between the flow about an aerospace vehicle and its structural components resulting in several important aeroelastic phenomena. These aeroelastic phenomena can significantly influence the performance of the vehicle. At present, closed-form solutions are available for aeroelastic computations when flows are in either the linear subsonic or supersonic range. However, for aeroelasticity involving complex nonlinear flows with shock waves, vortices, flow separations, and aerodynamic heating, computational methods are still under development. These complex aeroelastic interactions can be dangerous and limit the performance of aircraft. Examples of these detrimental effects are aircraft with highly swept wings experiencing vortex-induced aeroelastic oscillations, transonic regime at which the flutter speed is low, aerothermoelastic loads that play a critical role in the design of high-speed vehicles, and flow separations that often lead to buffeting with undesirable structural oscillations. The simulation of these complex aeroelastic phenomena requires an integrated analysis of fluids and structures. This report presents a summary of the development, applications, and procedures to use the multidisciplinary computer code ENSAERO. This code is based on the Euler/Navier-Stokes flow equations and modal/finite-element structural equations.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
High performance computing and communications program
NASA Technical Reports Server (NTRS)
Holcomb, Lee
1992-01-01
A review of the High Performance Computing and Communications (HPCC) program is provided in vugraph format. The goals and objectives of this federal program are as follows: extend U.S. leadership in high performance computing and computer communications; disseminate the technologies to speed innovation and to serve national goals; and spur gains in industrial competitiveness by making high performance computing integral to design and production.
MHOST: An efficient finite element program for inelastic analysis of solids and structures
NASA Technical Reports Server (NTRS)
Nakazawa, S.
1988-01-01
An efficient finite element program for 3-D inelastic analysis of gas turbine hot section components was constructed and validated. A novel mixed iterative solution strategy is derived from the augmented Hu-Washizu variational principle in order to nodally interpolate coordinates, displacements, deformation, strains, stresses and material properties. A series of increasingly sophisticated material models incorporated in MHOST include elasticity, secant plasticity, infinitesimal and finite deformation plasticity, creep and unified viscoplastic constitutive model proposed by Walker. A library of high performance elements is built into this computer program utilizing the concepts of selective reduced integrations and independent strain interpolations. A family of efficient solution algorithms is implemented in MHOST for linear and nonlinear equation solution including the classical Newton-Raphson, modified, quasi and secant Newton methods with optional line search and the conjugate gradient method.
NASA Technical Reports Server (NTRS)
Dobrinskaya, Tatiana
2015-01-01
This paper suggests a new method for optimizing yaw maneuvers on the International Space Station (ISS). Yaw rotations are the most common large maneuvers on the ISS often used for docking and undocking operations, as well as for other activities. When maneuver optimization is used, large maneuvers, which were performed on thrusters, could be performed either using control moment gyroscopes (CMG), or with significantly reduced thruster firings. Maneuver optimization helps to save expensive propellant and reduce structural loads - an important factor for the ISS service life. In addition, optimized maneuvers reduce contamination of the critical elements of the vehicle structure, such as solar arrays. This paper presents an analytical solution for optimizing yaw attitude maneuvers. Equations describing pitch and roll motion needed to counteract the major torques during a yaw maneuver are obtained. A yaw rate profile is proposed. Also the paper describes the physical basis of the suggested optimization approach. In the obtained optimized case, the torques are significantly reduced. This torque reduction was compared to the existing optimization method which utilizes the computational solution. It was shown that the attitude profiles and the torque reduction have a good match for these two methods of optimization. The simulations using the ISS flight software showed similar propellant consumption for both methods. The analytical solution proposed in this paper has major benefits with respect to computational approach. In contrast to the current computational solution, which only can be calculated on the ground, the analytical solution does not require extensive computational resources, and can be implemented in the onboard software, thus, making the maneuver execution automatic. The automatic maneuver significantly simplifies the operations and, if necessary, allows to perform a maneuver without communication with the ground. It also reduces the probability of command errors. The suggested analytical solution provides a new method of maneuver optimization which is less complicated, automatic and more universal. A maneuver optimization approach, presented in this paper, can be used not only for the ISS, but for other orbiting space vehicles.
NASA Astrophysics Data System (ADS)
Sourbier, F.; Operto, S.; Virieux, J.
2006-12-01
We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.
NASA Astrophysics Data System (ADS)
Heilmann, B. Z.; Vallenilla Ferrara, A. M.
2009-04-01
The constant growth of contaminated sites, the unsustainable use of natural resources, and, last but not least, the hydrological risk related to extreme meteorological events and increased climate variability are major environmental issues of today. Finding solutions for these complex problems requires an integrated cross-disciplinary approach, providing a unified basis for environmental science and engineering. In computer science, grid computing is emerging worldwide as a formidable tool allowing distributed computation and data management with administratively-distant resources. Utilizing these modern High Performance Computing (HPC) technologies, the GRIDA3 project bundles several applications from different fields of geoscience aiming to support decision making for reasonable and responsible land use and resource management. In this abstract we present a geophysical application called EIAGRID that uses grid computing facilities to perform real-time subsurface imaging by on-the-fly processing of seismic field data and fast optimization of the processing workflow. Even though, seismic reflection profiling has a broad application range spanning from shallow targets in a few meters depth to targets in a depth of several kilometers, it is primarily used by the hydrocarbon industry and hardly for environmental purposes. The complexity of data acquisition and processing poses severe problems for environmental and geotechnical engineering: Professional seismic processing software is expensive to buy and demands large experience from the user. In-field processing equipment needed for real-time data Quality Control (QC) and immediate optimization of the acquisition parameters is often not available for this kind of studies. As a result, the data quality will be suboptimal. In the worst case, a crucial parameter such as receiver spacing, maximum offset, or recording time turns out later to be inappropriate and the complete acquisition campaign has to be repeated. The EIAGRID portal provides an innovative solution to this problem combining state-of-the-art data processing methods and modern remote grid computing technology. In field-processing equipment is substituted by remote access to high performance grid computing facilities. The latter can be ubiquitously controlled by a user-friendly web-browser interface accessed from the field by any mobile computer using wireless data transmission technology such as UMTS (Universal Mobile Telecommunications System) or HSUPA/HSDPA (High-Speed Uplink/Downlink Packet Access). The complexity of data-manipulation and processing and thus also the time demanding user interaction is minimized by a data-driven, and highly automated velocity analysis and imaging approach based on the Common-Reflection-Surface (CRS) stack. Furthermore, the huge computing power provided by the grid deployment allows parallel testing of alternative processing sequences and parameter settings, a feature which considerably reduces the turn-around times. A shared data storage using georeferencing tools and data grid technology is under current development. It will allow to publish already accomplished projects, making results, processing workflows and parameter settings available in a transparent and reproducible way. Creating a unified database shared by all users will facilitate complex studies and enable the use of data-crossing techniques to incorporate results of other environmental applications hosted on the GRIDA3 portal.
Multiple-grid convergence acceleration of viscous and inviscid flow computations
NASA Technical Reports Server (NTRS)
Johnson, G. M.
1983-01-01
A multiple-grid algorithm for use in efficiently obtaining steady solution to the Euler and Navier-Stokes equations is presented. The convergence of a simple, explicit fine-grid solution procedure is accelerated on a sequence of successively coarser grids by a coarse-grid information propagation method which rapidly eliminates transients from the computational domain. This use of multiple-gridding to increase the convergence rate results in substantially reduced work requirements for the numerical solution of a wide range of flow problems. Computational results are presented for subsonic and transonic inviscid flows and for laminar and turbulent, attached and separated, subsonic viscous flows. Work reduction factors as large as eight, in comparison to the basic fine-grid algorithm, were obtained. Possibilities for further performance improvement are discussed.
A Stochastic Total Least Squares Solution of Adaptive Filtering Problem
Ahmad, Noor Atinah
2014-01-01
An efficient and computationally linear algorithm is derived for total least squares solution of adaptive filtering problem, when both input and output signals are contaminated by noise. The proposed total least mean squares (TLMS) algorithm is designed by recursively computing an optimal solution of adaptive TLS problem by minimizing instantaneous value of weighted cost function. Convergence analysis of the algorithm is given to show the global convergence of the proposed algorithm, provided that the stepsize parameter is appropriately chosen. The TLMS algorithm is computationally simpler than the other TLS algorithms and demonstrates a better performance as compared with the least mean square (LMS) and normalized least mean square (NLMS) algorithms. It provides minimum mean square deviation by exhibiting better convergence in misalignment for unknown system identification under noisy inputs. PMID:24688412
Symbolic Computational Approach to the Marangoni Convection Problem With Soret Diffusion
NASA Technical Reports Server (NTRS)
Skarda, J. Raymond
1998-01-01
A recently reported solution for stationary stability of a thermosolutal system with Soret diffusion is re-derived and examined using a symbolic computational package. Symbolic computational languages are well suited for such an analysis and facilitate a pragmatic approach that is adaptable to similar problems. Linearization of the equations, normal mode analysis, and extraction of the final solution are performed in a Mathematica notebook format. An exact solution is obtained for stationary stability in the limit of zero gravity. A closed form expression is also obtained for the location of asymptotes in relevant parameter, (Sm(sub c), Mac(sub c)), space. The stationary stability behavior is conveniently examined within the symbolic language environment. An abbreviated version of the Mathematica notebook is given in the Appendix.
NASA Astrophysics Data System (ADS)
Nigro, A.; De Bartolo, C.; Crivellini, A.; Bassi, F.
2017-12-01
In this paper we investigate the possibility of using the high-order accurate A (α) -stable Second Derivative (SD) schemes proposed by Enright for the implicit time integration of the Discontinuous Galerkin (DG) space-discretized Navier-Stokes equations. These multistep schemes are A-stable up to fourth-order, but their use results in a system matrix difficult to compute. Furthermore, the evaluation of the nonlinear function is computationally very demanding. We propose here a Matrix-Free (MF) implementation of Enright schemes that allows to obtain a method without the costs of forming, storing and factorizing the system matrix, which is much less computationally expensive than its matrix-explicit counterpart, and which performs competitively with other implicit schemes, such as the Modified Extended Backward Differentiation Formulae (MEBDF). The algorithm makes use of the preconditioned GMRES algorithm for solving the linear system of equations. The preconditioner is based on the ILU(0) factorization of an approximated but computationally cheaper form of the system matrix, and it has been reused for several time steps to improve the efficiency of the MF Newton-Krylov solver. We additionally employ a polynomial extrapolation technique to compute an accurate initial guess to the implicit nonlinear system. The stability properties of SD schemes have been analyzed by solving a linear model problem. For the analysis on the Navier-Stokes equations, two-dimensional inviscid and viscous test cases, both with a known analytical solution, are solved to assess the accuracy properties of the proposed time integration method for nonlinear autonomous and non-autonomous systems, respectively. The performance of the SD algorithm is compared with the ones obtained by using an MF-MEBDF solver, in order to evaluate its effectiveness, identifying its limitations and suggesting possible further improvements.
Computer-aided analysis of star shot films for high-accuracy radiation therapy treatment units
NASA Astrophysics Data System (ADS)
Depuydt, Tom; Penne, Rudi; Verellen, Dirk; Hrbacek, Jan; Lang, Stephanie; Leysen, Katrien; Vandevondel, Iwein; Poels, Kenneth; Reynders, Truus; Gevaert, Thierry; Duchateau, Michael; Tournel, Koen; Boussaer, Marlies; Cosentino, Dorian; Garibaldi, Cristina; Solberg, Timothy; De Ridder, Mark
2012-05-01
As mechanical stability of radiation therapy treatment devices has gone beyond sub-millimeter levels, there is a rising demand for simple yet highly accurate measurement techniques to support the routine quality control of these devices. A combination of using high-resolution radiosensitive film and computer-aided analysis could provide an answer. One generally known technique is the acquisition of star shot films to determine the mechanical stability of rotations of gantries and the therapeutic beam. With computer-aided analysis, mechanical performance can be quantified as a radiation isocenter radius size. In this work, computer-aided analysis of star shot film is further refined by applying an analytical solution for the smallest intersecting circle problem, in contrast to the gradient optimization approaches used until today. An algorithm is presented and subjected to a performance test using two different types of radiosensitive film, the Kodak EDR2 radiographic film and the ISP EBT2 radiochromic film. Artificial star shots with a priori known radiation isocenter size are used to determine the systematic errors introduced by the digitization of the film and the computer analysis. The estimated uncertainty on the isocenter size measurement with the presented technique was 0.04 mm (2σ) and 0.06 mm (2σ) for radiographic and radiochromic films, respectively. As an application of the technique, a study was conducted to compare the mechanical stability of O-ring gantry systems with C-arm-based gantries. In total ten systems of five different institutions were included in this study and star shots were acquired for gantry, collimator, ring, couch rotations and gantry wobble. It was not possible to draw general conclusions about differences in mechanical performance between O-ring and C-arm gantry systems, mainly due to differences in the beam-MLC alignment procedure accuracy. Nevertheless, the best performing O-ring system in this study, a BrainLab/MHI Vero system, and the best performing C-arm system, a Varian Truebeam system, showed comparable mechanical performance: gantry isocenter radius of 0.12 and 0.09 mm, respectively, ring/couch rotation of below 0.10 mm for both systems and a wobble of 0.06 and 0.18 mm, respectively. The methodology described in this work can be used to monitor mechanical performance constancy of high-accuracy treatment devices, with means available in a clinical radiation therapy environment.
Assessment of Slat Noise Predictions for 30P30N High-Lift Configuration From BANC-III Workshop
NASA Technical Reports Server (NTRS)
Choudhari, Meelan; Lockard, David P.
2015-01-01
This paper presents a summary of the computational predictions and measurement data contributed to Category 7 of the 3rd AIAA Workshop on Benchmark Problems for Airframe Noise Computations (BANC-III), which was held in Atlanta, GA, on June 14-15, 2014. Category 7 represents the first slat-noise configuration to be investigated under the BANC series of workshops, namely, the 30P30N two-dimensional high-lift model (with a slat contour that was slightly modified to enable unsteady pressure measurements) at an angle of attack that is relevant to approach conditions. Originally developed for a CFD challenge workshop to assess computational fluid dynamics techniques for steady high-lift predictions, the 30P30N configurations has provided a valuable opportunity for the airframe noise community to collectively assess and advance the computational and experimental techniques for slat noise. The contributed solutions are compared with each other as well as with the initial measurements that became available just prior to the BANC-III Workshop. Specific features of a number of computational solutions on the finer grids compare reasonably well with the initial measurements from FSU and JAXA facilities and/or with each other. However, no single solution (or a subset of solutions) could be identified as clearly superior to the remaining solutions. Grid sensitivity studies presented by multiple BANC-III participants demonstrated a relatively consistent trend of reduced surface pressure fluctuations, higher levels of turbulent kinetic energy in the flow, and lower levels of both narrow band peaks and the broadband component of unsteady pressure spectra in the nearfield and farfield. The lessons learned from the BANC-III contributions have been used to identify improvements to the problem statement for future Category-7 investigations.
Analysis of Algorithms: Coping with Hard Problems
ERIC Educational Resources Information Center
Kolata, Gina Bari
1974-01-01
Although today's computers can perform as many as one million operations per second, there are many problems that are still too large to be solved in a straightforward manner. Recent work indicates that many approximate solutions are useful and more efficient than exact solutions. (Author/RH)
Effect of virtual memory on efficient solution of two model problems
NASA Technical Reports Server (NTRS)
Lambiotte, J. J., Jr.
1977-01-01
Computers with virtual memory architecture allow programs to be written as if they were small enough to be contained in memory. Two types of problems are investigated to show that this luxury can lead to quite an inefficient performance if the programmer does not interact strongly with the characteristics of the operating system when developing the program. The two problems considered are the simultaneous solutions of a large linear system of equations by Gaussian elimination and a model three-dimensional finite-difference problem. The Control Data STAR-100 computer runs are made to demonstrate the inefficiencies of programming the problems in the manner one would naturally do if the problems were indeed, small enough to be contained in memory. Program redesigns are presented which achieve large improvements in performance through changes in the computational procedure and the data base arrangement.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1991-01-01
The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Computational Science News | Computational Science | NREL
-Cooled High-Performance Computing Technology at the ESIF February 28, 2018 NREL Launches New Website for High-Performance Computing System Users The National Renewable Energy Laboratory (NREL) Computational Science Center has launched a revamped website for users of the lab's high-performance computing (HPC
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Adaptiveness in monotone pseudo-Boolean optimization and stochastic neural computation.
Grossi, Giuliano
2009-08-01
Hopfield neural network (HNN) is a nonlinear computational model successfully applied in finding near-optimal solutions of several difficult combinatorial problems. In many cases, the network energy function is obtained through a learning procedure so that its minima are states falling into a proper subspace (feasible region) of the search space. However, because of the network nonlinearity, a number of undesirable local energy minima emerge from the learning procedure, significantly effecting the network performance. In the neural model analyzed here, we combine both a penalty and a stochastic process in order to enhance the performance of a binary HNN. The penalty strategy allows us to gradually lead the search towards states representing feasible solutions, so avoiding oscillatory behaviors or asymptotically instable convergence. Presence of stochastic dynamics potentially prevents the network to fall into shallow local minima of the energy function, i.e., quite far from global optimum. Hence, for a given fixed network topology, the desired final distribution on the states can be reached by carefully modulating such process. The model uses pseudo-Boolean functions both to express problem constraints and cost function; a combination of these two functions is then interpreted as energy of the neural network. A wide variety of NP-hard problems fall in the class of problems that can be solved by the model at hand, particularly those having a monotonic quadratic pseudo-Boolean function as constraint function. That is, functions easily derived by closed algebraic expressions representing the constraint structure and easy (polynomial time) to maximize. We show the asymptotic convergence properties of this model characterizing its state space distribution at thermal equilibrium in terms of Markov chain and give evidence of its ability to find high quality solutions on benchmarks and randomly generated instances of two specific problems taken from the computational graph theory.
NASA Technical Reports Server (NTRS)
Bratanow, T.; Aksu, H.; Spehert, T.
1975-01-01
A method based on the Navier-Stokes equations was developed for analyzing the unsteady incompressible viscous flow around oscillating airfoils at high Reynolds numbers. The Navier-Stokes equations have been integrated in their classical Helmholtz vorticity transport equation form, and the instantaneous velocity field at each time step was determined by the solution of Poisson's equation. A refined finite element was utilized to allow for a conformable solution of the stream function and its first space derivatives at the element interfaces. A corresponding set of accurate boundary conditions was applied; thus obtaining a rigorous solution for the velocity field. The details of the computational procedure and examples of computed results describing the unsteady flow characteristics around the airfoil are presented.
NASA Astrophysics Data System (ADS)
Xue, Bo; Mao, Bingjing; Chen, Xiaomei; Ni, Guoqiang
2010-11-01
This paper renders a configurable distributed high performance computing(HPC) framework for TDI-CCD imaging simulation. It uses strategy pattern to adapt multi-algorithms. Thus, this framework help to decrease the simulation time with low expense. Imaging simulation for TDI-CCD mounted on satellite contains four processes: 1) atmosphere leads degradation, 2) optical system leads degradation, 3) electronic system of TDI-CCD leads degradation and re-sampling process, 4) data integration. Process 1) to 3) utilize diversity data-intensity algorithms such as FFT, convolution and LaGrange Interpol etc., which requires powerful CPU. Even uses Intel Xeon X5550 processor, regular series process method takes more than 30 hours for a simulation whose result image size is 1500 * 1462. With literature study, there isn't any mature distributing HPC framework in this field. Here we developed a distribute computing framework for TDI-CCD imaging simulation, which is based on WCF[1], uses Client/Server (C/S) layer and invokes the free CPU resources in LAN. The server pushes the process 1) to 3) tasks to those free computing capacity. Ultimately we rendered the HPC in low cost. In the computing experiment with 4 symmetric nodes and 1 server , this framework reduced about 74% simulation time. Adding more asymmetric nodes to the computing network, the time decreased namely. In conclusion, this framework could provide unlimited computation capacity in condition that the network and task management server are affordable. And this is the brand new HPC solution for TDI-CCD imaging simulation and similar applications.
An Application of the Difference Potentials Method to Solving External Problems in CFD
NASA Technical Reports Server (NTRS)
Ryaben 'Kii, Victor S.; Tsynkov, Semyon V.
1997-01-01
Numerical solution of infinite-domain boundary-value problems requires some special techniques that would make the problem available for treatment on the computer. Indeed, the problem must be discretized in a way that the computer operates with only finite amount of information. Therefore, the original infinite-domain formulation must be altered and/or augmented so that on one hand the solution is not changed (or changed slightly) and on the other hand the finite discrete formulation becomes available. One widely used approach to constructing such discretizations consists of truncating the unbounded original domain and then setting the artificial boundary conditions (ABC's) at the newly formed external boundary. The role of the ABC's is to close the truncated problem and at the same time to ensure that the solution found inside the finite computational domain would be maximally close to (in the ideal case, exactly the same as) the corresponding fragment of the original infinite-domain solution. Let us emphasize that the proper treatment of artificial boundaries may have a profound impact on the overall quality and performance of numerical algorithms. The latter statement is corroborated by the numerous computational experiments and especially concerns the area of CFD, in which external problems present a wide class of practically important formulations. In this paper, we review some work that has been done over the recent years on constructing highly accurate nonlocal ABC's for calculation of compressible external flows. The approach is based on implementation of the generalized potentials and pseudodifferential boundary projection operators analogous to those proposed first by Calderon. The difference potentials method (DPM) by Ryaben'kii is used for the effective computation of the generalized potentials and projections. The resulting ABC's clearly outperform the existing methods from the standpoints of accuracy and robustness, in many cases noticeably speed up the multigrid convergence, and at the same time are quite comparable to other methods from the standpoints of geometric universality and simplicity of implementation.
Development of a realistic stress analysis for fatigue analysis of notched composite laminates
NASA Technical Reports Server (NTRS)
Humphreys, E. A.; Rosen, B. W.
1979-01-01
A finite element stress analysis which consists of a membrane and interlaminar shear spring analysis was developed. This approach was utilized in order to model physically realistic failure mechanisms while maintaining a high degree of computational economy. The accuracy of the stress analysis predictions is verified through comparisons with other solutions to the composite laminate edge effect problem. The stress analysis model was incorporated into an existing fatigue analysis methodology and the entire procedure computerized. A fatigue analysis is performed upon a square laminated composite plate with a circular central hole. A complete description and users guide for the computer code FLAC (Fatigue of Laminated Composites) is included as an appendix.
Ideal form of optical plasma lenses
NASA Astrophysics Data System (ADS)
Gordon, D. F.; Stamm, A. B.; Hafizi, B.; Johnson, L. A.; Kaganovich, D.; Hubbard, R. F.; Richardson, A. S.; Zhigunov, D.
2018-06-01
The canonical form of an optical plasma lens is a parabolic density channel. This form suffers from spherical aberrations, among others. Spherical aberration is partially corrected by adding a quartic term to the radial density profile. Ideal forms which lead to perfect focusing or imaging are obtained. The fields at the focus of a strong lens are computed with high accuracy and efficiency using a combination of eikonal and full Maxwell descriptions of the radiation propagation. The calculations are performed using a new computer propagation code, SeaRay, which is designed to transition between various solution methods as the beam propagates through different spatial regions. The calculations produce the full Maxwell vector fields in the focal region.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garnier, Ch.; Mailhe, P.; Sontheimer, F.
2007-07-01
Fuel performance is a key factor for minimizing operating costs in nuclear plants. One of the important aspects of fuel performance is fuel rod design, based upon reliable tools able to verify the safety of current fuel solutions, prevent potential issues in new core managements and guide the invention of tomorrow's fuels. AREVA is developing its future global fuel rod code COPERNIC3, which is able to calculate the thermal-mechanical behavior of advanced fuel rods in nuclear plants. Some of the best practices to achieve this goal are described, by reviewing the three pillars of a fuel rod code: the database,more » the modelling and the computer and numerical aspects. At first, the COPERNIC3 database content is described, accompanied by the tools developed to effectively exploit the data. Then is given an overview of the main modelling aspects, by emphasizing the thermal, fission gas release and mechanical sub-models. In the last part, numerical solutions are detailed in order to increase the computational performance of the code, with a presentation of software configuration management solutions. (authors)« less
NASA Technical Reports Server (NTRS)
Thomas, P. D.
1979-01-01
The theoretical foundation and formulation of a numerical method for predicting the viscous flowfield in and about isolated three dimensional nozzles of geometrically complex configuration are presented. High Reynolds number turbulent flows are of primary interest for any combination of subsonic, transonic, and supersonic flow conditions inside or outside the nozzle. An alternating-direction implicit (ADI) numerical technique is employed to integrate the unsteady Navier-Stokes equations until an asymptotic steady-state solution is reached. Boundary conditions are computed with an implicit technique compatible with the ADI technique employed at interior points of the flow region. The equations are formulated and solved in a boundary-conforming curvilinear coordinate system. The curvilinear coordinate system and computational grid is generated numerically as the solution to an elliptic boundary value problem. A method is developed that automatically adjusts the elliptic system so that the interior grid spacing is controlled directly by the a priori selection of the grid spacing on the boundaries of the flow region.
The design and implementation of a parallel unstructured Euler solver using software primitives
NASA Technical Reports Server (NTRS)
Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.
1992-01-01
This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effect of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.
NASA Astrophysics Data System (ADS)
Khe, A. K.; Cherevko, A. A.; Chupakhin, A. P.; Bobkova, M. S.; Krivoshapkin, A. L.; Orlov, K. Yu
2016-06-01
In this paper a computer simulation of a blood flow in cerebral vessels with a giant saccular aneurysm at the bifurcation of the basilar artery is performed. The modelling is based on patient-specific clinical data (both flow domain geometry and boundary conditions for the inlets and outlets). The hydrodynamic and mechanical parameters are calculated in the frameworks of three models: rigid-wall assumption, one-way FSI approach, and full (two-way) hydroelastic model. A comparison of the numerical solutions shows that mutual fluid- solid interaction can result in qualitative changes in the structure of the fluid flow. Other characteristics of the flow (pressure, stress, strain and displacement) qualitatively agree with each other in different approaches. However, the quantitative comparison shows that accounting for the flow-vessel interaction, in general, decreases the absolute values of these parameters. Solving of the hydroelasticity problem gives a more detailed solution at a cost of highly increased computational time.
Constantinescu, L; Kim, J; Chan, C; Feng, D
2007-01-01
The field of telemedicine is in need of generic solutions that harness the power of small, easily carried computing devices to increase efficiency and decrease the likelihood of medical errors. Our study resolved to build a framework to bridge the gap between handheld and desktop solutions by developing an automated network protocol that wirelessly propagates application data and images prepared by a powerful workstation to handheld clients for storage, display and collaborative manipulation. To this end, we present the Mobile Active Medical Protocol (MAMP), a framework capable of nigh-effortlessly linking medical workstation solutions to corresponding control interfaces on handheld devices for remote storage, control and display. The ease-of-use, encapsulation and applicability of this automated solution is designed to provide significant benefits to the rapid development of telemedical solutions. Our results demonstrate that the design of this system allows an acceptable data transfer rate, a usable framerate for diagnostic solutions and enough flexibility to enable its use in a wide variety of cases. To this end, we also present a large-scale multi-modality image viewer as an example application based on the MAMP.
Comparative performance evaluation of a new a-Si EPID that exceeds quad high-definition resolution.
McConnell, Kristen A; Alexandrian, Ara; Papanikolaou, Niko; Stathakis, Sotiri
2018-01-01
Electronic portal imaging devices (EPIDs) are an integral part of the radiation oncology workflow for treatment setup verification. Several commercial EPID implementations are currently available, each with varying capabilities. To standardize performance evaluation, Task Group Report 58 (TG-58) and TG-142 outline specific image quality metrics to be measured. A LinaTech Image Viewing System (IVS), with the highest commercially available pixel matrix (2688x2688 pixels), was independently evaluated and compared to an Elekta iViewGT (1024x1024 pixels) and a Varian aSi-1000 (1024x768 pixels) using a PTW EPID QC Phantom. The IVS, iViewGT, and aSi-1000 were each used to acquire 20 images of the PTW QC Phantom. The QC phantom was placed on the couch and aligned at isocenter. The images were exported and analyzed using the epidSoft image quality assurance (QA) software. The reported metrics were signal linearity, isotropy of signal linearity, signal-tonoise ratio (SNR), low contrast resolution, and high-contrast resolution. These values were compared between the three EPID solutions. Computed metrics demonstrated comparable results between the EPID solutions with the IVS outperforming the aSi-1000 and iViewGT in the low and high-contrast resolution analysis. The performance of three commercial EPID solutions have been quantified, evaluated, and compared using results from the PTW QC Phantom. The IVS outperformed the other panels in low and high-contrast resolution, but to fully realize the benefits of the IVS, the selection of the monitor on which to view the high-resolution images is important to prevent down sampling and visual of resolution.
A Validation Summary of the NCC Turbulent Reacting/non-reacting Spray Computations
NASA Technical Reports Server (NTRS)
Raju, M. S.; Liu, N.-S. (Technical Monitor)
2000-01-01
This pper provides a validation summary of the spray computations performed as a part of the NCC (National Combustion Code) development activity. NCC is being developed with the aim of advancing the current prediction tools used in the design of advanced technology combustors based on the multidimensional computational methods. The solution procedure combines the novelty of the application of the scalar Monte Carlo PDF (Probability Density Function) method to the modeling of turbulent spray flames with the ability to perform the computations on unstructured grids with parallel computing. The calculation procedure was applied to predict the flow properties of three different spray cases. One is a nonswirling unconfined reacting spray, the second is a nonswirling unconfined nonreacting spray, and the third is a confined swirl-stabilized spray flame. The comparisons involving both gas-phase and droplet velocities, droplet size distributions, and gas-phase temperatures show reasonable agreement with the available experimental data. The comparisons involve both the results obtained from the use of the Monte Carlo PDF method as well as those obtained from the conventional computational fluid dynamics (CFD) solution. Detailed comparisons in the case of a reacting nonswirling spray clearly highlight the importance of chemistry/turbulence interactions in the modeling of reacting sprays. The results from the PDF and non-PDF methods were found to be markedly different and the PDF solution is closer to the reported experimental data. The PDF computations predict that most of the combustion occurs in a predominantly diffusion-flame environment. However, the non-PDF solution predicts incorrectly that the combustion occurs in a predominantly vaporization-controlled regime. The Monte Carlo temperature distribution shows that the functional form of the PDF for the temperature fluctuations varies substantially from point to point. The results also bring to the fore some of the deficiencies associated with the use of assumed-shape PDF methods in spray computations.
OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.
2014-12-01
OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures to provide efficient Web based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. With a growing user base and maturing scientific user community comes new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service based architecture for integrating community-developed software. This "plugable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT hosted data.
NASA Technical Reports Server (NTRS)
Thompson, D.; Mogili, P.; Chalasani, S.; Addy, H.; Choo, Y.
2004-01-01
Steady-state solutions of the Reynolds-averaged Navier-Stokes (RANS) equations were computed using the Colbalt flow solver for a constant-section, rectangular wing based on an extruded two-dimensional glaze ice shape. The one equation Spalart-Allmaras turbulence model was used. The results were compared with data obtained from a recent wind tunnel test. Computed results indicate that the steady RANS solutions do not accurately capture the recirculating region downstream of the ice accretion, even after a mesh refinement. The resulting predicted reattachment is farther downstream than indicated by the experimental data. Additionally, the solutions computed on a relatively coarse baseline mesh had detailed flow characteristics that were different from those computed on the refined mesh or the experimental data. Steady RANS solutions were also computed to investigate the effects of spanwise variation in the ice shape. The spanwise variation was obtained via a bleeding function that merged the ice shape with the clean wing using a sinusoidal spanwise variation. For these configurations, the results predicted for the extruded shape provided conservative estimates for the performance degradation of the wing. Additionally, the spanwise variation in the ice shape and the resulting differences in the flow fields did not significantly change the location of the primary reattachment.
NASA Astrophysics Data System (ADS)
Popov, Igor; Sukov, Sergey
2018-02-01
A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on one stage time approximation and is adopted to calculation of gasdynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance and parallel efficiency compared to the implementation of the original AAV method. Computer experiments evidence the robustness and convergence of the method to difference solution.
The in-plant evaluation of a uranium NDA system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sprinkle, J.K. Jr.; Baxman, H.R.; Langner, D.G.
1979-12-31
The Los Alamos Scientific Laboratory has an unirradiated enriched uranium reprocessing facility. Various types of solutions are generated in this facility, including distillates and raffinates containing ppm of uranium and concentrated solutions with up to 400 grams U/t. In addition to uranyl nitrate and HNO{sub 3}, the solutions may also contain zirconium, niobium, fluoride, and small amounts of many metals. A uranium solution assay system (USAS) has been installed to allow accurate and more timely process control, accountability, and criticality data to be obtained. The USAS assays are made by a variety of techniques that depend upon state-of-the-art high-resolution Ge(Li)more » gamma-ray spectroscopy integrated with an interactive, user-oriented computer software package. Tight control of the system`s performance is maintained by constantly monitoring the USAS status. Daily measurement control sequences are required, and the user is forced by the software to perform these sequences. Routine assays require 400 or 1000 seconds for a precision of 0.5% over the concentration range of 5--400 g/t. A comparison of the USAS precision and accuracy with that obtained by traditional destructive analytical chemistry techniques (colorimetric and volumetric) is presented.« less
The in-plant evaluation of a uranium NDA system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sprinkle, J.K. Jr.; Baxman, H.R.; Langner, D.G.
1979-01-01
The Los Alamos Scientific Laboratory has an unirradiated enriched uranium reprocessing facility. Various types of solutions are generated in this facility, including distillates and raffinates containing ppm of uranium and concentrated solutions with up to 400 grams U/t. In addition to uranyl nitrate and HNO{sub 3}, the solutions may also contain zirconium, niobium, fluoride, and small amounts of many metals. A uranium solution assay system (USAS) has been installed to allow accurate and more timely process control, accountability, and criticality data to be obtained. The USAS assays are made by a variety of techniques that depend upon state-of-the-art high-resolution Ge(Li)more » gamma-ray spectroscopy integrated with an interactive, user-oriented computer software package. Tight control of the system's performance is maintained by constantly monitoring the USAS status. Daily measurement control sequences are required, and the user is forced by the software to perform these sequences. Routine assays require 400 or 1000 seconds for a precision of 0.5% over the concentration range of 5--400 g/t. A comparison of the USAS precision and accuracy with that obtained by traditional destructive analytical chemistry techniques (colorimetric and volumetric) is presented.« less
Enabling technologies for fiber optic sensing
NASA Astrophysics Data System (ADS)
Ibrahim, Selwan K.; Farnan, Martin; Karabacak, Devrez M.; Singer, Johannes M.
2016-04-01
In order for fiber optic sensors to compete with electrical sensors, several critical parameters need to be addressed such as performance, cost, size, reliability, etc. Relying on technologies developed in different industrial sectors helps to achieve this goal in a more efficient and cost effective way. FAZ Technology has developed a tunable laser based optical interrogator based on technologies developed in the telecommunication sector and optical transducer/sensors based on components sourced from the automotive market. Combining Fiber Bragg Grating (FBG) sensing technology with the above, high speed, high precision, reliable quasi distributed optical sensing systems for temperature, pressure, acoustics, acceleration, etc. has been developed. Careful design needs to be considered to filter out any sources of measurement drifts/errors due to different effects e.g. polarization and birefringence, coating imperfections, sensor packaging etc. Also to achieve high speed and high performance optical sensing systems, combining and synchronizing multiple optical interrogators similar to what has been used with computer/processors to deliver super computing power is an attractive solution. This path can be achieved by using photonic integrated circuit (PIC) technology which opens the doors to scaling up and delivering powerful optical sensing systems in an efficient and cost effective way.
Cellular automata-based modelling and simulation of biofilm structure on multi-core computers.
Skoneczny, Szymon
2015-01-01
The article presents a mathematical model of biofilm growth for aerobic biodegradation of a toxic carbonaceous substrate. Modelling of biofilm growth has fundamental significance in numerous processes of biotechnology and mathematical modelling of bioreactors. The process following double-substrate kinetics with substrate inhibition proceeding in a biofilm has not been modelled so far by means of cellular automata. Each process in the model proposed, i.e. diffusion of substrates, uptake of substrates, growth and decay of microorganisms and biofilm detachment, is simulated in a discrete manner. It was shown that for flat biofilm of constant thickness, the results of the presented model agree with those of a continuous model. The primary outcome of the study was to propose a mathematical model of biofilm growth; however a considerable amount of focus was also placed on the development of efficient algorithms for its solution. Two parallel algorithms were created, differing in the way computations are distributed. Computer programs were created using OpenMP Application Programming Interface for C++ programming language. Simulations of biofilm growth were performed on three high-performance computers. Speed-up coefficients of computer programs were compared. Both algorithms enabled a significant reduction of computation time. It is important, inter alia, in modelling and simulation of bioreactor dynamics.
Local pulmonary structure classification for computer-aided nodule detection
NASA Astrophysics Data System (ADS)
Bahlmann, Claus; Li, Xianlin; Okada, Kazunori
2006-03-01
We propose a new method of classifying the local structure types, such as nodules, vessels, and junctions, in thoracic CT scans. This classification is important in the context of computer aided detection (CAD) of lung nodules. The proposed method can be used as a post-process component of any lung CAD system. In such a scenario, the classification results provide an effective means of removing false positives caused by vessels and junctions thus improving overall performance. As main advantage, the proposed solution transforms the complex problem of classifying various 3D topological structures into much simpler 2D data clustering problem, to which more generic and flexible solutions are available in literature, and which is better suited for visualization. Given a nodule candidate, first, our solution robustly fits an anisotropic Gaussian to the data. The resulting Gaussian center and spread parameters are used to affine-normalize the data domain so as to warp the fitted anisotropic ellipsoid into a fixed-size isotropic sphere. We propose an automatic method to extract a 3D spherical manifold, containing the appropriate bounding surface of the target structure. Scale selection is performed by a data driven entropy minimization approach. The manifold is analyzed for high intensity clusters, corresponding to protruding structures. Techniques involve EMclustering with automatic mode number estimation, directional statistics, and hierarchical clustering with a modified Bhattacharyya distance. The estimated number of high intensity clusters explicitly determines the type of pulmonary structures: nodule (0), attached nodule (1), vessel (2), junction (>3). We show accurate classification results for selected examples in thoracic CT scans. This local procedure is more flexible and efficient than current state of the art and will help to improve the accuracy of general lung CAD systems.
Bergeest, Jan-Philip; Rohr, Karl
2012-10-01
In high-throughput applications, accurate and efficient segmentation of cells in fluorescence microscopy images is of central importance for the quantification of protein expression and the understanding of cell function. We propose an approach for segmenting cell nuclei which is based on active contours using level sets and convex energy functionals. Compared to previous work, our approach determines the global solution. Thus, the approach does not suffer from local minima and the segmentation result does not depend on the initialization. We consider three different well-known energy functionals for active contour-based segmentation and introduce convex formulations of these functionals. We also suggest a numeric approach for efficiently computing the solution. The performance of our approach has been evaluated using fluorescence microscopy images from different experiments comprising different cell types. We have also performed a quantitative comparison with previous segmentation approaches. Copyright © 2012 Elsevier B.V. All rights reserved.
Performance evaluation of the inverse dynamics method for optimal spacecraft reorientation
NASA Astrophysics Data System (ADS)
Ventura, Jacopo; Romano, Marcello; Walter, Ulrich
2015-05-01
This paper investigates the application of the inverse dynamics in the virtual domain method to Euler angles, quaternions, and modified Rodrigues parameters for rapid optimal attitude trajectory generation for spacecraft reorientation maneuvers. The impact of the virtual domain and attitude representation is numerically investigated for both minimum time and minimum energy problems. Owing to the nature of the inverse dynamics method, it yields sub-optimal solutions for minimum time problems. Furthermore, the virtual domain improves the optimality of the solution, but at the cost of more computational time. The attitude representation also affects solution quality and computational speed. For minimum energy problems, the optimal solution can be obtained without the virtual domain with any considered attitude representation.
Developments in REDES: The rocket engine design expert system
NASA Technical Reports Server (NTRS)
Davidian, Kenneth O.
1990-01-01
The Rocket Engine Design Expert System (REDES) is being developed at the NASA-Lewis to collect, automate, and perpetuate the existing expertise of performing a comprehensive rocket engine analysis and design. Currently, REDES uses the rigorous JANNAF methodology to analyze the performance of the thrust chamber and perform computational studies of liquid rocket engine problems. The following computer codes were included in REDES: a gas properties program named GASP, a nozzle design program named RAO, a regenerative cooling channel performance evaluation code named RTE, and the JANNAF standard liquid rocket engine performance prediction code TDK (including performance evaluation modules ODE, ODK, TDE, TDK, and BLM). Computational analyses are being conducted by REDES to provide solutions to liquid rocket engine thrust chamber problems. REDES is built in the Knowledge Engineering Environment (KEE) expert system shell and runs on a Sun 4/110 computer.
Developments in REDES: The Rocket Engine Design Expert System
NASA Technical Reports Server (NTRS)
Davidian, Kenneth O.
1990-01-01
The Rocket Engine Design Expert System (REDES) was developed at NASA-Lewis to collect, automate, and perpetuate the existing expertise of performing a comprehensive rocket engine analysis and design. Currently, REDES uses the rigorous JANNAF methodology to analyze the performance of the thrust chamber and perform computational studies of liquid rocket engine problems. The following computer codes were included in REDES: a gas properties program named GASP; a nozzle design program named RAO; a regenerative cooling channel performance evaluation code named RTE; and the JANNAF standard liquid rocket engine performance prediction code TDK (including performance evaluation modules ODE, ODK, TDE, TDK, and BLM). Computational analyses are being conducted by REDES to provide solutions to liquid rocket engine thrust chamber problems. REDES was built in the Knowledge Engineering Environment (KEE) expert system shell and runs on a Sun 4/110 computer.
NASA Astrophysics Data System (ADS)
Ahlstrand, Emma; Zukerman Schpector, Julio; Friedman, Ran
2017-11-01
When proteins are solvated in electrolyte solutions that contain alkali ions, the ions interact mostly with carboxylates on the protein surface. Correctly accounting for alkali-carboxylate interactions is thus important for realistic simulations of proteins. Acetates are the simplest carboxylates that are amphipathic, and experimental data for alkali acetate solutions are available and can be compared with observables obtained from simulations. We carried out molecular dynamics simulations of alkali acetate solutions using polarizable and non-polarizable forcefields and examined the ion-acetate interactions. In particular, activity coefficients and association constants were studied in a range of concentrations (0.03, 0.1, and 1M). In addition, quantum-mechanics (QM) based energy decomposition analysis was performed in order to estimate the contribution of polarization, electrostatics, dispersion, and QM (non-classical) effects on the cation-acetate and cation-water interactions. Simulations of Li-acetate solutions in general overestimated the binding of Li+ and acetates. In lower concentrations, the activity coefficients of alkali-acetate solutions were too high, which is suggested to be due to the simulation protocol and not the forcefields. Energy decomposition analysis suggested that improvement of the forcefield parameters to enable accurate simulations of Li-acetate solutions can be achieved but may require the use of a polarizable forcefield. Importantly, simulations with some ion parameters could not reproduce the correct ion-oxygen distances, which calls for caution in the choice of ion parameters when protein simulations are performed in electrolyte solutions.
OPTIMIZING THROUGH CO-EVOLUTIONARY AVALANCHES
DOE Office of Scientific and Technical Information (OSTI.GOV)
S. BOETTCHER; A. PERCUS
2000-08-01
We explore a new general-purpose heuristic for finding high-quality solutions to hard optimization problems. The method, called extremal optimization, is inspired by ''self-organized critically,'' a concept introduced to describe emergent complexity in many physical systems. In contrast to Genetic Algorithms which operate on an entire ''gene-pool'' of possible solutions, extremal optimization successively replaces extremely undesirable elements of a sub-optimal solution with new, random ones. Large fluctuations, called ''avalanches,'' ensue that efficiently explore many local optima. Drawing upon models used to simulate far-from-equilibrium dynamics, extremal optimization complements approximation methods inspired by equilibrium statistical physics, such as simulated annealing. With only onemore » adjustable parameter, its performance has proved competitive with more elaborate methods, especially near phase transitions. Those phase transitions are found in the parameter space of most optimization problems, and have recently been conjectured to be the origin of some of the hardest instances in computational complexity. We will demonstrate how extremal optimization can be implemented for a variety of combinatorial optimization problems. We believe that extremal optimization will be a useful tool in the investigation of phase transitions in combinatorial optimization problems, hence valuable in elucidating the origin of computational complexity.« less
The Propagation and Scattering of EM Waves in Electrically Large Ducts
NASA Astrophysics Data System (ADS)
Khan, Saeed Mahmood
The electromagnetic scattering from large arbitrarily shaped ducts with complex termination is studied here by a hybrid technique. The propagation of electromagnetic waves in the duct is analyzed in terms of an approximate modal solution. A finite difference technique is employed for computing the reflection characteristics of the complex terminations. Both solutions are combined using the unimoment method. The analysis here is carried out for monostatic RCS and considers only fields backscattered from inside the cavity. Rim-diffraction has been left out. The procedure offers such advantages as in that it is not necessary to find complicated Green's functions, which may not be readily available, when compared with the integral equation method. Hybridization performed by combining an approximate modal technique with a finite difference one makes the scheme numerically efficient. From a computational EM point of view, it brings together a whole spectrum of techniques associated with high frequency modal analysis, Fourier Methods, Radar Cross Section and Scattering, finite difference solution and the Unimoment Method. The practical application of this technique may range from the study of RCS scattered from jet inlets of radar evasive aircraft to submarine communication waveguides.
NASA Astrophysics Data System (ADS)
Babaveisi, Vahid; Paydar, Mohammad Mahdi; Safaei, Abdul Sattar
2018-07-01
This study aims to discuss the solution methodology for a closed-loop supply chain (CLSC) network that includes the collection of used products as well as distribution of the new products. This supply chain is presented on behalf of the problems that can be solved by the proposed meta-heuristic algorithms. A mathematical model is designed for a CLSC that involves three objective functions of maximizing the profit, minimizing the total risk and shortages of products. Since three objective functions are considered, a multi-objective solution methodology can be advantageous. Therefore, several approaches have been studied and an NSGA-II algorithm is first utilized, and then the results are validated using an MOSA and MOPSO algorithms. Priority-based encoding, which is used in all the algorithms, is the core of the solution computations. To compare the performance of the meta-heuristics, random numerical instances are evaluated by four criteria involving mean ideal distance, spread of non-dominance solution, the number of Pareto solutions, and CPU time. In order to enhance the performance of the algorithms, Taguchi method is used for parameter tuning. Finally, sensitivity analyses are performed and the computational results are presented based on the sensitivity analyses in parameter tuning.
NASA Astrophysics Data System (ADS)
Babaveisi, Vahid; Paydar, Mohammad Mahdi; Safaei, Abdul Sattar
2017-07-01
This study aims to discuss the solution methodology for a closed-loop supply chain (CLSC) network that includes the collection of used products as well as distribution of the new products. This supply chain is presented on behalf of the problems that can be solved by the proposed meta-heuristic algorithms. A mathematical model is designed for a CLSC that involves three objective functions of maximizing the profit, minimizing the total risk and shortages of products. Since three objective functions are considered, a multi-objective solution methodology can be advantageous. Therefore, several approaches have been studied and an NSGA-II algorithm is first utilized, and then the results are validated using an MOSA and MOPSO algorithms. Priority-based encoding, which is used in all the algorithms, is the core of the solution computations. To compare the performance of the meta-heuristics, random numerical instances are evaluated by four criteria involving mean ideal distance, spread of non-dominance solution, the number of Pareto solutions, and CPU time. In order to enhance the performance of the algorithms, Taguchi method is used for parameter tuning. Finally, sensitivity analyses are performed and the computational results are presented based on the sensitivity analyses in parameter tuning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
Driver behavior profiling: An investigation with different smartphone sensors and machine learning
Ferreira, Jair; Carvalho, Eduardo; Ferreira, Bruno V.; de Souza, Cleidson; Suhara, Yoshihiko; Pentland, Alex
2017-01-01
Driver behavior impacts traffic safety, fuel/energy consumption and gas emissions. Driver behavior profiling tries to understand and positively impact driver behavior. Usually driver behavior profiling tasks involve automated collection of driving data and application of computer models to generate a classification that characterizes the driver aggressiveness profile. Different sensors and classification methods have been employed in this task, however, low-cost solutions and high performance are still research targets. This paper presents an investigation with different Android smartphone sensors, and classification algorithms in order to assess which sensor/method assembly enables classification with higher performance. The results show that specific combinations of sensors and intelligent methods allow classification performance improvement. PMID:28394925
Xu, Zhiliang; Chen, Xu-Yan; Liu, Yingjie
2014-01-01
We present a new formulation of the Runge-Kutta discontinuous Galerkin (RKDG) method [9, 8, 7, 6] for solving conservation Laws with increased CFL numbers. The new formulation requires the computed RKDG solution in a cell to satisfy additional conservation constraint in adjacent cells and does not increase the complexity or change the compactness of the RKDG method. Numerical computations for solving one-dimensional and two-dimensional scalar and systems of nonlinear hyperbolic conservation laws are performed with approximate solutions represented by piecewise quadratic and cubic polynomials, respectively. The hierarchical reconstruction [17, 33] is applied as a limiter to eliminate spurious oscillations in discontinuous solutions. From both numerical experiments and the analytic estimate of the CFL number of the newly formulated method, we find that: 1) this new formulation improves the CFL number over the original RKDG formulation by at least three times or more and thus reduces the overall computational cost; and 2) the new formulation essentially does not compromise the resolution of the numerical solutions of shock wave problems compared with ones computed by the RKDG method. PMID:25414520
NASA Astrophysics Data System (ADS)
Caillol, J. M.; Levesque, D.
1992-01-01
The reliability and the efficiency of a new method suitable for the simulations of dielectric fluids and ionic solutions is established by numerical computations. The efficiency depends on the use of a simulation cell which is the surface of a four-dimensional sphere. The reliability originates from a charge-charge potential solution of the Poisson equation in this confining volume. The computation time, for systems of a few hundred molecules, is reduced by a factor of 2 or 3 compared to this of a simulation performed in a cubic volume with periodic boundary conditions and the Ewald charge-charge potential.
A computational study of the topology of vortex breakdown
NASA Technical Reports Server (NTRS)
Spall, Robert E.; Gatski, Thomas B.
1991-01-01
A fully three-dimensional numerical simulation of vortex breakdown using the unsteady, incompressible Navier-Stokes equations has been performed. Solutions to four distinct types of breakdown are identified and compared with experimental results. The computed solutions include weak helical, double helix, spiral, and bubble-type breakdowns. The topological structure of the various breakdowns as well as their interrelationship are studied. The data reveal that the asymmetric modes of breakdown may be subject to additional breakdowns as the vortex core evolves in the streamwise direction. The solutions also show that the freestream axial velocity distribution has a significant effect on the position and type of vortex breakdown.
Satellite triangulation in Europe from WEST and ISAGEX data. [computer programs
NASA Technical Reports Server (NTRS)
Leick, A.; Arur, M.
1975-01-01
Observational data that was acquired during the West European Satellite Triangulation (WEST) program and the International Satellite Geodesy Experiment (ISAGEX) campaign was obtained for the purpose of performing a geometric solution to improve the present values of coordinates of the European stations in the OSU WN14 solutions, adding some new stations and assessing the quality of the WN14 solution with the help of the additional data available. The status of the data as received, the preprocessing required and the preliminary tests carried out for the initial screening of the data are described. The adjustment computations carried out and the results of the adjustments are discussed.
Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H
2014-05-28
Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N(3)) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMPI implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
An Autonomous Underwater Recorder Based on a Single Board Computer.
Caldas-Morgan, Manuel; Alvarez-Rosario, Alexander; Rodrigues Padovese, Linilson
2015-01-01
As industrial activities continue to grow on the Brazilian coast, underwater sound measurements are becoming of great scientific importance as they are essential to evaluate the impact of these activities on local ecosystems. In this context, the use of commercial underwater recorders is not always the most feasible alternative, due to their high cost and lack of flexibility. Design and construction of more affordable alternatives from scratch can become complex because it requires profound knowledge in areas such as electronics and low-level programming. With the aim of providing a solution; a well succeeded model of a highly flexible, low-cost alternative to commercial recorders was built based on a Raspberry Pi single board computer. A properly working prototype was assembled and it demonstrated adequate performance levels in all tested situations. The prototype was equipped with a power management module which was thoroughly evaluated. It is estimated that it will allow for great battery savings on long-term scheduled recordings. The underwater recording device was successfully deployed at selected locations along the Brazilian coast, where it adequately recorded animal and manmade acoustic events, among others. Although power consumption may not be as efficient as that of commercial and/or micro-processed solutions, the advantage offered by the proposed device is its high customizability, lower development time and inherently, its cost.
Implicit preconditioned WENO scheme for steady viscous flow computation
NASA Astrophysics Data System (ADS)
Huang, Juan-Chen; Lin, Herng; Yang, Jaw-Yen
2009-02-01
A class of lower-upper symmetric Gauss-Seidel implicit weighted essentially nonoscillatory (WENO) schemes is developed for solving the preconditioned Navier-Stokes equations of primitive variables with Spalart-Allmaras one-equation turbulence model. The numerical flux of the present preconditioned WENO schemes consists of a first-order part and high-order part. For first-order part, we adopt the preconditioned Roe scheme and for the high-order part, we employ preconditioned WENO methods. For comparison purpose, a preconditioned TVD scheme is also given and tested. A time-derivative preconditioning algorithm is devised and a discriminant is devised for adjusting the preconditioning parameters at low Mach numbers and turning off the preconditioning at intermediate or high Mach numbers. The computations are performed for the two-dimensional lid driven cavity flow, low subsonic viscous flow over S809 airfoil, three-dimensional low speed viscous flow over 6:1 prolate spheroid, transonic flow over ONERA-M6 wing and hypersonic flow over HB-2 model. The solutions of the present algorithms are in good agreement with the experimental data. The application of the preconditioned WENO schemes to viscous flows at all speeds not only enhances the accuracy and robustness of resolving shock and discontinuities for supersonic flows, but also improves the accuracy of low Mach number flow with complicated smooth solution structures.
NASA Technical Reports Server (NTRS)
Manderscheid, J. M.; Kaufman, A.
1985-01-01
Turbine blades for reusable space propulsion systems are subject to severe thermomechanical loading cycles that result in large inelastic strains and very short lives. These components require the use of anisotropic high-temperature alloys to meet the safety and durability requirements of such systems. To assess the effects on blade life of material anisotropy, cyclic structural analyses are being performed for the first stage high-pressure fuel turbopump blade of the space shuttle main engine. The blade alloy is directionally solidified MAR-M 246 alloy. The analyses are based on a typical test stand engine cycle. Stress-strain histories at the airfoil critical location are computed using the MARC nonlinear finite-element computer code. The MARC solutions are compared to cyclic response predictions from a simplified structural analysis procedure developed at the NASA Lewis Research Center.
Tempest - Efficient Computation of Atmospheric Flows Using High-Order Local Discretization Methods
NASA Astrophysics Data System (ADS)
Ullrich, P. A.; Guerra, J. E.
2014-12-01
The Tempest Framework composes several compact numerical methods to easily facilitate intercomparison of atmospheric flow calculations on the sphere and in rectangular domains. This framework includes the implementations of Spectral Elements, Discontinuous Galerkin, Flux Reconstruction, and Hybrid Finite Element methods with the goal of achieving optimal accuracy in the solution of atmospheric problems. Several advantages of this approach are discussed such as: improved pressure gradient calculation, numerical stability by vertical/horizontal splitting, arbitrary order of accuracy, etc. The local numerical discretization allows for high performance parallel computation and efficient inclusion of parameterizations. These techniques are used in conjunction with a non-conformal, locally refined, cubed-sphere grid for global simulations and standard Cartesian grids for simulations at the mesoscale. A complete implementation of the methods described is demonstrated in a non-hydrostatic setting.
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak
2001-01-01
The Information Power Grid (IPG) concept developed by NASA is aimed to provide a metacomputing platform for large-scale distributed computations, by hiding the intricacies of highly heterogeneous environment and yet maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the.IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnected speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solution are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions
2016-08-30
High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions A dedicated high-performance computer cluster was...SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS (ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 Computer cluster ...peer-reviewed journals: Final Report: High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions Report Title A dedicated
Xrootd in dCache - design and experiences
NASA Astrophysics Data System (ADS)
Behrmann, Gerd; Ozerov, Dmitry; Zangerl, Thomas
2011-12-01
dCache is a well established distributed storage solution used in both high energy physics computing and other disciplines. An overview of the implementation of the xrootd data access protocol within dCache is presented. The performance of various access mechanisms is studied and compared and it is concluded that our implementation is as perfomant as other protocols. This makes dCache a compelling alternative to the Scalla software suite implementation of xrootd, with added value from broad protocol support, including the IETF approved NFS 4.1 protocol.
NASA Astrophysics Data System (ADS)
Reis, C.; Clain, S.; Figueiredo, J.; Baptista, M. A.; Miranda, J. M. A.
2015-12-01
Numerical tools turn to be very important for scenario evaluations of hazardous phenomena such as tsunami. Nevertheless, the predictions highly depends on the numerical tool quality and the design of efficient numerical schemes still receives important attention to provide robust and accurate solutions. In this study we propose a comparative study between the efficiency of two volume finite numerical codes with second-order discretization implemented with different method to solve the non-conservative shallow water equations, the MUSCL (Monotonic Upstream-Centered Scheme for Conservation Laws) and the MOOD methods (Multi-dimensional Optimal Order Detection) which optimize the accuracy of the approximation in function of the solution local smoothness. The MUSCL is based on a priori criteria where the limiting procedure is performed before updated the solution to the next time-step leading to non-necessary accuracy reduction. On the contrary, the new MOOD technique uses a posteriori detectors to prevent the solution from oscillating in the vicinity of the discontinuities. Indeed, a candidate solution is computed and corrections are performed only for the cells where non-physical oscillations are detected. Using a simple one-dimensional analytical benchmark, 'Single wave on a sloping beach', we show that the classical 1D shallow-water system can be accurately solved with the finite volume method equipped with the MOOD technique and provide better approximation with sharper shock and less numerical diffusion. For the code validation, we also use the Tohoku-Oki 2011 tsunami and reproduce two DART records, demonstrating that the quality of the solution may deeply interfere with the scenario one can assess. This work is funded by the Portugal-France research agreement, through the research project GEONUM FCT-ANR/MAT-NAN/0122/2012.Numerical tools turn to be very important for scenario evaluations of hazardous phenomena such as tsunami. Nevertheless, the predictions highly depends on the numerical tool quality and the design of efficient numerical schemes still receives important attention to provide robust and accurate solutions. In this study we propose a comparative study between the efficiency of two volume finite numerical codes with second-order discretization implemented with different method to solve the non-conservative shallow water equations, the MUSCL (Monotonic Upstream-Centered Scheme for Conservation Laws) and the MOOD methods (Multi-dimensional Optimal Order Detection) which optimize the accuracy of the approximation in function of the solution local smoothness. The MUSCL is based on a priori criteria where the limiting procedure is performed before updated the solution to the next time-step leading to non-necessary accuracy reduction. On the contrary, the new MOOD technique uses a posteriori detectors to prevent the solution from oscillating in the vicinity of the discontinuities. Indeed, a candidate solution is computed and corrections are performed only for the cells where non-physical oscillations are detected. Using a simple one-dimensional analytical benchmark, 'Single wave on a sloping beach', we show that the classical 1D shallow-water system can be accurately solved with the finite volume method equipped with the MOOD technique and provide better approximation with sharper shock and less numerical diffusion. For the code validation, we also use the Tohoku-Oki 2011 tsunami and reproduce two DART records, demonstrating that the quality of the solution may deeply interfere with the scenario one can assess. This work is funded by the Portugal-France research agreement, through the research project GEONUM FCT-ANR/MAT-NAN/0122/2012.
Computational technique for stepwise quantitative assessment of equation correctness
NASA Astrophysics Data System (ADS)
Othman, Nuru'l Izzah; Bakar, Zainab Abu
2017-04-01
Many of the computer-aided mathematics assessment systems that are available today possess the capability to implement stepwise correctness checking of a working scheme for solving equations. The computational technique for assessing the correctness of each response in the scheme mainly involves checking the mathematical equivalence and providing qualitative feedback. This paper presents a technique, known as the Stepwise Correctness Checking and Scoring (SCCS) technique that checks the correctness of each equation in terms of structural equivalence and provides quantitative feedback. The technique, which is based on the Multiset framework, adapts certain techniques from textual information retrieval involving tokenization, document modelling and similarity evaluation. The performance of the SCCS technique was tested using worked solutions on solving linear algebraic equations in one variable. 350 working schemes comprising of 1385 responses were collected using a marking engine prototype, which has been developed based on the technique. The results show that both the automated analytical scores and the automated overall scores generated by the marking engine exhibit high percent agreement, high correlation and high degree of agreement with manual scores with small average absolute and mixed errors.
NASA Technical Reports Server (NTRS)
DeChant, Lawrence Justin
1998-01-01
In spite of rapid advances in both scalar and parallel computational tools, the large number of variables involved in both design and inverse problems make the use of sophisticated fluid flow models impractical, With this restriction, it is concluded that an important family of methods for mathematical/computational development are reduced or approximate fluid flow models. In this study a combined perturbation/numerical modeling methodology is developed which provides a rigorously derived family of solutions. The mathematical model is computationally more efficient than classical boundary layer but provides important two-dimensional information not available using quasi-1-d approaches. An additional strength of the current methodology is its ability to locally predict static pressure fields in a manner analogous to more sophisticated parabolized Navier Stokes (PNS) formulations. To resolve singular behavior, the model utilizes classical analytical solution techniques. Hence, analytical methods have been combined with efficient numerical methods to yield an efficient hybrid fluid flow model. In particular, the main objective of this research has been to develop a system of analytical and numerical ejector/mixer nozzle models, which require minimal empirical input. A computer code, DREA Differential Reduced Ejector/mixer Analysis has been developed with the ability to run sufficiently fast so that it may be used either as a subroutine or called by an design optimization routine. Models are of direct use to the High Speed Civil Transport Program (a joint government/industry project seeking to develop an economically.viable U.S. commercial supersonic transport vehicle) and are currently being adopted by both NASA and industry. Experimental validation of these models is provided by comparison to results obtained from open literature and Limited Exclusive Right Distribution (LERD) sources, as well as dedicated experiments performed at Texas A&M. These experiments have been performed using a hydraulic/gas flow analog. Results of comparisons of DREA computations with experimental data, which include entrainment, thrust, and local profile information, are overall good. Computational time studies indicate that DREA provides considerably more information at a lower computational cost than contemporary ejector nozzle design models. Finally. physical limitations of the method, deviations from experimental data, potential improvements and alternative formulations are described. This report represents closure to the NASA Graduate Researchers Program. Versions of the DREA code and a user's guide may be obtained from the NASA Lewis Research Center.
On the exact solutions of high order wave equations of KdV type (I)
NASA Astrophysics Data System (ADS)
Bulut, Hasan; Pandir, Yusuf; Baskonus, Haci Mehmet
2014-12-01
In this paper, by means of a proper transformation and symbolic computation, we study high order wave equations of KdV type (I). We obtained classification of exact solutions that contain soliton, rational, trigonometric and elliptic function solutions by using the extended trial equation method. As a result, the motivation of this paper is to utilize the extended trial equation method to explore new solutions of high order wave equation of KdV type (I). This method is confirmed by applying it to this kind of selected nonlinear equations.
Accelerating epistasis analysis in human genetics with consumer graphics hardware.
Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H
2009-07-24
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
Optimal cube-connected cube multiprocessors
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Wu, Jie
1993-01-01
Many CFD (computational fluid dynamics) and other scientific applications can be partitioned into subproblems. However, in general the partitioned subproblems are very large. They demand high performance computing power themselves, and the solutions of the subproblems have to be combined at each time step. The cube-connect cube (CCCube) architecture is studied. The CCCube architecture is an extended hypercube structure with each node represented as a cube. It requires fewer physical links between nodes than the hypercube, and provides the same communication support as the hypercube does on many applications. The reduced physical links can be used to enhance the bandwidth of the remaining links and, therefore, enhance the overall performance. The concept and the method to obtain optimal CCCubes, which are the CCCubes with a minimum number of links under a given total number of nodes, are proposed. The superiority of optimal CCCubes over standard hypercubes was also shown in terms of the link usage in the embedding of a binomial tree. A useful computation structure based on a semi-binomial tree for divide-and-conquer type of parallel algorithms was identified. It was shown that this structure can be implemented in optimal CCCubes without performance degradation compared with regular hypercubes. The result presented should provide a useful approach to design of scientific parallel computers.
MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frank Mueller
2009-02-05
MOLAR is a multi-institution research effort that concentrates on adaptive, reliable,and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS - forum to address scalable technology for runtime and operating systems --- and HECRTF --- high-end computing revitalization task force --- activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based onmore » the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high-availability without single points of failure and without single points of control.« less
Reduced-Order Models for the Aeroelastic Analysis of Ares Launch Vehicles
NASA Technical Reports Server (NTRS)
Silva, Walter A.; Vatsa, Veer N.; Biedron, Robert T.
2010-01-01
This document presents the development and application of unsteady aerodynamic, structural dynamic, and aeroelastic reduced-order models (ROMs) for the ascent aeroelastic analysis of the Ares I-X flight test and Ares I crew launch vehicles using the unstructured-grid, aeroelastic FUN3D computational fluid dynamics (CFD) code. The purpose of this work is to perform computationally-efficient aeroelastic response calculations that would be prohibitively expensive via computation of multiple full-order aeroelastic FUN3D solutions. These efficient aeroelastic ROM solutions provide valuable insight regarding the aeroelastic sensitivity of the vehicles to various parameters over a range of dynamic pressures.
Smart Collaborative Caching for Information-Centric IoT in Fog Computing.
Song, Fei; Ai, Zheng-Yang; Li, Jun-Jie; Pau, Giovanni; Collotta, Mario; You, Ilsun; Zhang, Hong-Ke
2017-11-01
The significant changes enabled by the fog computing had demonstrated that Internet of Things (IoT) urgently needs more evolutional reforms. Limited by the inflexible design philosophy; the traditional structure of a network is hard to meet the latest demands. However, Information-Centric Networking (ICN) is a promising option to bridge and cover these enormous gaps. In this paper, a Smart Collaborative Caching (SCC) scheme is established by leveraging high-level ICN principles for IoT within fog computing paradigm. The proposed solution is supposed to be utilized in resource pooling, content storing, node locating and other related situations. By investigating the available characteristics of ICN, some challenges of such combination are reviewed in depth. The details of building SCC, including basic model and advanced algorithms, are presented based on theoretical analysis and simplified examples. The validation focuses on two typical scenarios: simple status inquiry and complex content sharing. The number of clusters, packet loss probability and other parameters are also considered. The analytical results demonstrate that the performance of our scheme, regarding total packet number and average transmission latency, can outperform that of the original ones. We expect that the SCC will contribute an efficient solution to the related studies.
Smart Collaborative Caching for Information-Centric IoT in Fog Computing
Song, Fei; Ai, Zheng-Yang; Li, Jun-Jie; Zhang, Hong-Ke
2017-01-01
The significant changes enabled by the fog computing had demonstrated that Internet of Things (IoT) urgently needs more evolutional reforms. Limited by the inflexible design philosophy; the traditional structure of a network is hard to meet the latest demands. However, Information-Centric Networking (ICN) is a promising option to bridge and cover these enormous gaps. In this paper, a Smart Collaborative Caching (SCC) scheme is established by leveraging high-level ICN principles for IoT within fog computing paradigm. The proposed solution is supposed to be utilized in resource pooling, content storing, node locating and other related situations. By investigating the available characteristics of ICN, some challenges of such combination are reviewed in depth. The details of building SCC, including basic model and advanced algorithms, are presented based on theoretical analysis and simplified examples. The validation focuses on two typical scenarios: simple status inquiry and complex content sharing. The number of clusters, packet loss probability and other parameters are also considered. The analytical results demonstrate that the performance of our scheme, regarding total packet number and average transmission latency, can outperform that of the original ones. We expect that the SCC will contribute an efficient solution to the related studies. PMID:29104219
Monte Carlo track-structure calculations for aqueous solutions containing biomolecules
DOE Office of Scientific and Technical Information (OSTI.GOV)
Turner, J.E.; Hamm, R.N.; Ritchie, R.H.
1993-10-01
Detailed Monte Carlo calculations provide a powerful tool for understanding mechanisms of radiation damage to biological molecules irradiated in aqueous solution. This paper describes the computer codes, OREC and RADLYS, which have been developed for this purpose over a number of years. Some results are given for calculations of the irradiation of pure water. comparisons are presented between computations for liquid water and water vapor. Detailed calculations of the chemical yields of several products from X-irradiated, oxygen-free glycylglycine solutions have been performed as a function of solute concentration. Excellent agreement is obtained between calculated and measured yields. The Monte Carlomore » analysis provides a complete mechanistic picture of pathways to observed radiolytic products. This approach, successful with glycylglycine, will be extended to study the irradiation of oligonucleotides in aqueous solution.« less
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
Data Movement Dominates: Advanced Memory Technology to Address the Real Exascale Power Problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergman, Keren
Energy is the fundamental barrier to Exascale supercomputing and is dominated by the cost of moving data from one point to another, not computation. Similarly, performance is dominated by data movement, not computation. The solution to this problem requires three critical technologies: 3D integration, optical chip-to-chip communication, and a new communication model. The central goal of the Sandia led "Data Movement Dominates" project aimed to develop memory systems and new architectures based on these technologies that have the potential to lower the cost of local memory accesses by orders of magnitude and provide substantially more bandwidth. Only through these transformationalmore » advances can future systems reach the goals of Exascale computing with a manageable power budgets. The Sandia led team included co-PIs from Columbia University, Lawrence Berkeley Lab, and the University of Maryland. The Columbia effort of Data Movement Dominates focused on developing a physically accurate simulation environment and experimental verification for optically-connected memory (OCM) systems that can enable continued performance scaling through high-bandwidth capacity, energy-efficient bit-rate transparency, and time-of-flight latency. With OCM, memory device parallelism and total capacity can scale to match future high-performance computing requirements without sacrificing data-movement efficiency. When we consider systems with integrated photonics, links to memory can be seamlessly integrated with the interconnection network-in a sense, memory becomes a primary aspect of the interconnection network. At the core of the Columbia effort, toward expanding our understanding of OCM enabled computing we have created an integrated modeling and simulation environment that uniquely integrates the physical behavior of the optical layer. The PhoenxSim suite of design and software tools developed under this effort has enabled the co-design of and performance evaluation photonics-enabled OCM architectures on Exascale computing systems.« less
Coupling MD Simulations and X-ray Absorption Spectroscopy to Study Ions in Solution
NASA Astrophysics Data System (ADS)
Marcos, E. Sánchez; Beret, E. C.; Martínez, J. M.; Pappalardo, R. R.; Ayala, R.; Muñoz-Páez, A.
2007-12-01
The structure of ionic solutions is a key-point in understanding physicochemical properties of electrolyte solutions. Among the reduced number of experimental techniques which can supply direct information on the ion environment, X-ray Absorption techniques (XAS) have gained importance during the last decades although they are not free of difficulties associated to the data analysis leading to provide reliable structures. Computer simulations of ions in solution is a theoretical alternative to provide information on the solvation structure. Thus, the use of computational chemistry can increase the understanding of these systems although an accurate description of ionic solvation phenomena represents nowadays a significant challenge to theoretical chemistry. We present: (a) the assignment of features in the XANES spectrum to well defined structural motif in the ion environment, (b) MD-based evaluation of EXAFS parameters used in the fitting procedure to make easier the structural resolution, and (c) the use of the agreement between experimental and simulated XANES spectra to help in the choice of a given intermolecular potential for Computer Simulations. Chemical problems examined are: (a) the identification of the second hydration shell in dilute aqueous solutions of highly-charged cations, such as Cr3+, Rh3+, Ir3+, (b) the invisibility by XAS of certain structures characterized by Computer Simulations but exhibiting high dynamical behavior and (c) the solvation of Br- in acetonitrile.
Coupling MD Simulations and X-ray Absorption Spectroscopy to Study Ions in Solution
NASA Astrophysics Data System (ADS)
Marcos, E. Sánchez; Beret, E. C.; Martínez, J. M.; Pappalardo, R. R.; Ayala, R.; Muñoz-Páez, A.
2007-11-01
The structure of ionic solutions is a key-point in understanding physicochemical properties of electrolyte solutions. Among the reduced number of experimental techniques which can supply direct information on the ion environment, X-ray Absorption techniques (XAS) have gained importance during the last decades although they are not free of difficulties associated to the data analysis leading to provide reliable structures. Computer simulations of ions in solution is a theoretical alternative to provide information on the solvation structure. Thus, the use of computational chemistry can increase the understanding of these systems although an accurate description of ionic solvation phenomena represents nowadays a significant challenge to theoretical chemistry. We present: (a) the assignment of features in the XANES spectrum to well defined structural motif in the ion environment, (b) MD-based evaluation of EXAFS parameters used in the fitting procedure to make easier the structural resolution, and (c) the use of the agreement between experimental and simulated XANES spectra to help in the choice of a given intermolecular potential for Computer Simulations. Chemical problems examined are: (a) the identification of the second hydration shell in dilute aqueous solutions of highly-charged cations, such as Cr3+, Rh3+, Ir3+, (b) the invisibility by XAS of certain structures characterized by Computer Simulations but exhibiting high dynamical behavior and (c) the solvation of Br- in acetonitrile.
POM.gpu-v1.0: a GPU-based Princeton Ocean Model
NASA Astrophysics Data System (ADS)
Xu, S.; Huang, X.; Oey, L.-Y.; Xu, F.; Fu, H.; Zhang, Y.; Yang, G.
2015-09-01
Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I / O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.
Youpi: YOUr processing PIpeline
NASA Astrophysics Data System (ADS)
Monnerville, Mathias; Sémah, Gregory
2012-03-01
Youpi is a portable, easy to use web application providing high level functionalities to perform data reduction on scientific FITS images. Built on top of various open source reduction tools released to the community by TERAPIX (http://terapix.iap.fr), Youpi can help organize data, manage processing jobs on a computer cluster in real time (using Condor) and facilitate teamwork by allowing fine-grain sharing of results and data. Youpi is modular and comes with plugins which perform, from within a browser, various processing tasks such as evaluating the quality of incoming images (using the QualityFITS software package), computing astrometric and photometric solutions (using SCAMP), resampling and co-adding FITS images (using SWarp) and extracting sources and building source catalogues from astronomical images (using SExtractor). Youpi is useful for small to medium-sized data reduction projects; it is free and is published under the GNU General Public License.
Validation of a wireless modular monitoring system for structures
NASA Astrophysics Data System (ADS)
Lynch, Jerome P.; Law, Kincho H.; Kiremidjian, Anne S.; Carryer, John E.; Kenny, Thomas W.; Partridge, Aaron; Sundararajan, Arvind
2002-06-01
A wireless sensing unit for use in a Wireless Modular Monitoring System (WiMMS) has been designed and constructed. Drawing upon advanced technological developments in the areas of wireless communications, low-power microprocessors and micro-electro mechanical system (MEMS) sensing transducers, the wireless sensing unit represents a high-performance yet low-cost solution to monitoring the short-term and long-term performance of structures. A sophisticated reduced instruction set computer (RISC) microcontroller is placed at the core of the unit to accommodate on-board computations, measurement filtering and data interrogation algorithms. The functionality of the wireless sensing unit is validated through various experiments involving multiple sensing transducers interfaced to the sensing unit. In particular, MEMS-based accelerometers are used as the primary sensing transducer in this study's validation experiments. A five degree of freedom scaled test structure mounted upon a shaking table is employed for system validation.
A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models
NASA Astrophysics Data System (ADS)
Li, Qia; Micchelli, Charles A.; Shen, Lixin; Xu, Yuesheng
2012-09-01
Our goal in this paper is to improve the computational performance of the proximity algorithms for the L1/TV denoising model. This leads us to a new characterization of all solutions to the L1/TV model via fixed-point equations expressed in terms of the proximity operators. Based upon this observation we develop an algorithm for solving the model and establish its convergence. Furthermore, we demonstrate that the proposed algorithm can be accelerated through the use of the componentwise Gauss-Seidel iteration so that the CPU time consumed is significantly reduced. Numerical experiments using the proposed algorithm for impulsive noise removal are included, with a comparison to three recently developed algorithms. The numerical results show that while the proposed algorithm enjoys a high quality of the restored images, as the other three known algorithms do, it performs significantly better in terms of computational efficiency measured in the CPU time consumed.
NASA Astrophysics Data System (ADS)
Painter, S.; Moulton, J. D.; Berndt, M.; Coon, E.; Garimella, R.; Lewis, K. C.; Manzini, G.; Mishra, P.; Travis, B. J.; Wilson, C. J.
2012-12-01
The frozen soils of the Arctic and subarctic regions contain vast amounts of stored organic carbon. This carbon is vulnerable to release to the atmosphere as temperatures warm and permafrost degrades. Understanding the response of the subsurface and surface hydrologic system to degrading permafrost is key to understanding the rate, timing, and chemical form of potential carbon releases to the atmosphere. Simulating the hydrologic system in degrading permafrost regions is challenging because of the potential for topographic evolution and associated drainage network reorganization as permafrost thaws and massive ground ice melts. The critical process models required for simulating hydrology include subsurface thermal hydrology of freezing/thawing soils, thermal processes within ice wedges, mechanical deformation processes, overland flow, and surface energy balances including snow dynamics. A new simulation tool, the Arctic Terrestrial Simulator (ATS), is being developed to simulate these coupled processes. The computational infrastructure must accommodate fully unstructured grids that track evolving topography, allow accurate solutions on distorted grids, provide robust and efficient solutions on highly parallel computer architectures, and enable flexibility in the strategies for coupling among the various processes. The ATS is based on Amanzi (Moulton et al. 2012), an object-oriented multi-process simulator written in C++ that provides much of the necessary computational infrastructure. Status and plans for the ATS including major hydrologic process models and validation strategies will be presented. Highly parallel simulations of overland flow using high-resolution digital elevation maps of polygonal patterned ground landscapes demonstrate the feasibility of the approach. Simulations coupling three-phase subsurface thermal hydrology with a simple thaw-induced subsidence model illustrate the strong feedbacks among the processes. D. Moulton, M. Berndt, M. Day, J. Meza, et al., High-Level Design of Amanzi, the Multi-Process High Performance Computing Simulator, Technical Report ASCEM-HPC-2011-03-1, DOE Environmental Management, 2012.
A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations
NASA Technical Reports Server (NTRS)
Dydson, Roger W.; Goodrich, John W.
2000-01-01
Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many of the computational aeroacoustics methods are limited to 4th order accurate Runge-Kutta methods in time which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128 bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program. and is more computationally efficient when compared to previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. And finally, the sources of round-off error which effect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.
Using Java for distributed computing in the Gaia satellite data processing
NASA Astrophysics Data System (ADS)
O'Mullane, William; Luri, Xavier; Parsons, Paul; Lammers, Uwe; Hoar, John; Hernandez, Jose
2011-10-01
In recent years Java has matured to a stable easy-to-use language with the flexibility of an interpreter (for reflection etc.) but the performance and type checking of a compiled language. When we started using Java for astronomical applications around 1999 they were the first of their kind in astronomy. Now a great deal of astronomy software is written in Java as are many business applications. We discuss the current environment and trends concerning the language and present an actual example of scientific use of Java for high-performance distributed computing: ESA's mission Gaia. The Gaia scanning satellite will perform a galactic census of about 1,000 million objects in our galaxy. The Gaia community has chosen to write its processing software in Java. We explore the manifold reasons for choosing Java for this large science collaboration. Gaia processing is numerically complex but highly distributable, some parts being embarrassingly parallel. We describe the Gaia processing architecture and its realisation in Java. We delve into the astrometric solution which is the most advanced and most complex part of the processing. The Gaia simulator is also written in Java and is the most mature code in the system. This has been successfully running since about 2005 on the supercomputer "Marenostrum" in Barcelona. We relate experiences of using Java on a large shared machine. Finally we discuss Java, including some of its problems, for scientific computing.
ERIC Educational Resources Information Center
Federal Coordinating Council for Science, Engineering and Technology, Washington, DC.
This report presents a review of the High Performance Computing and Communications (HPCC) Program, which has as its goal the acceleration of the commercial availability and utilization of the next generation of high performance computers and networks in order to: (1) extend U.S. technological leadership in high performance computing and computer…
Equation solvers for distributed-memory computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
A large number of scientific and engineering problems require the rapid solution of large systems of simultaneous equations. The performance of parallel computers in this area now dwarfs traditional vector computers by nearly an order of magnitude. This talk describes the major issues involved in parallel equation solvers with particular emphasis on the Intel Paragon, IBM SP-1 and SP-2 processors.
Big Data Solution for CTBT Monitoring Using Global Cross Correlation
NASA Astrophysics Data System (ADS)
Gaillard, P.; Bobrov, D.; Dupont, A.; Grenouille, A.; Kitov, I. O.; Rozhkov, M.
2014-12-01
Due to the mismatch between data volume and the performance of the Information Technology infrastructure used in seismic data centers, it becomes more and more difficult to process all the data with traditional applications in a reasonable elapsed time. To fulfill their missions, the International Data Centre of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO/IDC) and the Département Analyse Surveillance Environnement of Commissariat à l'Energie atomique et aux énergies alternatives (CEA/DASE) collect, process and produce complex data sets whose volume is growing exponentially. In the medium term, computer architectures, data management systems and application algorithms will require fundamental changes to meet the needs. This problem is well known and identified as a "Big Data" challenge. To tackle this major task, the CEA/DASE takes part during two years to the "DataScale" project. Started in September 2013, DataScale gathers a large set of partners (research laboratories, SMEs and big companies). The common objective is to design efficient solutions using the synergy between Big Data solutions and the High Performance Computing (HPC). The project will evaluate the relevance of these technological solutions by implementing a demonstrator for seismic event detections thanks to massive waveform correlations. The IDC has developed an expertise on such techniques leading to an algorithm called "Master Event" and provides a high-quality dataset for an extensive cross correlation study. The objective of the project is to enhance the Master Event algorithm and to reanalyze 10 years of waveform data from the International Monitoring System (IMS) network thanks to a dedicated HPC infrastructure operated by the "Centre de Calcul Recherche et Technologie" at the CEA of Bruyères-le-Châtel. The dataset used for the demonstrator includes more than 300,000 seismic events, tens of millions of raw detections and more than 30 terabytes of continuous seismic data from the primary IMS stations. In this talk, we will present the Master Event algorithm and the associated workflow, we will give an overview of the designed technical solutions (from the building blocks to the global infrastructure), and we will show the preliminary results at a regional scale.
Bio-inspired computational heuristics to study Lane-Emden systems arising in astrophysics model.
Ahmad, Iftikhar; Raja, Muhammad Asif Zahoor; Bilal, Muhammad; Ashraf, Farooq
2016-01-01
This study reports novel hybrid computational methods for the solutions of nonlinear singular Lane-Emden type differential equation arising in astrophysics models by exploiting the strength of unsupervised neural network models and stochastic optimization techniques. In the scheme the neural network, sub-part of large field called soft computing, is exploited for modelling of the equation in an unsupervised manner. The proposed approximated solutions of higher order ordinary differential equation are calculated with the weights of neural networks trained with genetic algorithm, and pattern search hybrid with sequential quadratic programming for rapid local convergence. The results of proposed solvers for solving the nonlinear singular systems are in good agreements with the standard solutions. Accuracy and convergence the design schemes are demonstrated by the results of statistical performance measures based on the sufficient large number of independent runs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
FINNEY, Charles E A; Edwards, Kevin Dean; Stoyanov, Miroslav K
2015-01-01
Combustion instabilities in dilute internal combustion engines are manifest in cyclic variability (CV) in engine performance measures such as integrated heat release or shaft work. Understanding the factors leading to CV is important in model-based control, especially with high dilution where experimental studies have demonstrated that deterministic effects can become more prominent. Observation of enough consecutive engine cycles for significant statistical analysis is standard in experimental studies but is largely wanting in numerical simulations because of the computational time required to compute hundreds or thousands of consecutive cycles. We have proposed and begun implementation of an alternative approach to allowmore » rapid simulation of long series of engine dynamics based on a low-dimensional mapping of ensembles of single-cycle simulations which map input parameters to output engine performance. This paper details the use Titan at the Oak Ridge Leadership Computing Facility to investigate CV in a gasoline direct-injected spark-ignited engine with a moderately high rate of dilution achieved through external exhaust gas recirculation. The CONVERGE CFD software was used to perform single-cycle simulations with imposed variations of operating parameters and boundary conditions selected according to a sparse grid sampling of the parameter space. Using an uncertainty quantification technique, the sampling scheme is chosen similar to a design of experiments grid but uses functions designed to minimize the number of samples required to achieve a desired degree of accuracy. The simulations map input parameters to output metrics of engine performance for a single cycle, and by mapping over a large parameter space, results can be interpolated from within that space. This interpolation scheme forms the basis for a low-dimensional metamodel which can be used to mimic the dynamical behavior of corresponding high-dimensional simulations. Simulations of high-EGR spark-ignition combustion cycles within a parametric sampling grid were performed and analyzed statistically, and sensitivities of the physical factors leading to high CV are presented. With these results, the prospect of producing low-dimensional metamodels to describe engine dynamics at any point in the parameter space will be discussed. Additionally, modifications to the methodology to account for nondeterministic effects in the numerical solution environment are proposed« less