high-performance computing techniques: Topics by Science.gov

Sample records for high-performance computing techniques

High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Smart Sampling and HPC-based Probabilistic Look-ahead Contingency Analysis Implementation and its Evaluation with Real-world Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Yousu; Etingov, Pavel V.; Ren, Huiying

This paper describes a probabilistic look-ahead contingency analysis application that incorporates smart sampling and high-performance computing (HPC) techniques. Smart sampling techniques are implemented to effectively represent the structure and statistical characteristics of uncertainty introduced by different sources in the power system. They can significantly reduce the data set size required for multiple look-ahead contingency analyses, and therefore reduce the time required to compute them. High-performance-computing (HPC) techniques are used to further reduce computational time. These two techniques enable a predictive capability that forecasts the impact of various uncertainties on potential transmission limit violations. The developed package has been tested withmore » real world data from the Bonneville Power Administration. Case study results are presented to demonstrate the performance of the applications developed.« less
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

NASA Astrophysics Data System (ADS)

Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

2018-01-01

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Petascale supercomputing to accelerate the design of high-temperature alloys

DOE PAGES

Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; ...

2017-10-25

Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ'-Al 2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviourmore » of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.« less
Petascale supercomputing to accelerate the design of high-temperature alloys

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shin, Dongwon; Lee, Sangkeun; Shyam, Amit

Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ'-Al 2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviourmore » of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.« less
Petascale supercomputing to accelerate the design of high-temperature alloys

NASA Astrophysics Data System (ADS)

Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; Haynes, J. Allen

2017-12-01

Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ‧-Al2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviour of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. The approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.
Design and Implementation of High-Performance GIS Dynamic Objects Rendering Engine

NASA Astrophysics Data System (ADS)

Zhong, Y.; Wang, S.; Li, R.; Yun, W.; Song, G.

2017-12-01

Spatio-temporal dynamic visualization is more vivid than static visualization. It important to use dynamic visualization techniques to reveal the variation process and trend vividly and comprehensively for the geographical phenomenon. To deal with challenges caused by dynamic visualization of both 2D and 3D spatial dynamic targets, especially for different spatial data types require high-performance GIS dynamic objects rendering engine. The main approach for improving the rendering engine with vast dynamic targets relies on key technologies of high-performance GIS, including memory computing, parallel computing, GPU computing and high-performance algorisms. In this study, high-performance GIS dynamic objects rendering engine is designed and implemented for solving the problem based on hybrid accelerative techniques. The high-performance GIS rendering engine contains GPU computing, OpenGL technology, and high-performance algorism with the advantage of 64-bit memory computing. It processes 2D, 3D dynamic target data efficiently and runs smoothly with vast dynamic target data. The prototype system of high-performance GIS dynamic objects rendering engine is developed based SuperMap GIS iObjects. The experiments are designed for large-scale spatial data visualization, the results showed that the high-performance GIS dynamic objects rendering engine have the advantage of high performance. Rendering two-dimensional and three-dimensional dynamic objects achieve 20 times faster on GPU than on CPU.
Achieving High Performance with FPGA-Based Computing

PubMed Central

Herbordt, Martin C.; VanCourt, Tom; Gu, Yongfeng; Sukhwani, Bharat; Conti, Al; Model, Josh; DiSabello, Doug

2011-01-01

Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric. PMID:21603088
Tools for 3D scientific visualization in computational aerodynamics

NASA Technical Reports Server (NTRS)

Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val

1989-01-01

The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a super computer with a high speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a CRAY2, and the high speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed as well as descriptions of other hardware for digital video and film recording.
DoD High Performance Computing Modernization Program Users Group Conference (HPCMP UGC 2011) Held in Portland, Oregon on June 20-23, 2011

DTIC Science & Technology

2011-06-01

4. Conclusion The Web -based AGeS system described in this paper is a computationally-efficient and scalable system for high- throughput genome...method for protecting web services involves making them more resilient to attack using autonomic computing techniques. This paper presents our initial...20–23, 2011 2011 DoD High Performance Computing Modernzation Program Users Group Conference HPCMP UGC 2011 The papers in this book comprise the
High-Performance Computing for the Electromagnetic Modeling and Simulation of Interconnects

NASA Technical Reports Server (NTRS)

Schutt-Aine, Jose E.

1996-01-01

The electromagnetic modeling of packages and interconnects plays a very important role in the design of high-speed digital circuits, and is most efficiently performed by using computer-aided design algorithms. In recent years, packaging has become a critical area in the design of high-speed communication systems and fast computers, and the importance of the software support for their development has increased accordingly. Throughout this project, our efforts have focused on the development of modeling and simulation techniques and algorithms that permit the fast computation of the electrical parameters of interconnects and the efficient simulation of their electrical performance.
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena

2010-09-30

Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains to be challenging due to the high computational cost involved. It is hoped that next generation high performance computing (HPC) resources lead to significant reductions in execution times to leverage a new class of in-silico applications. However, performance gains with these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range requiredmore » to achieve the desired performance boost.In this study, strong scalability of currently used techniques to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes even when systems are carefully load-balanced and advanced IO strategies are employed.« less
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; Russell, Samuel S.

2012-01-01

Objective Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area were GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

NASA Technical Reports Server (NTRS)

Abdi, Frank

1996-01-01

A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.
A survey of CPU-GPU heterogeneous computing techniques

DOE PAGES

Mittal, Sparsh; Vetter, Jeffrey S.

2015-07-04

As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler and applicationmore » level. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.« less
A survey of CPU-GPU heterogeneous computing techniques

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S.

As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler and applicationmore » level. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.« less
Parallel computing in experimental mechanics and optical measurement: A review (II)

NASA Astrophysics Data System (ADS)

Wang, Tianyi; Kemao, Qian

2018-05-01

With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composing of hardware such as CPUs and GPUs, have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
Computer sciences

NASA Technical Reports Server (NTRS)

Smith, Paul H.

1988-01-01

The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.
High Performance Parallel Computational Nanotechnology

NASA Technical Reports Server (NTRS)

Saini, Subhash; Craw, James M. (Technical Monitor)

1995-01-01

At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.
Wilson Dslash Kernel From Lattice QCD Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joo, Balint; Smelyanskiy, Mikhail; Kalamkar, Dhiraj D.

2015-07-01

Lattice Quantum Chromodynamics (LQCD) is a numerical technique used for calculations in Theoretical Nuclear and High Energy Physics. LQCD is traditionally one of the first applications ported to many new high performance computing architectures and indeed LQCD practitioners have been known to design and build custom LQCD computers. Lattice QCD kernels are frequently used as benchmarks (e.g. 168.wupwise in the SPEC suite) and are generally well understood, and as such are ideal to illustrate several optimization techniques. In this chapter we will detail our work in optimizing the Wilson-Dslash kernels for Intel Xeon Phi, however, as we will show themore » technique gives excellent performance on regular Xeon Architecture as well.« less

Lanczos eigensolution method for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1991-01-01

The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
Accelerating phylogenetics computing on the desktop: experiments with executing UPGMA in programmable logic.

PubMed

Davis, J P; Akella, S; Waddell, P H

2004-01-01

Having greater computational power on the desktop for processing taxa data sets has been a dream of biologists/statisticians involved in phylogenetics data analysis. Many existing algorithms have been highly optimized-one example being Felsenstein's PHYLIP code, written in C, for UPGMA and neighbor joining algorithms. However, the ability to process more than a few tens of taxa in a reasonable amount of time using conventional computers has not yielded a satisfactory speedup in data processing, making it difficult for phylogenetics practitioners to quickly explore data sets-such as might be done from a laptop computer. We discuss the application of custom computing techniques to phylogenetics. In particular, we apply this technology to speed up UPGMA algorithm execution by a factor of a hundred, against that of PHYLIP code running on the same PC. We report on these experiments and discuss how custom computing techniques can be used to not only accelerate phylogenetics algorithm performance on the desktop, but also on larger, high-performance computing engines, thus enabling the high-speed processing of data sets involving thousands of taxa.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

PubMed Central

Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

2014-01-01

The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
SOI layout decomposition for double patterning lithography on high-performance computer platforms

NASA Astrophysics Data System (ADS)

Verstov, Vladimir; Zinchenko, Lyudmila; Makarchuk, Vladimir

2014-12-01

In the paper silicon on insulator layout decomposition algorithms for the double patterning lithography on high performance computing platforms are discussed. Our approach is based on the use of a contradiction graph and a modified concurrent breadth-first search algorithm. We evaluate our technique on 45 nm Nangate Open Cell Library including non-Manhattan geometry. Experimental results show that our soft computing algorithms decompose layout successfully and a minimal distance between polygons in layout is increased.
High performance computing and communications: Advancing the frontiers of information technology

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1997-12-31

This report, which supplements the President`s Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental inmore » the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R and D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science. This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.« less
Computational technique for stepwise quantitative assessment of equation correctness

NASA Astrophysics Data System (ADS)

Othman, Nuru'l Izzah; Bakar, Zainab Abu

2017-04-01

Many of the computer-aided mathematics assessment systems that are available today possess the capability to implement stepwise correctness checking of a working scheme for solving equations. The computational technique for assessing the correctness of each response in the scheme mainly involves checking the mathematical equivalence and providing qualitative feedback. This paper presents a technique, known as the Stepwise Correctness Checking and Scoring (SCCS) technique that checks the correctness of each equation in terms of structural equivalence and provides quantitative feedback. The technique, which is based on the Multiset framework, adapts certain techniques from textual information retrieval involving tokenization, document modelling and similarity evaluation. The performance of the SCCS technique was tested using worked solutions on solving linear algebraic equations in one variable. 350 working schemes comprising of 1385 responses were collected using a marking engine prototype, which has been developed based on the technique. The results show that both the automated analytical scores and the automated overall scores generated by the marking engine exhibit high percent agreement, high correlation and high degree of agreement with manual scores with small average absolute and mixed errors.
Hypergraph-Based Combinatorial Optimization of Matrix-Vector Multiplication

ERIC Educational Resources Information Center

Wolf, Michael Maclean

2009-01-01

Combinatorial scientific computing plays an important enabling role in computational science, particularly in high performance scientific computing. In this thesis, we will describe our work on optimizing matrix-vector multiplication using combinatorial techniques. Our research has focused on two different problems in combinatorial scientific…
Preparing systems engineering and computing science students in disciplined methods, quantitative, and advanced statistical techniques to improve process performance

NASA Astrophysics Data System (ADS)

McCray, Wilmon Wil L., Jr.

The research was prompted by a need to conduct a study that assesses process improvement, quality management and analytical techniques taught to students in U.S. colleges and universities undergraduate and graduate systems engineering and the computing science discipline (e.g., software engineering, computer science, and information technology) degree programs during their academic training that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle processes needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods and how they relate to process-performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI RTM) Models as process improvement frameworks to improve business processes performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, quantitative management techniques, and process performance modeling to identify and eliminate sources of variation, continually improve process-performance; reduce cost and predict future outcomes. The research study identifies and provides a detail discussion of the gap analysis findings of process improvement and quantitative analysis techniques taught in U.S. universities systems engineering and computing science degree programs, gaps that exist in the literature, and a comparison analysis which identifies the gaps that exist between the SEI's "healthy ingredients " of a process performance model and courses taught in U.S. universities degree program. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models. The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.
RFA Guardian: Comprehensive Simulation of Radiofrequency Ablation Treatment of Liver Tumors.

PubMed

Voglreiter, Philip; Mariappan, Panchatcharam; Pollari, Mika; Flanagan, Ronan; Blanco Sequeiros, Roberto; Portugaller, Rupert Horst; Fütterer, Jurgen; Schmalstieg, Dieter; Kolesnik, Marina; Moche, Michael

2018-01-15

The RFA Guardian is a comprehensive application for high-performance patient-specific simulation of radiofrequency ablation of liver tumors. We address a wide range of usage scenarios. These include pre-interventional planning, sampling of the parameter space for uncertainty estimation, treatment evaluation and, in the worst case, failure analysis. The RFA Guardian is the first of its kind that exhibits sufficient performance for simulating treatment outcomes during the intervention. We achieve this by combining a large number of high-performance image processing, biomechanical simulation and visualization techniques into a generalized technical workflow. Further, we wrap the feature set into a single, integrated application, which exploits all available resources of standard consumer hardware, including massively parallel computing on graphics processing units. This allows us to predict or reproduce treatment outcomes on a single personal computer with high computational performance and high accuracy. The resulting low demand for infrastructure enables easy and cost-efficient integration into the clinical routine. We present a number of evaluation cases from the clinical practice where users performed the whole technical workflow from patient-specific modeling to final validation and highlight the opportunities arising from our fast, accurate prediction techniques.
Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design

NASA Technical Reports Server (NTRS)

Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)

1998-01-01

Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. Brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. talk discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
Random sampling technique for ultra-fast computations of molecular opacities for exoplanet atmospheres

NASA Astrophysics Data System (ADS)

Min, M.

2017-10-01

Context. Opacities of molecules in exoplanet atmospheres rely on increasingly detailed line-lists for these molecules. The line lists available today contain for many species up to several billions of lines. Computation of the spectral line profile created by pressure and temperature broadening, the Voigt profile, of all of these lines is becoming a computational challenge. Aims: We aim to create a method to compute the Voigt profile in a way that automatically focusses the computation time into the strongest lines, while still maintaining the continuum contribution of the high number of weaker lines. Methods: Here, we outline a statistical line sampling technique that samples the Voigt profile quickly and with high accuracy. The number of samples is adjusted to the strength of the line and the local spectral line density. This automatically provides high accuracy line shapes for strong lines or lines that are spectrally isolated. The line sampling technique automatically preserves the integrated line opacity for all lines, thereby also providing the continuum opacity created by the large number of weak lines at very low computational cost. Results: The line sampling technique is tested for accuracy when computing line spectra and correlated-k tables. Extremely fast computations ( 3.5 × 105 lines per second per core on a standard current day desktop computer) with high accuracy (≤1% almost everywhere) are obtained. A detailed recipe on how to perform the computations is given.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

NASA Astrophysics Data System (ADS)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.

2018-01-01

We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
High-performance computational fluid dynamics: a custom-code approach

NASA Astrophysics Data System (ADS)

Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

2016-07-01

We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.
A High Performance SOAP Engine for Grid Computing

NASA Astrophysics Data System (ADS)

Wang, Ning; Welzl, Michael; Zhang, Liang

Web Service technology still has many defects that make its usage for Grid computing problematic, most notably the low performance of the SOAP engine. In this paper, we develop a novel SOAP engine called SOAPExpress, which adopts two key techniques for improving processing performance: SCTP data transport and dynamic early binding based data mapping. Experimental results show a significant and consistent performance improvement of SOAPExpress over Apache Axis.
A high level language for a high performance computer

NASA Technical Reports Server (NTRS)

Perrott, R. H.

1978-01-01

The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers have been modifications of programming languages which were designed many years ago for sequential machines. A new programming language should be developed based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.
Computer architecture evaluation for structural dynamics computations: Project summary

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1989-01-01

The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
Point and path performance of light aircraft: A review and analysis

NASA Technical Reports Server (NTRS)

Smetana, F. O.; Summey, D. C.; Johnson, W. D.

1973-01-01

The literature on methods for predicting the performance of light aircraft is reviewed. The methods discussed in the review extend from the classical instantaneous maximum or minimum technique to techniques for generating mathematically optimum flight paths. Classical point performance techniques are shown to be adequate in many cases but their accuracies are compromised by the need to use simple lift, drag, and thrust relations in order to get closed form solutions. Also the investigation of the effect of changes in weight, altitude, configuration, etc. involves many essentially repetitive calculations. Accordingly, computer programs are provided which can fit arbitrary drag polars and power curves with very high precision and which can then use the resulting fits to compute the performance under the assumption that the aircraft is not accelerating.
High-resolution subgrid models: background, grid generation, and implementation

NASA Astrophysics Data System (ADS)

Sehili, Aissa; Lang, Günther; Lippert, Christoph

2014-04-01

The basic idea of subgrid models is the use of available high-resolution bathymetric data at subgrid level in computations that are performed on relatively coarse grids allowing large time steps. For that purpose, an algorithm that correctly represents the precise mass balance in regions where wetting and drying occur was derived by Casulli (Int J Numer Method Fluids 60:391-408, 2009) and Casulli and Stelling (Int J Numer Method Fluids 67:441-449, 2010). Computational grid cells are permitted to be wet, partially wet, or dry, and no drying threshold is needed. Based on the subgrid technique, practical applications involving various scenarios were implemented including an operational forecast model for water level, salinity, and temperature of the Elbe Estuary in Germany. The grid generation procedure allows a detailed boundary fitting at subgrid level. The computational grid is made of flow-aligned quadrilaterals including few triangles where necessary. User-defined grid subdivision at subgrid level allows a correct representation of the volume up to measurement accuracy. Bottom friction requires a particular treatment. Based on the conveyance approach, an appropriate empirical correction was worked out. The aforementioned features make the subgrid technique very efficient, robust, and accurate. Comparison of predicted water levels with the comparatively highly resolved classical unstructured grid model shows very good agreement. The speedup in computational performance due to the use of the subgrid technique is about a factor of 20. A typical daily forecast can be carried out in less than 10 min on a standard PC-like hardware. The subgrid technique is therefore a promising framework to perform accurate temporal and spatial large-scale simulations of coastal and estuarine flow and transport processes at low computational cost.
Multidisciplinary design optimization using multiobjective formulation techniques

NASA Technical Reports Server (NTRS)

Chattopadhyay, Aditi; Pagaldipti, Narayanan S.

1995-01-01

This report addresses the development of a multidisciplinary optimization procedure using an efficient semi-analytical sensitivity analysis technique and multilevel decomposition for the design of aerospace vehicles. A semi-analytical sensitivity analysis procedure is developed for calculating computational grid sensitivities and aerodynamic design sensitivities. Accuracy and efficiency of the sensitivity analysis procedure is established through comparison of the results with those obtained using a finite difference technique. The developed sensitivity analysis technique are then used within a multidisciplinary optimization procedure for designing aerospace vehicles. The optimization problem, with the integration of aerodynamics and structures, is decomposed into two levels. Optimization is performed for improved aerodynamic performance at the first level and improved structural performance at the second level. Aerodynamic analysis is performed by solving the three-dimensional parabolized Navier Stokes equations. A nonlinear programming technique and an approximate analysis procedure are used for optimization. The proceduredeveloped is applied to design the wing of a high speed aircraft. Results obtained show significant improvements in the aircraft aerodynamic and structural performance when compared to a reference or baseline configuration. The use of the semi-analytical sensitivity technique provides significant computational savings.

Techniques for Enhancing Web-Based Education.

ERIC Educational Resources Information Center

Barbieri, Kathy; Mehringer, Susan

The Virtual Workshop is a World Wide Web-based set of modules on high performance computing developed at the Cornell Theory Center (CTC) (New York). This approach reaches a large audience, leverages staff effort, and poses challenges for developing interesting presentation techniques. This paper describes the following techniques with their…
An Analysis of Performance Enhancement Techniques for Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

2002-01-01

The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prowell, Stacy J; Symons, Christopher T

2015-01-01

Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.
Computer-aided analysis of star shot films for high-accuracy radiation therapy treatment units

NASA Astrophysics Data System (ADS)

Depuydt, Tom; Penne, Rudi; Verellen, Dirk; Hrbacek, Jan; Lang, Stephanie; Leysen, Katrien; Vandevondel, Iwein; Poels, Kenneth; Reynders, Truus; Gevaert, Thierry; Duchateau, Michael; Tournel, Koen; Boussaer, Marlies; Cosentino, Dorian; Garibaldi, Cristina; Solberg, Timothy; De Ridder, Mark

2012-05-01

As mechanical stability of radiation therapy treatment devices has gone beyond sub-millimeter levels, there is a rising demand for simple yet highly accurate measurement techniques to support the routine quality control of these devices. A combination of using high-resolution radiosensitive film and computer-aided analysis could provide an answer. One generally known technique is the acquisition of star shot films to determine the mechanical stability of rotations of gantries and the therapeutic beam. With computer-aided analysis, mechanical performance can be quantified as a radiation isocenter radius size. In this work, computer-aided analysis of star shot film is further refined by applying an analytical solution for the smallest intersecting circle problem, in contrast to the gradient optimization approaches used until today. An algorithm is presented and subjected to a performance test using two different types of radiosensitive film, the Kodak EDR2 radiographic film and the ISP EBT2 radiochromic film. Artificial star shots with a priori known radiation isocenter size are used to determine the systematic errors introduced by the digitization of the film and the computer analysis. The estimated uncertainty on the isocenter size measurement with the presented technique was 0.04 mm (2σ) and 0.06 mm (2σ) for radiographic and radiochromic films, respectively. As an application of the technique, a study was conducted to compare the mechanical stability of O-ring gantry systems with C-arm-based gantries. In total ten systems of five different institutions were included in this study and star shots were acquired for gantry, collimator, ring, couch rotations and gantry wobble. It was not possible to draw general conclusions about differences in mechanical performance between O-ring and C-arm gantry systems, mainly due to differences in the beam-MLC alignment procedure accuracy. Nevertheless, the best performing O-ring system in this study, a BrainLab/MHI Vero system, and the best performing C-arm system, a Varian Truebeam system, showed comparable mechanical performance: gantry isocenter radius of 0.12 and 0.09 mm, respectively, ring/couch rotation of below 0.10 mm for both systems and a wobble of 0.06 and 0.18 mm, respectively. The methodology described in this work can be used to monitor mechanical performance constancy of high-accuracy treatment devices, with means available in a clinical radiation therapy environment.
Tools for 3D scientific visualization in computational aerodynamics at NASA Ames Research Center

NASA Technical Reports Server (NTRS)

Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val

1989-01-01

Hardware, software, and techniques used by the Fluid Dynamics Division (NASA) for performing visualization of computational aerodynamics, which can be applied to the visualization of flow fields from computer simulations of fluid dynamics about the Space Shuttle, are discussed. Three visualization techniques applied, post-processing, tracking, and steering, are described, as well as the post-processing software packages used, PLOT3D, SURF (Surface Modeller), GAS (Graphical Animation System), and FAST (Flow Analysis software Toolkit). Using post-processing methods a flow simulation was executed on a supercomputer and, after the simulation was complete, the results were processed for viewing. It is shown that the high-resolution, high-performance three-dimensional workstation combined with specially developed display and animation software provides a good tool for analyzing flow field solutions obtained from supercomputers.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE PAGES

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...

2017-09-14

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Meiyue; Aktulga, H. Metin; Yang, Chao

In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. Themore » use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.« less
Current state and future direction of computer systems at NASA Langley Research Center

NASA Technical Reports Server (NTRS)

Rogers, James L. (Editor); Tucker, Jerry H. (Editor)

1992-01-01

Computer systems have advanced at a rate unmatched by any other area of technology. As performance has dramatically increased there has been an equally dramatic reduction in cost. This constant cost performance improvement has precipitated the pervasiveness of computer systems into virtually all areas of technology. This improvement is due primarily to advances in microelectronics. Most people are now convinced that the new generation of supercomputers will be built using a large number (possibly thousands) of high performance microprocessors. Although the spectacular improvements in computer systems have come about because of these hardware advances, there has also been a steady improvement in software techniques. In an effort to understand how these hardware and software advances will effect research at NASA LaRC, the Computer Systems Technical Committee drafted this white paper to examine the current state and possible future directions of computer systems at the Center. This paper discusses selected important areas of computer systems including real-time systems, embedded systems, high performance computing, distributed computing networks, data acquisition systems, artificial intelligence, and visualization.
High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Time Tagging the Data

DTIC Science & Technology

2015-09-01

this report made use of posttest processing techniques to provide packet-level time tagging with an accuracy close to 3 µs relative to Coordinated...h set of test records. The process described herein made use of posttest processing techniques to provide packet-level time tagging with an accuracy
Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud.

PubMed

Cianfrocco, Michael A; Leschziner, Andres E

2015-05-08

The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available 'off-the-shelf' computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16-480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.
RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

PubMed Central

Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan

2017-01-01

This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325
RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

PubMed

Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

2017-08-04

This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
A three-dimensional finite-element thermal/mechanical analytical technique for high-performance traveling wave tubes

NASA Technical Reports Server (NTRS)

Bartos, Karen F.; Fite, E. Brian; Shalkhauser, Kurt A.; Sharp, G. Richard

1991-01-01

Current research in high-efficiency, high-performance traveling wave tubes (TWT's) has led to the development of novel thermal/ mechanical computer models for use with helical slow-wave structures. A three-dimensional, finite element computer model and analytical technique used to study the structural integrity and thermal operation of a high-efficiency, diamond-rod, K-band TWT designed for use in advanced space communications systems. This analysis focused on the slow-wave circuit in the radiofrequency section of the TWT, where an inherent localized heating problem existed and where failures were observed during earlier cold compression, or 'coining' fabrication technique that shows great potential for future TWT development efforts. For this analysis, a three-dimensional, finite element model was used along with MARC, a commercially available finite element code, to simulate the fabrication of a diamond-rod TWT. This analysis was conducted by using component and material specifications consistent with actual TWT fabrication and was verified against empirical data. The analysis is nonlinear owing to material plasticity introduced by the forming process and also to geometric nonlinearities presented by the component assembly configuration. The computer model was developed by using the high efficiency, K-band TWT design but is general enough to permit similar analyses to be performed on a wide variety of TWT designs and styles. The results of the TWT operating condition and structural failure mode analysis, as well as a comparison of analytical results to test data are presented.
A three-dimensional finite-element thermal/mechanical analytical technique for high-performance traveling wave tubes

NASA Technical Reports Server (NTRS)

Shalkhauser, Kurt A.; Bartos, Karen F.; Fite, E. B.; Sharp, G. R.

1992-01-01

Current research in high-efficiency, high-performance traveling wave tubes (TWT's) has led to the development of novel thermal/mechanical computer models for use with helical slow-wave structures. A three-dimensional, finite element computer model and analytical technique used to study the structural integrity and thermal operation of a high-efficiency, diamond-rod, K-band TWT designed for use in advanced space communications systems. This analysis focused on the slow-wave circuit in the radiofrequency section of the TWT, where an inherent localized heating problem existed and where failures were observed during earlier cold compression, or 'coining' fabrication technique that shows great potential for future TWT development efforts. For this analysis, a three-dimensional, finite element model was used along with MARC, a commercially available finite element code, to simulate the fabrication of a diamond-rod TWT. This analysis was conducted by using component and material specifications consistent with actual TWT fabrication and was verified against empirical data. The analysis is nonlinear owing to material plasticity introduced by the forming process and also to geometric nonlinearities presented by the component assembly configuration. The computer model was developed by using the high efficiency, K-band TWT design but is general enough to permit similar analyses to be performed on a wide variety of TWT designs and styles. The results of the TWT operating condition and structural failure mode analysis, as well as a comparison of analytical results to test data are presented.
On finite element implementation and computational techniques for constitutive modeling of high temperature composites

NASA Technical Reports Server (NTRS)

Saleeb, A. F.; Chang, T. Y. P.; Wilt, T.; Iskovitz, I.

1989-01-01

The research work performed during the past year on finite element implementation and computational techniques pertaining to high temperature composites is outlined. In the present research, two main issues are addressed: efficient geometric modeling of composite structures and expedient numerical integration techniques dealing with constitutive rate equations. In the first issue, mixed finite elements for modeling laminated plates and shells were examined in terms of numerical accuracy, locking property and computational efficiency. Element applications include (currently available) linearly elastic analysis and future extension to material nonlinearity for damage predictions and large deformations. On the material level, various integration methods to integrate nonlinear constitutive rate equations for finite element implementation were studied. These include explicit, implicit and automatic subincrementing schemes. In all cases, examples are included to illustrate the numerical characteristics of various methods that were considered.
Acceleration of FDTD mode solver by high-performance computing techniques.

PubMed

Han, Lin; Xi, Yanping; Huang, Wei-Ping

2010-06-21

A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the bench-mark finite-difference (FD) eigen mode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is in the same order of magnitude of the standard finite-difference eigen mode solver and yet require much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigen value mode solvers are no longer applicable due to memory limitation.
Integrated computational study of ultra-high heat flux cooling using cryogenic micro-solid nitrogen spray

NASA Astrophysics Data System (ADS)

Ishimoto, Jun; Oh, U.; Tan, Daisuke

2012-10-01

A new type of ultra-high heat flux cooling system using the atomized spray of cryogenic micro-solid nitrogen (SN2) particles produced by a superadiabatic two-fluid nozzle was developed and numerically investigated for application to next generation super computer processor thermal management. The fundamental characteristics of heat transfer and cooling performance of micro-solid nitrogen particulate spray impinging on a heated substrate were numerically investigated and experimentally measured by a new type of integrated computational-experimental technique. The employed Computational Fluid Dynamics (CFD) analysis based on the Euler-Lagrange model is focused on the cryogenic spray behavior of atomized particulate micro-solid nitrogen and also on its ultra-high heat flux cooling characteristics. Based on the numerically predicted performance, a new type of cryogenic spray cooling technique for application to a ultra-high heat power density device was developed. In the present integrated computation, it is clarified that the cryogenic micro-solid spray cooling characteristics are affected by several factors of the heat transfer process of micro-solid spray which impinges on heated surface as well as by atomization behavior of micro-solid particles. When micro-SN2 spraying cooling was used, an ultra-high cooling heat flux level was achieved during operation, a better cooling performance than that with liquid nitrogen (LN2) spray cooling. As micro-SN2 cooling has the advantage of direct latent heat transport which avoids the film boiling state, the ultra-short time scale heat transfer in a thin boundary layer is more possible than in LN2 spray. The present numerical prediction of the micro-SN2 spray cooling heat flux profile can reasonably reproduce the measurement results of cooling wall heat flux profiles. The application of micro-solid spray as a refrigerant for next generation computer processors is anticipated, and its ultra-high heat flux technology is expected to result in an extensive improvement in the effective cooling performance of large scale supercomputer systems.
Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

DOE PAGES

Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

2015-04-08

The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less
A maximum entropy reconstruction technique for tomographic particle image velocimetry

NASA Astrophysics Data System (ADS)

Bilsky, A. V.; Lozhkin, V. A.; Markovich, D. M.; Tokarev, M. P.

2013-04-01

This paper studies a novel approach for reducing tomographic PIV computational complexity. The proposed approach is an algebraic reconstruction technique, termed MENT (maximum entropy). This technique computes the three-dimensional light intensity distribution several times faster than SMART, using at least ten times less memory. Additionally, the reconstruction quality remains nearly the same as with SMART. This paper presents the theoretical computation performance comparison for MENT, SMART and MART, followed by validation using synthetic particle images. Both the theoretical assessment and validation of synthetic images demonstrate significant computational time reduction. The data processing accuracy of MENT was compared to that of SMART in a slot jet experiment. A comparison of the average velocity profiles shows a high level of agreement between the results obtained with MENT and those obtained with SMART.
Data flow modeling techniques

NASA Technical Reports Server (NTRS)

Kavi, K. M.

1984-01-01

There have been a number of simulation packages developed for the purpose of designing, testing and validating computer systems, digital systems and software systems. Complex analytical tools based on Markov and semi-Markov processes have been designed to estimate the reliability and performance of simulated systems. Petri nets have received wide acceptance for modeling complex and highly parallel computers. In this research data flow models for computer systems are investigated. Data flow models can be used to simulate both software and hardware in a uniform manner. Data flow simulation techniques provide the computer systems designer with a CAD environment which enables highly parallel complex systems to be defined, evaluated at all levels and finally implemented in either hardware or software. Inherent in data flow concept is the hierarchical handling of complex systems. In this paper we will describe how data flow can be used to model computer system.

Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of modern and common techniques to achieve high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5 VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Investigation to realize a computationally efficient implementation of the high-order instantaneous-moments-based fringe analysis method

NASA Astrophysics Data System (ADS)

Gorthi, Sai Siva; Rajshekhar, Gannavarpu; Rastogi, Pramod

2010-06-01

Recently, a high-order instantaneous moments (HIM)-operator-based method was proposed for accurate phase estimation in digital holographic interferometry. The method relies on piece-wise polynomial approximation of phase and subsequent evaluation of the polynomial coefficients from the HIM operator using single-tone frequency estimation. The work presents a comparative analysis of the performance of different single-tone frequency estimation techniques, like Fourier transform followed by optimization, estimation of signal parameters by rotational invariance technique (ESPRIT), multiple signal classification (MUSIC), and iterative frequency estimation by interpolation on Fourier coefficients (IFEIF) in HIM-operator-based methods for phase estimation. Simulation and experimental results demonstrate the potential of the IFEIF technique with respect to computational efficiency and estimation accuracy.
Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments

PubMed Central

Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.

2012-01-01

Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621
Ubiquitous green computing techniques for high demand applications in Smart environments.

PubMed

Zapater, Marina; Sanchez, Cesar; Ayala, Jose L; Moya, Jose M; Risco-Martín, José L

2012-01-01

Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.
Prediction of Scour below Flip Bucket using Soft Computing Techniques

NASA Astrophysics Data System (ADS)

Azamathulla, H. Md.; Ab Ghani, Aminuddin; Azazi Zakaria, Nor

2010-05-01

The accurate prediction of the depth of scour around hydraulic structure (trajectory spillways) has been based on the experimental studies and the equations developed are mainly empirical in nature. This paper evaluates the performance of the soft computing (intelligence) techiques, Adaptive Neuro-Fuzzy System (ANFIS) and Genetic expression Programming (GEP) approach, in prediction of scour below a flip bucket spillway. The results are very promising, which support the use of these intelligent techniques in prediction of highly non-linear scour parameters.
Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud

PubMed Central

Cianfrocco, Michael A; Leschziner, Andres E

2015-01-01

The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available ‘off-the-shelf’ computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16–480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM. DOI: http://dx.doi.org/10.7554/eLife.06664.001 PMID:25955969
Challenges of Future High-End Computing

NASA Technical Reports Server (NTRS)

Bailey, David; Kutler, Paul (Technical Monitor)

1998-01-01

The next major milestone in high performance computing is a sustained rate of one Pflop/s (also written one petaflops, or 10(circumflex)15 floating-point operations per second). In addition to prodigiously high computational performance, such systems must of necessity feature very large main memories, as well as comparably high I/O bandwidth and huge mass storage facilities. The current consensus of scientists who have studied these issues is that "affordable" petaflops systems may be feasible by the year 2010, assuming that certain key technologies continue to progress at current rates. One important question is whether applications can be structured to perform efficiently on such systems, which are expected to incorporate many thousands of processors and deeply hierarchical memory systems. To answer these questions, advanced performance modeling techniques, including simulation of future architectures and applications, may be required. It may also be necessary to formulate "latency tolerant algorithms" and other completely new algorithmic approaches for certain applications. This talk will give an overview of these challenges.
Heterogeneous Distributed Computing for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Sunderam, Vaidy S.

1998-01-01

The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to and investigate issues in, and develop solutions to, efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devising novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Future computing platforms for science in a power constrained era

DOE PAGES

Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...

2015-12-23

Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potentialmore » for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).« less
Secure distributed genome analysis for GWAS and sequence comparison computation.

PubMed

Zhang, Yihua; Blanton, Marina; Almashaqbeh, Ghada

2015-01-01

The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice.
Secure distributed genome analysis for GWAS and sequence comparison computation

PubMed Central

2015-01-01

Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
Modification of Hazen's equation in coarse grained soils by soft computing techniques

NASA Astrophysics Data System (ADS)

Kaynar, Oguz; Yilmaz, Isik; Marschalko, Marian; Bednarik, Martin; Fojtova, Lucie

2013-04-01

Hazen first proposed a Relationship between coefficient of permeability (k) and effective grain size (d10) was first proposed by Hazen, and it was then extended by some other researchers. However many attempts were done for estimation of k, correlation coefficients (R2) of the models were generally lower than ~0.80 and whole grain size distribution curves were not included in the assessments. Soft computing techniques such as; artificial neural networks, fuzzy inference systems, genetic algorithms, etc. and their hybrids are now being successfully used as an alternative tool. In this study, use of some soft computing techniques such as Artificial Neural Networks (ANNs) (MLP, RBF, etc.) and Adaptive Neuro-Fuzzy Inference System (ANFIS) for prediction of permeability of coarse grained soils was described, and Hazen's equation was then modificated. It was found that the soft computing models exhibited high performance in prediction of permeability coefficient. However four different kinds of ANN algorithms showed similar prediction performance, results of MLP was found to be relatively more accurate than RBF models. The most reliable prediction was obtained from ANFIS model.
Multicore Challenges and Benefits for High Performance Scientific Computing

DOE PAGES

Nielsen, Ida M. B.; Janssen, Curtis L.

2008-01-01

Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexitymore » of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.« less
Experiences with Probabilistic Analysis Applied to Controlled Systems

NASA Technical Reports Server (NTRS)

Kenny, Sean P.; Giesy, Daniel P.

2004-01-01

This paper presents a semi-analytic method for computing frequency dependent means, variances, and failure probabilities for arbitrarily large-order closed-loop dynamical systems possessing a single uncertain parameter or with multiple highly correlated uncertain parameters. The approach will be shown to not suffer from the same computational challenges associated with computing failure probabilities using conventional FORM/SORM techniques. The approach is demonstrated by computing the probabilistic frequency domain performance of an optimal feed-forward disturbance rejection scheme.
A service based adaptive U-learning system using UX.

PubMed

Jeong, Hwa-Young; Yi, Gangman

2014-01-01

In recent years, traditional development techniques for e-learning systems have been changing to become more convenient and efficient. One new technology in the development of application systems includes both cloud and ubiquitous computing. Cloud computing can support learning system processes by using services while ubiquitous computing can provide system operation and management via a high performance technical process and network. In the cloud computing environment, a learning service application can provide a business module or process to the user via the internet. This research focuses on providing the learning material and processes of courses by learning units using the services in a ubiquitous computing environment. And we also investigate functions that support users' tailored materials according to their learning style. That is, we analyzed the user's data and their characteristics in accordance with their user experience. We subsequently applied the learning process to fit on their learning performance and preferences. Finally, we demonstrate how the proposed system outperforms learning effects to learners better than existing techniques.
A Service Based Adaptive U-Learning System Using UX

PubMed Central

Jeong, Hwa-Young

2014-01-01

In recent years, traditional development techniques for e-learning systems have been changing to become more convenient and efficient. One new technology in the development of application systems includes both cloud and ubiquitous computing. Cloud computing can support learning system processes by using services while ubiquitous computing can provide system operation and management via a high performance technical process and network. In the cloud computing environment, a learning service application can provide a business module or process to the user via the internet. This research focuses on providing the learning material and processes of courses by learning units using the services in a ubiquitous computing environment. And we also investigate functions that support users' tailored materials according to their learning style. That is, we analyzed the user's data and their characteristics in accordance with their user experience. We subsequently applied the learning process to fit on their learning performance and preferences. Finally, we demonstrate how the proposed system outperforms learning effects to learners better than existing techniques. PMID:25147832
Hybrid parallel computing architecture for multiview phase shifting

NASA Astrophysics Data System (ADS)

Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

2014-11-01

The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
Soft bilateral filtering volumetric shadows using cube shadow maps

PubMed Central

Ali, Hatam H.; Sunar, Mohd Shahrizal; Kolivand, Hoshang

2017-01-01

Volumetric shadows often increase the realism of rendered scenes in computer graphics. Typical volumetric shadows techniques do not provide a smooth transition effect in real-time with conservation on crispness of boundaries. This research presents a new technique for generating high quality volumetric shadows by sampling and interpolation. Contrary to conventional ray marching method, which requires extensive time, this proposed technique adopts downsampling in calculating ray marching. Furthermore, light scattering is computed in High Dynamic Range buffer to generate tone mapping. The bilateral interpolation is used along a view rays to smooth transition of volumetric shadows with respect to preserving-edges. In addition, this technique applied a cube shadow map to create multiple shadows. The contribution of this technique isreducing the number of sample points in evaluating light scattering and then introducing bilateral interpolation to improve volumetric shadows. This contribution is done by removing the inherent deficiencies significantly in shadow maps. This technique allows obtaining soft marvelous volumetric shadows, having a good performance and high quality, which show its potential for interactive applications. PMID:28632740
Computer systems performance measurement techniques.

DOT National Transportation Integrated Search

1971-06-01

Computer system performance measurement techniques, tools, and approaches are presented as a foundation for future recommendations regarding the instrumentation of the ARTS ATC data processing subsystem for purposes of measurement and evaluation.
Parallel computing in genomic research: advances and applications

PubMed Central

Ocaña, Kary; de Oliveira, Daniel

2015-01-01

Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801

Parallel computing in genomic research: advances and applications.

PubMed

Ocaña, Kary; de Oliveira, Daniel

2015-01-01

Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.
High pressure water jet cutting and stripping

NASA Technical Reports Server (NTRS)

Hoppe, David T.; Babai, Majid K.

1991-01-01

High pressure water cutting techniques have a wide range of applications to the American space effort. Hydroblasting techniques are commonly used during the refurbishment of the reusable solid rocket motors. The process can be controlled to strip a thermal protective ablator without incurring any damage to the painted surface underneath by using a variation of possible parameters. Hydroblasting is a technique which is easily automated. Automation removes personnel from the hostile environment of the high pressure water. Computer controlled robots can perform the same task in a fraction of the time that would be required by manual operation.
Three-dimensional near-field MIMO array imaging using range migration techniques.

PubMed

Zhuge, Xiaodong; Yarovoy, Alexander G

2012-06-01

This paper presents a 3-D near-field imaging algorithm that is formulated for 2-D wideband multiple-input-multiple-output (MIMO) imaging array topology. The proposed MIMO range migration technique performs the image reconstruction procedure in the frequency-wavenumber domain. The algorithm is able to completely compensate the curvature of the wavefront in the near-field through a specifically defined interpolation process and provides extremely high computational efficiency by the application of the fast Fourier transform. The implementation aspects of the algorithm and the sampling criteria of a MIMO aperture are discussed. The image reconstruction performance and computational efficiency of the algorithm are demonstrated both with numerical simulations and measurements using 2-D MIMO arrays. Real-time 3-D near-field imaging can be achieved with a real-aperture array by applying the proposed MIMO range migration techniques.
Approximate Bayesian computation for spatial SEIR(S) epidemic models.

PubMed

Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A

2018-02-01

Approximate Bayesia n Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution, however MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bayer Digester Optimization Studies using Computer Techniques

NASA Astrophysics Data System (ADS)

Kotte, Jan J.; Schleider, Victor H.

Theoretically required heat transfer performance by the multistaged flash heat reclaim system of a high pressure Bayer digester unit is determined for various conditions of discharge temperature, excess flash vapor and indirect steam addition. Solution of simultaneous heat balances around the digester vessels and the heat reclaim system yields the magnitude of available heat for representation of each case on a temperature-enthalpy diagram, where graphical fit of the number of flash stages fixes the heater requirements. Both the heat balances and the trial-and-error graphical solution are adapted to solution by digital computer techniques.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

NASA Technical Reports Server (NTRS)

Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

2002-01-01

Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
High speed civil transport: Sonic boom softening and aerodynamic optimization

NASA Technical Reports Server (NTRS)

Cheung, Samson

1994-01-01

An improvement in sonic boom extrapolation techniques has been the desire of aerospace designers for years. This is because the linear acoustic theory developed in the 60's is incapable of predicting the nonlinear phenomenon of shock wave propagation. On the other hand, CFD techniques are too computationally expensive to employ on sonic boom problems. Therefore, this research focused on the development of a fast and accurate sonic boom extrapolation method that solves the Euler equations for axisymmetric flow. This new technique has brought the sonic boom extrapolation techniques up to the standards of the 90's. Parallel computing is a fast growing subject in the field of computer science because of its promising speed. A new optimizer (IIOWA) for the parallel computing environment has been developed and tested for aerodynamic drag minimization. This is a promising method for CFD optimization making use of the computational resources of workstations, which unlike supercomputers can spend most of their time idle. Finally, the OAW concept is attractive because of its overall theoretical performance. In order to fully understand the concept, a wind-tunnel model was built and is currently being tested at NASA Ames Research Center. The CFD calculations performed under this cooperative agreement helped to identify the problem of the flow separation, and also aided the design by optimizing the wing deflection for roll trim.
Enhanced Multiobjective Optimization Technique for Comprehensive Aerospace Design. Part A

NASA Technical Reports Server (NTRS)

Chattopadhyay, Aditi; Rajadas, John N.

1997-01-01

A multidisciplinary design optimization procedure which couples formal multiobjectives based techniques and complex analysis procedures (such as computational fluid dynamics (CFD) codes) developed. The procedure has been demonstrated on a specific high speed flow application involving aerodynamics and acoustics (sonic boom minimization). In order to account for multiple design objectives arising from complex performance requirements, multiobjective formulation techniques are used to formulate the optimization problem. Techniques to enhance the existing Kreisselmeier-Steinhauser (K-S) function multiobjective formulation approach have been developed. The K-S function procedure used in the proposed work transforms a constrained multiple objective functions problem into an unconstrained problem which then is solved using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Weight factors are introduced during the transformation process to each objective function. This enhanced procedure will provide the designer the capability to emphasize specific design objectives during the optimization process. The demonstration of the procedure utilizes a computational Fluid dynamics (CFD) code which solves the three-dimensional parabolized Navier-Stokes (PNS) equations for the flow field along with an appropriate sonic boom evaluation procedure thus introducing both aerodynamic performance as well as sonic boom as the design objectives to be optimized simultaneously. Sensitivity analysis is performed using a discrete differentiation approach. An approximation technique has been used within the optimizer to improve the overall computational efficiency of the procedure in order to make it suitable for design applications in an industrial setting.
Using a cloud to replenish parched groundwater modeling efforts.

PubMed

Hunt, Randall J; Luchette, Joseph; Schreuder, Willem A; Rumbaugh, James O; Doherty, John; Tonkin, Matthew J; Rumbaugh, Douglas B

2010-01-01

Groundwater models can be improved by introduction of additional parameter flexibility and simultaneous use of soft-knowledge. However, these sophisticated approaches have high computational requirements. Cloud computing provides unprecedented access to computing power via the Internet to facilitate the use of these techniques. A modeler can create, launch, and terminate "virtual" computers as needed, paying by the hour, and save machine images for future use. Such cost-effective and flexible computing power empowers groundwater modelers to routinely perform model calibration and uncertainty analysis in ways not previously possible.
Using a cloud to replenish parched groundwater modeling efforts

USGS Publications Warehouse

Hunt, Randall J.; Luchette, Joseph; Schreuder, Willem A.; Rumbaugh, James O.; Doherty, John; Tonkin, Matthew J.; Rumbaugh, Douglas B.

2010-01-01

Groundwater models can be improved by introduction of additional parameter flexibility and simultaneous use of soft-knowledge. However, these sophisticated approaches have high computational requirements. Cloud computing provides unprecedented access to computing power via the Internet to facilitate the use of these techniques. A modeler can create, launch, and terminate “virtual” computers as needed, paying by the hour, and save machine images for future use. Such cost-effective and flexible computing power empowers groundwater modelers to routinely perform model calibration and uncertainty analysis in ways not previously possible.
Evaluating Application Resilience with XRay

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Sui; Bronevetsky, Greg; Li, Bin

2015-05-07

The rising count and shrinking feature size of transistors within modern computers is making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resiilent to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithmspecific error detection andmore » tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults to ensure that (i) efforts to improve application resilience are spent in the code regions most vulnerable to faults and (ii) the appropriate resilience technique is applied to each code region. This paper presents XRay, a tool to view the application vulnerability to soft errors, and illustrates how XRay can be used in the context of a representative application. In addition to providing actionable insights into application behavior XRay automatically selects the number of fault injection experiments required to provide an informative view of application behavior, ensuring that the information is statistically well-grounded without performing unnecessary experiments.« less
Hypergraph Based Feature Selection Technique for Medical Diagnosis.

PubMed

Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar

2016-11-01

The impact of internet and information systems across various domains have resulted in substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in the multidimensional datasets play a significant role in the exploitation of complete benefit provided by them. The presence of large number of features in the high dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection technique has been commonly used to build robust machine learning models to select a subset of relevant features which projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K - Helly feature selection technique (RSKHT) which hybridize Rough Set Theory (RST) and K - Helly property of hypergraph representation had been designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using the medical datasets from the UCI repository proves the dominance of the RSKHT over other feature selection techniques with respect to the reduct size, classification accuracy and time complexity. The performance of the RSKHT had been validated using WEKA tool, which shows that RSKHT had been computationally attractive and flexible over massive datasets.
Applications of CFD and visualization techniques

NASA Technical Reports Server (NTRS)

Saunders, James H.; Brown, Susan T.; Crisafulli, Jeffrey J.; Southern, Leslie A.

1992-01-01

In this paper, three applications are presented to illustrate current techniques for flow calculation and visualization. The first two applications use a commercial computational fluid dynamics (CFD) code, FLUENT, performed on a Cray Y-MP. The results are animated with the aid of data visualization software, apE. The third application simulates a particulate deposition pattern using techniques inspired by developments in nonlinear dynamical systems. These computations were performed on personal computers.
Clinical Consultation Systems: Designing for the Physician as Computer User

PubMed Central

Shortliffe, Edward H.

1981-01-01

The barriers to successful implementation of consultation systems for physicians have been frequently discussed. Our research has been directed at the development of computer techniques that will heighten the acceptance of high performance decision making tools. We discuss two current projects at Stanford Medical School that address both practical and theoretical issues in the design and construction of clinically useful systems.
Recent achievements in real-time computational seismology in Taiwan

NASA Astrophysics Data System (ADS)

Lee, S.; Liang, W.; Huang, B.

2012-12-01

Real-time computational seismology is currently possible to be achieved which needs highly connection between seismic database and high performance computing. We have developed a real-time moment tensor monitoring system (RMT) by using continuous BATS records and moment tensor inversion (CMT) technique. The real-time online earthquake simulation service is also ready to open for researchers and public earthquake science education (ROS). Combine RMT with ROS, the earthquake report based on computational seismology can provide within 5 minutes after an earthquake occurred (RMT obtains point source information < 120 sec; ROS completes a 3D simulation < 3 minutes). All of these computational results are posted on the internet in real-time now. For more information, welcome to visit real-time computational seismology earthquake report webpage (RCS).
Comparison of Image Processing Techniques for Nonviable Tissue Quantification in Late Gadolinium Enhancement Cardiac Magnetic Resonance Images.

PubMed

Carminati, M Chiara; Boniotti, Cinzia; Fusini, Laura; Andreini, Daniele; Pontone, Gianluca; Pepi, Mauro; Caiani, Enrico G

2016-05-01

The aim of this study was to compare the performance of quantitative methods, either semiautomated or automated, for left ventricular (LV) nonviable tissue analysis from cardiac magnetic resonance late gadolinium enhancement (CMR-LGE) images. The investigated segmentation techniques were: (i) n-standard deviations thresholding; (ii) full width at half maximum thresholding; (iii) Gaussian mixture model classification; and (iv) fuzzy c-means clustering. These algorithms were applied either in each short axis slice (single-slice approach) or globally considering the entire short-axis stack covering the LV (global approach). CMR-LGE images from 20 patients with ischemic cardiomyopathy were retrospectively selected, and results from each technique were assessed against manual tracing. All methods provided comparable performance in terms of accuracy in scar detection, computation of local transmurality, and high correlation in scar mass compared with the manual technique. In general, no significant difference between single-slice and global approach was noted. The reproducibility of manual and investigated techniques was confirmed in all cases with slightly lower results for the nSD approach. Automated techniques resulted in accurate and reproducible evaluation of LV scars from CMR-LGE in ischemic patients with performance similar to the manual technique. Their application could minimize user interaction and computational time, even when compared with semiautomated approaches.
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE PAGES

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

2017-06-01

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM T by using themore » compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.« less
HSCT4.0 Application: Software Requirements Specification

NASA Technical Reports Server (NTRS)

Salas, A. O.; Walsh, J. L.; Mason, B. H.; Weston, R. P.; Townsend, J. C.; Samareh, J. A.; Green, L. L.

2001-01-01

The software requirements for the High Performance Computing and Communication Program High Speed Civil Transport application project, referred to as HSCT4.0, are described. The objective of the HSCT4.0 application project is to demonstrate the application of high-performance computing techniques to the problem of multidisciplinary design optimization of a supersonic transport configuration, using high-fidelity analysis simulations. Descriptions of the various functions (and the relationships among them) that make up the multidisciplinary application as well as the constraints on the software design arc provided. This document serves to establish an agreement between the suppliers and the customer as to what the HSCT4.0 application should do and provides to the software developers the information necessary to design and implement the system.
High-Speed Photonic Reservoir Computing Using a Time-Delay-Based Architecture: Million Words per Second Classification

NASA Astrophysics Data System (ADS)

Larger, Laurent; Baylón-Fuentes, Antonio; Martinenghi, Romain; Udaltsov, Vladimir S.; Chembo, Yanne K.; Jacquot, Maxime

2017-01-01

Reservoir computing, originally referred to as an echo state network or a liquid state machine, is a brain-inspired paradigm for processing temporal information. It involves learning a "read-out" interpretation for nonlinear transients developed by high-dimensional dynamics when the latter is excited by the information signal to be processed. This novel computational paradigm is derived from recurrent neural network and machine learning techniques. It has recently been implemented in photonic hardware for a dynamical system, which opens the path to ultrafast brain-inspired computing. We report on a novel implementation involving an electro-optic phase-delay dynamics designed with off-the-shelf optoelectronic telecom devices, thus providing the targeted wide bandwidth. Computational efficiency is demonstrated experimentally with speech-recognition tasks. State-of-the-art speed performances reach one million words per second, with very low word error rate. Additionally, to record speed processing, our investigations have revealed computing-efficiency improvements through yet-unexplored temporal-information-processing techniques, such as simultaneous multisample injection and pitched sampling at the read-out compared to information "write-in".

Models and techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1978-01-01

Progress in the development of system models and techniques for the formulation and evaluation of aircraft computer system effectiveness is reported. Topics covered include: analysis of functional dependence: a prototype software package, METAPHOR, developed to aid the evaluation of performability; and a comprehensive performability modeling and evaluation exercise involving the SIFT computer.
A Comparison of FPGA and GPGPU Designs for Bayesian Occupancy Filters.

PubMed

Medina, Luis; Diez-Ochoa, Miguel; Correal, Raul; Cuenca-Asensi, Sergio; Serrano, Alejandro; Godoy, Jorge; Martínez-Álvarez, Antonio; Villagra, Jorge

2017-11-11

Grid-based perception techniques in the automotive sector based on fusing information from different sensors and their robust perceptions of the environment are proliferating in the industry. However, one of the main drawbacks of these techniques is the traditionally prohibitive, high computing performance that is required for embedded automotive systems. In this work, the capabilities of new computing architectures that embed these algorithms are assessed in a real car. The paper compares two ad hoc optimized designs of the Bayesian Occupancy Filter; one for General Purpose Graphics Processing Unit (GPGPU) and the other for Field-Programmable Gate Array (FPGA). The resulting implementations are compared in terms of development effort, accuracy and performance, using datasets from a realistic simulator and from a real automated vehicle.
Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures.

PubMed

Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M

2016-01-01

The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.
Neural simulations on multi-core architectures.

PubMed

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures

PubMed Central

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
Finite Dimensional Approximations for Continuum Multiscale Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berlyand, Leonid

2017-01-24

The completed research project concerns the development of novel computational techniques for modeling nonlinear multiscale physical and biological phenomena. Specifically, it addresses the theoretical development and applications of the homogenization theory (coarse graining) approach to calculation of the effective properties of highly heterogenous biological and bio-inspired materials with many spatial scales and nonlinear behavior. This theory studies properties of strongly heterogeneous media in problems arising in materials science, geoscience, biology, etc. Modeling of such media raises fundamental mathematical questions, primarily in partial differential equations (PDEs) and calculus of variations, the subject of the PI’s research. The focus of completed researchmore » was on mathematical models of biological and bio-inspired materials with the common theme of multiscale analysis and coarse grain computational techniques. Biological and bio-inspired materials offer the unique ability to create environmentally clean functional materials used for energy conversion and storage. These materials are intrinsically complex, with hierarchical organization occurring on many nested length and time scales. The potential to rationally design and tailor the properties of these materials for broad energy applications has been hampered by the lack of computational techniques, which are able to bridge from the molecular to the macroscopic scale. The project addressed the challenge of computational treatments of such complex materials by the development of a synergistic approach that combines innovative multiscale modeling/analysis techniques with high performance computing.« less
Modeling Materials: Design for Planetary Entry, Electric Aircraft, and Beyond

NASA Technical Reports Server (NTRS)

Thompson, Alexander; Lawson, John W.

2014-01-01

NASA missions push the limits of what is possible. The development of high-performance materials must keep pace with the agency's demanding, cutting-edge applications. Researchers at NASA's Ames Research Center are performing multiscale computational modeling to accelerate development times and further the design of next-generation aerospace materials. Multiscale modeling combines several computationally intensive techniques ranging from the atomic level to the macroscale, passing output from one level as input to the next level. These methods are applicable to a wide variety of materials systems. For example: (a) Ultra-high-temperature ceramics for hypersonic aircraft-we utilized the full range of multiscale modeling to characterize thermal protection materials for faster, safer air- and spacecraft, (b) Planetary entry heat shields for space vehicles-we computed thermal and mechanical properties of ablative composites by combining several methods, from atomistic simulations to macroscale computations, (c) Advanced batteries for electric aircraft-we performed large-scale molecular dynamics simulations of advanced electrolytes for ultra-high-energy capacity batteries to enable long-distance electric aircraft service; and (d) Shape-memory alloys for high-efficiency aircraft-we used high-fidelity electronic structure calculations to determine phase diagrams in shape-memory transformations. Advances in high-performance computing have been critical to the development of multiscale materials modeling. We used nearly one million processor hours on NASA's Pleiades supercomputer to characterize electrolytes with a fidelity that would be otherwise impossible. For this and other projects, Pleiades enables us to push the physics and accuracy of our calculations to new levels.
Comparative Analysis Between Computed and Conventional Inferior Alveolar Nerve Block Techniques.

PubMed

Araújo, Gabriela Madeira; Barbalho, Jimmy Charles Melo; Dias, Tasiana Guedes de Souza; Santos, Thiago de Santana; Vasconcellos, Ricardo José de Holanda; de Morais, Hécio Henrique Araújo

2015-11-01

The aim of this randomized, double-blind, controlled trial was to compare the computed and conventional inferior alveolar nerve block techniques in symmetrically positioned inferior third molars. Both computed and conventional anesthetic techniques were performed in 29 healthy patients (58 surgeries) aged between 18 and 40 years. The anesthetic of choice was 2% lidocaine with 1: 200,000 epinephrine. The Visual Analogue Scale assessed the pain variable after anesthetic infiltration. Patient satisfaction was evaluated using the Likert Scale. Heart and respiratory rates, mean time to perform technique, and the need for additional anesthesia were also evaluated. Pain variable means were higher for the conventional technique as compared with computed, 3.45 ± 2.73 and 2.86 ± 1.96, respectively, but no statistically significant differences were found (P > 0.05). Patient satisfaction showed no statistically significant differences. The average computed technique runtime and the conventional were 3.85 and 1.61 minutes, respectively, showing statistically significant differences (P <0.001). The computed anesthetic technique showed lower mean pain perception, but did not show statistically significant differences when contrasted to the conventional technique.
NASA Aeronautics: Research and Technology Program Highlights

NASA Technical Reports Server (NTRS)

1990-01-01

This report contains numerous color illustrations to describe the NASA programs in aeronautics. The basic ideas involved are explained in brief paragraphs. The seven chapters deal with Subsonic aircraft, High-speed transport, High-performance military aircraft, Hypersonic/Transatmospheric vehicles, Critical disciplines, National facilities and Organizations & installations. Some individual aircraft discussed are : the SR-71 aircraft, aerospace planes, the high-speed civil transport (HSCT), the X-29 forward-swept wing research aircraft, and the X-31 aircraft. Critical disciplines discussed are numerical aerodynamic simulation, computational fluid dynamics, computational structural dynamics and new experimental testing techniques.
Computer System Performance Measurement Techniques for ARTS III Computer Systems

DOT National Transportation Integrated Search

1973-12-01

The potential contribution of direct system measurement in the evolving ARTS 3 Program is discussed and software performance measurement techniques are comparatively assessed in terms of credibility of results, ease of implementation, volume of data,...
Integrating dimension reduction and out-of-sample extension in automated classification of ex vivo human patellar cartilage on phase contrast X-ray computed tomography.

PubMed

Nagarajan, Mahesh B; Coan, Paola; Huber, Markus B; Diemoz, Paul C; Wismüller, Axel

2015-01-01

Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subject to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns.
High performance optical encryption based on computational ghost imaging with QR code and compressive sensing technique

NASA Astrophysics Data System (ADS)

Zhao, Shengmei; Wang, Le; Liang, Wenqiang; Cheng, Weiwen; Gong, Longyan

2015-10-01

In this paper, we propose a high performance optical encryption (OE) scheme based on computational ghost imaging (GI) with QR code and compressive sensing (CS) technique, named QR-CGI-OE scheme. N random phase screens, generated by Alice, is a secret key and be shared with its authorized user, Bob. The information is first encoded by Alice with QR code, and the QR-coded image is then encrypted with the aid of computational ghost imaging optical system. Here, measurement results from the GI optical system's bucket detector are the encrypted information and be transmitted to Bob. With the key, Bob decrypts the encrypted information to obtain the QR-coded image with GI and CS techniques, and further recovers the information by QR decoding. The experimental and numerical simulated results show that the authorized users can recover completely the original image, whereas the eavesdroppers can not acquire any information about the image even the eavesdropping ratio (ER) is up to 60% at the given measurement times. For the proposed scheme, the number of bits sent from Alice to Bob are reduced considerably and the robustness is enhanced significantly. Meantime, the measurement times in GI system is reduced and the quality of the reconstructed QR-coded image is improved.
Radio Synthesis Imaging - A High Performance Computing and Communications Project

NASA Astrophysics Data System (ADS)

Crutcher, Richard M.

The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
Pseudo-point transport technique: a new method for solving the Boltzmann transport equation in media with highly fluctuating cross sections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakhai, B.

A new method for solving radiation transport problems is presented. The heart of the technique is a new cross section processing procedure for the calculation of group-to-point and point-to-group cross sections sets. The method is ideally suited for problems which involve media with highly fluctuating cross sections, where the results of the traditional multigroup calculations are beclouded by the group averaging procedures employed. Extensive computational efforts, which would be required to evaluate double integrals in the multigroup treatment numerically, prohibit iteration to optimize the energy boundaries. On the other hand, use of point-to-point techniques (as in the stochastic technique) ismore » often prohibitively expensive due to the large computer storage requirement. The pseudo-point code is a hybrid of the two aforementioned methods (group-to-group and point-to-point) - hence the name pseudo-point - that reduces the computational efforts of the former and the large core requirements of the latter. The pseudo-point code generates the group-to-point or the point-to-group transfer matrices, and can be coupled with the existing transport codes to calculate pointwise energy-dependent fluxes. This approach yields much more detail than is available from the conventional energy-group treatments. Due to the speed of this code, several iterations could be performed (in affordable computing efforts) to optimize the energy boundaries and the weighting functions. The pseudo-point technique is demonstrated by solving six problems, each depicting a certain aspect of the technique. The results are presented as flux vs energy at various spatial intervals. The sensitivity of the technique to the energy grid and the savings in computational effort are clearly demonstrated.« less
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

PubMed

Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

2010-01-11

Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Computational Prediction of Cryogenic Micro-nano Solid Nitrogen Particle Production Using Laval Nozzle for Physical Photo Resist Removal-cleaning Technology

NASA Astrophysics Data System (ADS)

Ishimoto, Jun; Abe, Haruto; Ochiai, Naoya

The fundamental characteristics of the cryogenic single-component micro-nano solid nitrogen (SN2) particle production using super adiabatic Laval nozzle and its application to the physical photo resist removal-cleaning technology are investigated by a new type of integrated measurement coupled computational technique. As a result of present computation, it is found that high-speed ultra-fine SN2 particles are continuously generated due to the freezing of liquid nitrogen (LN2) droplets induced by rapid adiabatic expansion of transonic subcooled two-phase nitrogen flow passing through the Laval nozzle. Furthermore, the effect of SN2 particle diameter, injection velocity, and attack angle to the wafer substrate on resist removal-cleaning performance is investigated in detail by integrated measurement coupled computational technique.
Nuclear reactor descriptions for space power systems analysis

NASA Technical Reports Server (NTRS)

Mccauley, E. W.; Brown, N. J.

1972-01-01

For the small, high performance reactors required for space electric applications, adequate neutronic analysis is of crucial importance, but in terms of computational time consumed, nuclear calculations probably yield the least amount of detail for mission analysis study. It has been found possible, after generation of only a few designs of a reactor family in elaborate thermomechanical and nuclear detail to use simple curve fitting techniques to assure desired neutronic performance while still performing the thermomechanical analysis in explicit detail. The resulting speed-up in computation time permits a broad detailed examination of constraints by the mission analyst.
Breast tumor segmentation in high resolution x-ray phase contrast analyzer based computed tomography.

PubMed

Brun, E; Grandl, S; Sztrókay-Gaul, A; Barbone, G; Mittone, A; Gasilov, S; Bravin, A; Coan, P

2014-11-01

Phase contrast computed tomography has emerged as an imaging method, which is able to outperform present day clinical mammography in breast tumor visualization while maintaining an equivalent average dose. To this day, no segmentation technique takes into account the specificity of the phase contrast signal. In this study, the authors propose a new mathematical framework for human-guided breast tumor segmentation. This method has been applied to high-resolution images of excised human organs, each of several gigabytes. The authors present a segmentation procedure based on the viscous watershed transform and demonstrate the efficacy of this method on analyzer based phase contrast images. The segmentation of tumors inside two full human breasts is then shown as an example of this procedure's possible applications. A correct and precise identification of the tumor boundaries was obtained and confirmed by manual contouring performed independently by four experienced radiologists. The authors demonstrate that applying the watershed viscous transform allows them to perform the segmentation of tumors in high-resolution x-ray analyzer based phase contrast breast computed tomography images. Combining the additional information provided by the segmentation procedure with the already high definition of morphological details and tissue boundaries offered by phase contrast imaging techniques, will represent a valuable multistep procedure to be used in future medical diagnostic applications.
Architectural Techniques For Managing Non-volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM poses serious obstacles in its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making themmore » a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.« less
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.

PubMed

Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T

2017-01-01

Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.

State-of-the-art in Heterogeneous Computing

DOE PAGES

Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...

2010-01-01

Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, availablemore » software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.« less
Users matter : multi-agent systems model of high performance computing cluster users.

DOE Office of Scientific and Technical Information (OSTI.GOV)

North, M. J.; Hood, C. S.; Decision and Information Sciences

2005-01-01

High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex duemore » to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.« less
A High-Performance Cellular Automaton Model of Tumor Growth with Dynamically Growing Domains

PubMed Central

Poleszczuk, Jan; Enderling, Heiko

2014-01-01

Tumor growth from a single transformed cancer cell up to a clinically apparent mass spans many spatial and temporal orders of magnitude. Implementation of cellular automata simulations of such tumor growth can be straightforward but computing performance often counterbalances simplicity. Computationally convenient simulation times can be achieved by choosing appropriate data structures, memory and cell handling as well as domain setup. We propose a cellular automaton model of tumor growth with a domain that expands dynamically as the tumor population increases. We discuss memory access, data structures and implementation techniques that yield high-performance multi-scale Monte Carlo simulations of tumor growth. We discuss tumor properties that favor the proposed high-performance design and present simulation results of the tumor growth model. We estimate to which parameters the model is the most sensitive, and show that tumor volume depends on a number of parameters in a non-monotonic manner. PMID:25346862
Advanced Architectures for Astrophysical Supercomputing

NASA Astrophysics Data System (ADS)

Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.

2010-12-01

Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
Reducing Wrong Patient Selection Errors: Exploring the Design Space of User Interface Techniques

PubMed Central

Sopan, Awalin; Plaisant, Catherine; Powsner, Seth; Shneiderman, Ben

2014-01-01

Wrong patient selection errors are a major issue for patient safety; from ordering medication to performing surgery, the stakes are high. Widespread adoption of Electronic Health Record (EHR) and Computerized Provider Order Entry (CPOE) systems makes patient selection using a computer screen a frequent task for clinicians. Careful design of the user interface can help mitigate the problem by helping providers recall their patients’ identities, accurately select their names, and spot errors before orders are submitted. We propose a catalog of twenty seven distinct user interface techniques, organized according to a task analysis. An associated video demonstrates eighteen of those techniques. EHR designers who consider a wider range of human-computer interaction techniques could reduce selection errors, but verification of efficacy is still needed. PMID:25954415
Reducing wrong patient selection errors: exploring the design space of user interface techniques.

PubMed

Sopan, Awalin; Plaisant, Catherine; Powsner, Seth; Shneiderman, Ben

2014-01-01

Wrong patient selection errors are a major issue for patient safety; from ordering medication to performing surgery, the stakes are high. Widespread adoption of Electronic Health Record (EHR) and Computerized Provider Order Entry (CPOE) systems makes patient selection using a computer screen a frequent task for clinicians. Careful design of the user interface can help mitigate the problem by helping providers recall their patients' identities, accurately select their names, and spot errors before orders are submitted. We propose a catalog of twenty seven distinct user interface techniques, organized according to a task analysis. An associated video demonstrates eighteen of those techniques. EHR designers who consider a wider range of human-computer interaction techniques could reduce selection errors, but verification of efficacy is still needed.
Precise and fast spatial-frequency analysis using the iterative local Fourier transform.

PubMed

Lee, Sukmock; Choi, Heejoo; Kim, Dae Wook

2016-09-19

The use of the discrete Fourier transform has decreased since the introduction of the fast Fourier transform (fFT), which is a numerically efficient computing process. This paper presents the iterative local Fourier transform (ilFT), a set of new processing algorithms that iteratively apply the discrete Fourier transform within a local and optimal frequency domain. The new technique achieves 2¹⁰ times higher frequency resolution than the fFT within a comparable computation time. The method's superb computing efficiency, high resolution, spectrum zoom-in capability, and overall performance are evaluated and compared to other advanced high-resolution Fourier transform techniques, such as the fFT combined with several fitting methods. The effectiveness of the ilFT is demonstrated through the data analysis of a set of Talbot self-images (1280 × 1024 pixels) obtained with an experimental setup using grating in a diverging beam produced by a coherent point source.
Raster Scan Computer Image Generation (CIG) System Based On Refresh Memory

NASA Astrophysics Data System (ADS)

Dichter, W.; Doris, K.; Conkling, C.

1982-06-01

A full color, Computer Image Generation (CIG) raster visual system has been developed which provides a high level of training sophistication by utilizing advanced semiconductor technology and innovative hardware and firmware techniques. Double buffered refresh memory and efficient algorithms eliminate the problem of conventional raster line ordering by allowing the generated image to be stored in a random fashion. Modular design techniques and simplified architecture provide significant advantages in reduced system cost, standardization of parts, and high reliability. The major system components are a general purpose computer to perform interfacing and data base functions; a geometric processor to define the instantaneous scene image; a display generator to convert the image to a video signal; an illumination control unit which provides final image processing; and a CRT monitor for display of the completed image. Additional optional enhancements include texture generators, increased edge and occultation capability, curved surface shading, and data base extensions.
Monitoring of experimental rat lung transplants by high-resolution flat-panel volumetric computer tomography (fpVCT).

PubMed

Greschus, Susanne; Kuchenbuch, Tim; Plötz, Christian; Obert, Martin; Traupe, Horst; Padberg, Winfried; Grau, Veronika; Hirschburger, Markus

2009-01-01

Noninvasive assessment of experimental lung transplants with high resolution would be favorable to exclude technical failure and to follow up graft outcome in the living animal. Here we describe a flat-panel Volumetric Computed Tomography (fpVCT) technique using a prototype scanner. Lung transplantation was performed in allogeneic as well as in corresponding syngeneic rat strain combinations. At different time points post-transplantation, fpVCT was performed. Lung transplants can be visualized in the living rat with high-spatial resolution. FpVCT allows a detailed analysis of the lung and the bronchi. Infiltrates developing during rejection episodes can be diagnosed and follow-up studies can easily be performed. With fpVCT it is possible to control the technical success of the surgical procedure. Graft rejection can be visualized individually in the living animal noninvasively, which is highly advantageous for studying the pathogenesis of chronic rejection or to monitor new therapies.
The Research and Implementation of MUSER CLEAN Algorithm Based on OpenCL

NASA Astrophysics Data System (ADS)

Feng, Y.; Chen, K.; Deng, H.; Wang, F.; Mei, Y.; Wei, S. L.; Dai, W.; Yang, Q. P.; Liu, Y. B.; Wu, J. P.

2017-03-01

It's urgent to carry out high-performance data processing with a single machine in the development of astronomical software. However, due to the different configuration of the machine, traditional programming techniques such as multi-threading, and CUDA (Compute Unified Device Architecture)+GPU (Graphic Processing Unit) have obvious limitations in portability and seamlessness between different operation systems. The OpenCL (Open Computing Language) used in the development of MUSER (MingantU SpEctral Radioheliograph) data processing system is introduced. And the Högbom CLEAN algorithm is re-implemented into parallel CLEAN algorithm by the Python language and PyOpenCL extended package. The experimental results show that the CLEAN algorithm based on OpenCL has approximately equally operating efficiency compared with the former CLEAN algorithm based on CUDA. More important, the data processing in merely CPU (Central Processing Unit) environment of this system can also achieve high performance, which has solved the problem of environmental dependence of CUDA+GPU. Overall, the research improves the adaptability of the system with emphasis on performance of MUSER image clean computing. In the meanwhile, the realization of OpenCL in MUSER proves its availability in scientific data processing. In view of the high-performance computing features of OpenCL in heterogeneous environment, it will probably become the preferred technology in the future high-performance astronomical software development.
High capacity hydrogen storage materials: attributes for automotive applications and techniques for materials discovery.

PubMed

Yang, Jun; Sudik, Andrea; Wolverton, Christopher; Siegel, Donald J

2010-02-01

Widespread adoption of hydrogen as a vehicular fuel depends critically upon the ability to store hydrogen on-board at high volumetric and gravimetric densities, as well as on the ability to extract/insert it at sufficiently rapid rates. As current storage methods based on physical means--high-pressure gas or (cryogenic) liquefaction--are unlikely to satisfy targets for performance and cost, a global research effort focusing on the development of chemical means for storing hydrogen in condensed phases has recently emerged. At present, no known material exhibits a combination of properties that would enable high-volume automotive applications. Thus new materials with improved performance, or new approaches to the synthesis and/or processing of existing materials, are highly desirable. In this critical review we provide a practical introduction to the field of hydrogen storage materials research, with an emphasis on (i) the properties necessary for a viable storage material, (ii) the computational and experimental techniques commonly employed in determining these attributes, and (iii) the classes of materials being pursued as candidate storage compounds. Starting from the general requirements of a fuel cell vehicle, we summarize how these requirements translate into desired characteristics for the hydrogen storage material. Key amongst these are: (a) high gravimetric and volumetric hydrogen density, (b) thermodynamics that allow for reversible hydrogen uptake/release under near-ambient conditions, and (c) fast reaction kinetics. To further illustrate these attributes, the four major classes of candidate storage materials--conventional metal hydrides, chemical hydrides, complex hydrides, and sorbent systems--are introduced and their respective performance and prospects for improvement in each of these areas is discussed. Finally, we review the most valuable experimental and computational techniques for determining these attributes, highlighting how an approach that couples computational modeling with experiments can significantly accelerate the discovery of novel storage materials (155 references).
Statistical iterative reconstruction to improve image quality for digital breast tomosynthesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Shiyu, E-mail: shiyu.xu@gmail.com; Chen, Ying, E-mail: adachen@siu.edu; Lu, Jianping

2015-09-15

Purpose: Digital breast tomosynthesis (DBT) is a novel modality with the potential to improve early detection of breast cancer by providing three-dimensional (3D) imaging with a low radiation dose. 3D image reconstruction presents some challenges: cone-beam and flat-panel geometry, and highly incomplete sampling. A promising means to overcome these challenges is statistical iterative reconstruction (IR), since it provides the flexibility of accurate physics modeling and a general description of system geometry. The authors’ goal was to develop techniques for applying statistical IR to tomosynthesis imaging data. Methods: These techniques include the following: a physics model with a local voxel-pair basedmore » prior with flexible parameters to fine-tune image quality; a precomputed parameter λ in the prior, to remove data dependence and to achieve a uniform resolution property; an effective ray-driven technique to compute the forward and backprojection; and an oversampled, ray-driven method to perform high resolution reconstruction with a practical region-of-interest technique. To assess the performance of these techniques, the authors acquired phantom data on the stationary DBT prototype system. To solve the estimation problem, the authors proposed an optimization-transfer based algorithm framework that potentially allows fewer iterations to achieve an acceptably converged reconstruction. Results: IR improved the detectability of low-contrast and small microcalcifications, reduced cross-plane artifacts, improved spatial resolution, and lowered noise in reconstructed images. Conclusions: Although the computational load remains a significant challenge for practical development, the superior image quality provided by statistical IR, combined with advancing computational techniques, may bring benefits to screening, diagnostics, and intraoperative imaging in clinical applications.« less
Virtopsy: postmortem imaging of laryngeal foreign bodies.

PubMed

Oesterhelweg, Lars; Bolliger, Stephan A; Thali, Michael J; Ross, Steffen

2009-05-01

Death from corpora aliena in the larynx is a well-known entity in forensic pathology. The correct diagnosis of this cause of death is difficult without an autopsy, and misdiagnoses by external examination alone are common. To determine the postmortem usefulness of modern imaging techniques in the diagnosis of foreign bodies in the larynx, multislice computed tomography, magnetic resonance imaging, and postmortem full-body computed tomography-angiography were performed. Three decedents with a suspected foreign body in the larynx underwent the 3 different imaging techniques before medicolegal autopsy. Multislice computed tomography has a high diagnostic value in the noninvasive localization of a foreign body and abnormalities in the larynx. The differentiation between neoplasm or soft foreign bodies (eg, food) is possible, but difficult, by unenhanced multislice computed tomography. By magnetic resonance imaging, the discrimination of the soft tissue structures and soft foreign bodies is much easier. In addition to the postmortem multislice computed tomography, the combination with postmortem angiography will increase the diagnostic value. Postmortem, cross-sectional imaging methods are highly valuable procedures for the noninvasive detection of corpora aliena in the larynx.
REVEAL: An Extensible Reduced Order Model Builder for Simulation and Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, Khushbu; Sharma, Poorva; Ma, Jinliang

2013-04-30

Many science domains need to build computationally efficient and accurate representations of high fidelity, computationally expensive simulations. These computationally efficient versions are known as reduced-order models. This paper presents the design and implementation of a novel reduced-order model (ROM) builder, the REVEAL toolset. This toolset generates ROMs based on science- and engineering-domain specific simulations executed on high performance computing (HPC) platforms. The toolset encompasses a range of sampling and regression methods that can be used to generate a ROM, automatically quantifies the ROM accuracy, and provides support for an iterative approach to improve ROM accuracy. REVEAL is designed to bemore » extensible in order to utilize the core functionality with any simulator that has published input and output formats. It also defines programmatic interfaces to include new sampling and regression techniques so that users can ‘mix and match’ mathematical techniques to best suit the characteristics of their model. In this paper, we describe the architecture of REVEAL and demonstrate its usage with a computational fluid dynamics model used in carbon capture.« less
Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

NASA Astrophysics Data System (ADS)

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Computational adaptive optics for broadband optical interferometric tomography of biological tissue

NASA Astrophysics Data System (ADS)

Boppart, Stephen A.

2015-03-01

High-resolution real-time tomography of biological tissues is important for many areas of biological investigations and medical applications. Cellular level optical tomography, however, has been challenging because of the compromise between transverse imaging resolution and depth-of-field, the system and sample aberrations that may be present, and the low imaging sensitivity deep in scattering tissues. The use of computed optical imaging techniques has the potential to address several of these long-standing limitations and challenges. Two related techniques are interferometric synthetic aperture microscopy (ISAM) and computational adaptive optics (CAO). Through three-dimensional Fourierdomain resampling, in combination with high-speed OCT, ISAM can be used to achieve high-resolution in vivo tomography with enhanced depth sensitivity over a depth-of-field extended by more than an order-of-magnitude, in realtime. Subsequently, aberration correction with CAO can be performed in a tomogram, rather than to the optical beam of a broadband optical interferometry system. Based on principles of Fourier optics, aberration correction with CAO is performed on a virtual pupil using Zernike polynomials, offering the potential to augment or even replace the more complicated and expensive adaptive optics hardware with algorithms implemented on a standard desktop computer. Interferometric tomographic reconstructions are characterized with tissue phantoms containing sub-resolution scattering particles, and in both ex vivo and in vivo biological tissue. This review will collectively establish the foundation for high-speed volumetric cellular-level optical interferometric tomography in living tissues.
Computed tomography and magnetic resonance imaging in diagnosing hepatocellular carcinoma.

PubMed

Dalla Palma, L; Pozzi-Mucelli, R S

1992-02-01

The evaluation of hepatocellular carcinoma (HCC) is based upon ultrasonography (US) which has proved to have a high sensitivity and is also extremely useful in guiding the percutaneous needle biopsy. The main role of computed tomography (CT) and magnetic resonance imaging (MRI) is to supplement US in evaluating the extent of HCC. The Authors discuss the different techniques of examinations of the liver both for CT and MRI as far as the modalities of contrast enhancement, site of injection, and type of contrast agents are concerned. The differences between low field and high field magnets are also discussed. The main CT and MRI findings are illustrated, depending upon the technique of examination. Finally the role of these techniques is discussed. Based upon personal experience and the data in CT literature, and if performed with updated technology and intraarterial injection (lipiodol), CT is the method of choice in order to supplement US in the evaluation of HCC.
Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques

DOT National Transportation Integrated Search

1995-01-01

Large scale simulations of Intelligent Transportation Systems (ITS) can only be acheived by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...
On the Impact of Execution Models: A Case Study in Computational Chemistry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chavarría-Miranda, Daniel; Halappanavar, Mahantesh; Krishnamoorthy, Sriram

2015-05-25

Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore the impact of execution models on the utilization of an HPC system. We demonstrate a 50 percent improvement in performance by using work stealing relative to a more traditional static scheduling approach. We also use a novel semi-matching technique for load balancingmore » that has comparable performance to a traditional hypergraph-based partitioning implementation, which is computationally expensive. Using this study, we found that execution model design choices and assumptions can limit critical optimizations such as global, dynamic load balancing and finding the correct balance between available work units and different system and runtime overheads. With the emergence of multi- and many-core architectures and the consequent growth in the complexity of HPC platforms, we believe that these lessons will be beneficial to researchers tuning diverse applications on modern HPC platforms, especially on emerging dynamic platforms with energy-induced performance variability.« less
Quantum rendering

NASA Astrophysics Data System (ADS)

Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

2003-08-01

In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.

Development of a computer technique for the prediction of transport aircraft flight profile sonic boom signatures. M.S. Thesis

NASA Technical Reports Server (NTRS)

Coen, Peter G.

1991-01-01

A new computer technique for the analysis of transport aircraft sonic boom signature characteristics was developed. This new technique, based on linear theory methods, combines the previously separate equivalent area and F function development with a signature propagation method using a single geometry description. The new technique was implemented in a stand-alone computer program and was incorporated into an aircraft performance analysis program. Through these implementations, both configuration designers and performance analysts are given new capabilities to rapidly analyze an aircraft's sonic boom characteristics throughout the flight envelope.
Development of a computational model for astronaut reorientation.

PubMed

Stirling, Leia; Willcox, Karen; Newman, Dava

2010-08-26

The ability to model astronaut reorientations computationally provides a simple way to develop and study human motion control strategies. Since the cost of experimenting in microgravity is high, and underwater training can lead to motions inappropriate for microgravity, these techniques allow for motions to be developed and well-understood prior to any microgravity exposure. By including a model of the current space suit, we have the ability to study both intravehicular and extravehicular activities. We present several techniques for rotating about the axes of the body and show that motions performed by the legs create a greater net rotation than those performed by the arms. Adding a space suit to the motions was seen to increase the resistance torque and limit the available range of motion. While rotations about the body axes can be performed in the current space suit, the resulting motions generated a reduced rotation when compared to the unsuited configuration. 2010 Elsevier Ltd. All rights reserved.
A Comparison of FPGA and GPGPU Designs for Bayesian Occupancy Filters

PubMed Central

Medina, Luis; Diez-Ochoa, Miguel; Correal, Raul; Cuenca-Asensi, Sergio; Godoy, Jorge; Martínez-Álvarez, Antonio

2017-01-01

Grid-based perception techniques in the automotive sector based on fusing information from different sensors and their robust perceptions of the environment are proliferating in the industry. However, one of the main drawbacks of these techniques is the traditionally prohibitive, high computing performance that is required for embedded automotive systems. In this work, the capabilities of new computing architectures that embed these algorithms are assessed in a real car. The paper compares two ad hoc optimized designs of the Bayesian Occupancy Filter; one for General Purpose Graphics Processing Unit (GPGPU) and the other for Field-Programmable Gate Array (FPGA). The resulting implementations are compared in terms of development effort, accuracy and performance, using datasets from a realistic simulator and from a real automated vehicle. PMID:29137137
Turbocharged molecular discovery of OLED emitters: from high-throughput quantum simulation to highly efficient TADF devices

NASA Astrophysics Data System (ADS)

Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Ha, Dong-Gwang; Einzinger, Markus; Wu, Tony; Baldo, Marc A.; Aspuru-Guzik, Alán.

2016-09-01

Discovering new OLED emitters requires many experiments to synthesize candidates and test performance in devices. Large scale computer simulation can greatly speed this search process but the problem remains challenging enough that brute force application of massive computing power is not enough to successfully identify novel structures. We report a successful High Throughput Virtual Screening study that leveraged a range of methods to optimize the search process. The generation of candidate structures was constrained to contain combinatorial explosion. Simulations were tuned to the specific problem and calibrated with experimental results. Experimentalists and theorists actively collaborated such that experimental feedback was regularly utilized to update and shape the computational search. Supervised machine learning methods prioritized candidate structures prior to quantum chemistry simulation to prevent wasting compute on likely poor performers. With this combination of techniques, each multiplying the strength of the search, this effort managed to navigate an area of molecular space and identify hundreds of promising OLED candidate structures. An experimentally validated selection of this set shows emitters with external quantum efficiencies as high as 22%.
Overview of High-Fidelity Modeling Activities in the Numerical Propulsion System Simulations (NPSS) Project

NASA Technical Reports Server (NTRS)

Veres, Joseph P.

2002-01-01

A high-fidelity simulation of a commercial turbofan engine has been created as part of the Numerical Propulsion System Simulation Project. The high-fidelity computer simulation utilizes computer models that were developed at NASA Glenn Research Center in cooperation with turbofan engine manufacturers. The average-passage (APNASA) Navier-Stokes based viscous flow computer code is used to simulate the 3D flow in the compressors and turbines of the advanced commercial turbofan engine. The 3D National Combustion Code (NCC) is used to simulate the flow and chemistry in the advanced aircraft combustor. The APNASA turbomachinery code and the NCC combustor code exchange boundary conditions at the interface planes at the combustor inlet and exit. This computer simulation technique can evaluate engine performance at steady operating conditions. The 3D flow models provide detailed knowledge of the airflow within the fan and compressor, the high and low pressure turbines, and the flow and chemistry within the combustor. The models simulate the performance of the engine at operating conditions that include sea level takeoff and the altitude cruise condition.
Silicon photonics for high-performance interconnection networks

NASA Astrophysics Data System (ADS)

Biberman, Aleksandr

2011-12-01

We assert in the course of this work that silicon photonics has the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems, and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. This work showcases that chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, enable unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of this work, we demonstrate such feasibility of waveguides, modulators, switches, and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. Furthermore, we leverage the unique properties of available silicon photonic materials to create novel silicon photonic devices, subsystems, network topologies, and architectures to enable unprecedented performance of these photonic interconnection networks and computing systems. We show that the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. Furthermore, we explore the immense potential of all-optical functionalities implemented using parametric processing in the silicon platform, demonstrating unique methods that have the ability to revolutionize computation and communication. Silicon photonics enables new sets of opportunities that we can leverage for performance gains, as well as new sets of challenges that we must solve. Leveraging its inherent compatibility with standard fabrication techniques of the semiconductor industry, combined with its capability of dense integration with advanced microelectronics, silicon photonics also offers a clear path toward commercialization through low-cost mass-volume production. Combining empirical validations of feasibility, demonstrations of massive performance gains in large-scale systems, and the potential for commercial penetration of silicon photonics, the impact of this work will become evident in the many decades that follow.
Roads towards fault-tolerant universal quantum computation

NASA Astrophysics Data System (ADS)

Campbell, Earl T.; Terhal, Barbara M.; Vuillot, Christophe

2017-09-01

A practical quantum computer must not merely store information, but also process it. To prevent errors introduced by noise from multiplying and spreading, a fault-tolerant computational architecture is required. Current experiments are taking the first steps toward noise-resilient logical qubits. But to convert these quantum devices from memories to processors, it is necessary to specify how a universal set of gates is performed on them. The leading proposals for doing so, such as magic-state distillation and colour-code techniques, have high resource demands. Alternative schemes, such as those that use high-dimensional quantum codes in a modular architecture, have potential benefits, but need to be explored further.
Roads towards fault-tolerant universal quantum computation.

PubMed

Campbell, Earl T; Terhal, Barbara M; Vuillot, Christophe

2017-09-13

A practical quantum computer must not merely store information, but also process it. To prevent errors introduced by noise from multiplying and spreading, a fault-tolerant computational architecture is required. Current experiments are taking the first steps toward noise-resilient logical qubits. But to convert these quantum devices from memories to processors, it is necessary to specify how a universal set of gates is performed on them. The leading proposals for doing so, such as magic-state distillation and colour-code techniques, have high resource demands. Alternative schemes, such as those that use high-dimensional quantum codes in a modular architecture, have potential benefits, but need to be explored further.
Fuzzy logic, neural networks, and soft computing

NASA Technical Reports Server (NTRS)

Zadeh, Lofti A.

1994-01-01

The past few years have witnessed a rapid growth of interest in a cluster of modes of modeling and computation which may be described collectively as soft computing. The distinguishing characteristic of soft computing is that its primary aims are to achieve tractability, robustness, low cost, and high MIQ (machine intelligence quotient) through an exploitation of the tolerance for imprecision and uncertainty. Thus, in soft computing what is usually sought is an approximate solution to a precisely formulated problem or, more typically, an approximate solution to an imprecisely formulated problem. A simple case in point is the problem of parking a car. Generally, humans can park a car rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a few millimeters and a fraction of a degree, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem. What this simple example points to is the fact that, in general, high precision carries a high cost. The challenge, then, is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. By its nature, soft computing is much closer to human reasoning than the traditional modes of computation. At this juncture, the major components of soft computing are fuzzy logic (FL), neural network theory (NN), and probabilistic reasoning techniques (PR), including genetic algorithms, chaos theory, and part of learning theory. Increasingly, these techniques are used in combination to achieve significant improvement in performance and adaptability. Among the important application areas for soft computing are control systems, expert systems, data compression techniques, image processing, and decision support systems. It may be argued that it is soft computing, rather than the traditional hard computing, that should be viewed as the foundation for artificial intelligence. In the years ahead, this may well become a widely held position.
A Modular Environment for Geophysical Inversion and Run-time Autotuning using Heterogeneous Computing Systems

NASA Astrophysics Data System (ADS)

Myre, Joseph M.

Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special- purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allows, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
Coarse-to-fine markerless gait analysis based on PCA and Gauss-Laguerre decomposition

NASA Astrophysics Data System (ADS)

Goffredo, Michela; Schmid, Maurizio; Conforto, Silvia; Carli, Marco; Neri, Alessandro; D'Alessio, Tommaso

2005-04-01

Human movement analysis is generally performed through the utilization of marker-based systems, which allow reconstructing, with high levels of accuracy, the trajectories of markers allocated on specific points of the human body. Marker based systems, however, show some drawbacks that can be overcome by the use of video systems applying markerless techniques. In this paper, a specifically designed computer vision technique for the detection and tracking of relevant body points is presented. It is based on the Gauss-Laguerre Decomposition, and a Principal Component Analysis Technique (PCA) is used to circumscribe the region of interest. Results obtained on both synthetic and experimental tests provide significant reduction of the computational costs, with no significant reduction of the tracking accuracy.
A proposed framework on hybrid feature selection techniques for handling high dimensional educational data

NASA Astrophysics Data System (ADS)

Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd

2017-10-01

Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.
Brute force meets Bruno force in parameter optimisation: introduction of novel constraints for parameter accuracy improvement by symbolic computation.

PubMed

Nakatsui, M; Horimoto, K; Lemaire, F; Ürgüplü, A; Sedoglavic, A; Boulier, F

2011-09-01

Recent remarkable advances in computer performance have enabled us to estimate parameter values by the huge power of numerical computation, the so-called 'Brute force', resulting in the high-speed simultaneous estimation of a large number of parameter values. However, these advancements have not been fully utilised to improve the accuracy of parameter estimation. Here the authors review a novel method for parameter estimation using symbolic computation power, 'Bruno force', named after Bruno Buchberger, who found the Gröbner base. In the method, the objective functions combining the symbolic computation techniques are formulated. First, the authors utilise a symbolic computation technique, differential elimination, which symbolically reduces an equivalent system of differential equations to a system in a given model. Second, since its equivalent system is frequently composed of large equations, the system is further simplified by another symbolic computation. The performance of the authors' method for parameter accuracy improvement is illustrated by two representative models in biology, a simple cascade model and a negative feedback model in comparison with the previous numerical methods. Finally, the limits and extensions of the authors' method are discussed, in terms of the possible power of 'Bruno force' for the development of a new horizon in parameter estimation.
A vectorized Lanczos eigensolver for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1990-01-01

The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
Integrating Dimension Reduction and Out-of-Sample Extension in Automated Classification of Ex Vivo Human Patellar Cartilage on Phase Contrast X-Ray Computed Tomography

PubMed Central

Nagarajan, Mahesh B.; Coan, Paola; Huber, Markus B.; Diemoz, Paul C.; Wismüller, Axel

2015-01-01

Phase contrast X-ray computed tomography (PCI-CT) has been demonstrated as a novel imaging technique that can visualize human cartilage with high spatial resolution and soft tissue contrast. Different textural approaches have been previously investigated for characterizing chondrocyte organization on PCI-CT to enable classification of healthy and osteoarthritic cartilage. However, the large size of feature sets extracted in such studies motivates an investigation into algorithmic feature reduction for computing efficient feature representations without compromising their discriminatory power. For this purpose, geometrical feature sets derived from the scaling index method (SIM) were extracted from 1392 volumes of interest (VOI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. The extracted feature sets were subject to linear and non-linear dimension reduction techniques as well as feature selection based on evaluation of mutual information criteria. The reduced feature set was subsequently used in a machine learning task with support vector regression to classify VOIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Our results show that the classification performance achieved by 9-D SIM-derived geometric feature sets (AUC: 0.96 ± 0.02) can be maintained with 2-D representations computed from both dimension reduction and feature selection (AUC values as high as 0.97 ± 0.02). Thus, such feature reduction techniques can offer a high degree of compaction to large feature sets extracted from PCI-CT images while maintaining their ability to characterize the underlying chondrocyte patterns. PMID:25710875
A Survey Of Techniques for Managing and Leveraging Caches in GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

2014-09-01

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU–GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges ofmore » cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.« less
Solid oxide fuel cell simulation and design optimization with numerical adjoint techniques

NASA Astrophysics Data System (ADS)

Elliott, Louie C.

This dissertation reports on the application of numerical optimization techniques as applied to fuel cell simulation and design. Due to the "multi-physics" inherent in a fuel cell, which results in a highly coupled and non-linear behavior, an experimental program to analyze and improve the performance of fuel cells is extremely difficult. This program applies new optimization techniques with computational methods from the field of aerospace engineering to the fuel cell design problem. After an overview of fuel cell history, importance, and classification, a mathematical model of solid oxide fuel cells (SOFC) is presented. The governing equations are discretized and solved with computational fluid dynamics (CFD) techniques including unstructured meshes, non-linear solution methods, numerical derivatives with complex variables, and sensitivity analysis with adjoint methods. Following the validation of the fuel cell model in 2-D and 3-D, the results of the sensitivity analysis are presented. The sensitivity derivative for a cost function with respect to a design variable is found with three increasingly sophisticated techniques: finite difference, direct differentiation, and adjoint. A design cycle is performed using a simple optimization method to improve the value of the implemented cost function. The results from this program could improve fuel cell performance and lessen the world's dependence on fossil fuels.
Operational forecasting with the subgrid technique on the Elbe Estuary

NASA Astrophysics Data System (ADS)

Sehili, Aissa

2017-04-01

Modern remote sensing technologies can deliver very detailed land surface height data that should be considered for more accurate simulations. In that case, and even if some compromise is made with regard to grid resolution of an unstructured grid, simulations still will require large grids which can be computationally very demanding. The subgrid technique, first published by Casulli (2009), is based on the idea of making use of the available detailed subgrid bathymetric information while performing computations on relatively coarse grids permitting large time steps. Consequently, accuracy and efficiency are drastically enhanced if compared to the classical linear method, where the underlying bathymetry is solely discretized by the computational grid. The algorithm guarantees rigorous mass conservation and nonnegative water depths for any time step size. Computational grid-cells are permitted to be wet, partially wet or dry and no drying threshold is needed. The subgrid technique is used in an operational forecast model for water level, current velocity, salinity and temperature of the Elbe estuary in Germany. Comparison is performed with the comparatively highly resolved classical unstructured grid model UnTRIM. The daily meteorological forcing data are delivered by the German Weather Service (DWD) using the ICON-EU model. Open boundary data are delivered by the coastal model BSHcmod of the German Federal Maritime and Hydrographic Agency (BSH). Comparison of predicted water levels between classical and subgrid model shows a very good agreement. The speedup in computational performance due to the use of the subgrid technique is about a factor of 20. A typical daily forecast can be carried out within less than 10 minutes on standard PC-like hardware. The model is capable of permanently delivering highly resolved temporal and spatial information on water level, current velocity, salinity and temperature for the whole estuary. The model offers also the possibility to recalculate any previous situation. This can be helpful to figure out for instance the context in which a certain event occurred like an accident. In addition to measurement, the model can be used to improve navigability by adjusting the tidal transit-schedule for container vessels that are depending on the tide to approach or leave the port of Hamburg.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

PubMed Central

2010-01-01

Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
GPUs benchmarking in subpixel image registration algorithm

NASA Astrophysics Data System (ADS)

Sanz-Sabater, Martin; Picazo-Bueno, Jose Angel; Micó, Vicente; Ferrerira, Carlos; Granero, Luis; Garcia, Javier

2015-05-01

Image registration techniques are used among different scientific fields, like medical imaging or optical metrology. The straightest way to calculate shifting between two images is using the cross correlation, taking the highest value of this correlation image. Shifting resolution is given in whole pixels which cannot be enough for certain applications. Better results can be achieved interpolating both images, as much as the desired resolution we want to get, and applying the same technique described before, but the memory needed by the system is significantly higher. To avoid memory consuming we are implementing a subpixel shifting method based on FFT. With the original images, subpixel shifting can be achieved multiplying its discrete Fourier transform by a linear phase with different slopes. This method is high time consuming method because checking a concrete shifting means new calculations. The algorithm, highly parallelizable, is very suitable for high performance computing systems. GPU (Graphics Processing Unit) accelerated computing became very popular more than ten years ago because they have hundreds of computational cores in a reasonable cheap card. In our case, we are going to register the shifting between two images, doing the first approach by FFT based correlation, and later doing the subpixel approach using the technique described before. We consider it as `brute force' method. So we will present a benchmark of the algorithm consisting on a first approach (pixel resolution) and then do subpixel resolution approaching, decreasing the shifting step in every loop achieving a high resolution in few steps. This program will be executed in three different computers. At the end, we will present the results of the computation, with different kind of CPUs and GPUs, checking the accuracy of the method, and the time consumed in each computer, discussing the advantages, disadvantages of the use of GPUs.

A Scalable, Parallel Approach for Multi-Point, High-Fidelity Aerostructural Optimization of Aircraft Configurations

NASA Astrophysics Data System (ADS)

Kenway, Gaetan K. W.

This thesis presents new tools and techniques developed to address the challenging problem of high-fidelity aerostructural optimization with respect to large numbers of design variables. A new mesh-movement scheme is developed that is both computationally efficient and sufficiently robust to accommodate large geometric design changes and aerostructural deformations. A fully coupled Newton-Krylov method is presented that accelerates the convergence of aerostructural systems and provides a 20% performance improvement over the traditional nonlinear block Gauss-Seidel approach and can handle more exible structures. A coupled adjoint method is used that efficiently computes derivatives for a gradient-based optimization algorithm. The implementation uses only machine accurate derivative techniques and is verified to yield fully consistent derivatives by comparing against the complex step method. The fully-coupled large-scale coupled adjoint solution method is shown to have 30% better performance than the segregated approach. The parallel scalability of the coupled adjoint technique is demonstrated on an Euler Computational Fluid Dynamics (CFD) model with more than 80 million state variables coupled to a detailed structural finite-element model of the wing with more than 1 million degrees of freedom. Multi-point high-fidelity aerostructural optimizations of a long-range wide-body, transonic transport aircraft configuration are performed using the developed techniques. The aerostructural analysis employs Euler CFD with a 2 million cell mesh and a structural finite element model with 300 000 DOF. Two design optimization problems are solved: one where takeoff gross weight is minimized, and another where fuel burn is minimized. Each optimization uses a multi-point formulation with 5 cruise conditions and 2 maneuver conditions. The optimization problems have 476 design variables are optimal results are obtained within 36 hours of wall time using 435 processors. The TOGW minimization results in a 4.2% reduction in TOGW with a 6.6% fuel burn reduction, while the fuel burn optimization resulted in a 11.2% fuel burn reduction with no change to the takeoff gross weight.
Computer program uses Monte Carlo techniques for statistical system performance analysis

NASA Technical Reports Server (NTRS)

Wohl, D. P.

1967-01-01

Computer program with Monte Carlo sampling techniques determines the effect of a component part of a unit upon the overall system performance. It utilizes the full statistics of the disturbances and misalignments of each component to provide unbiased results through simulated random sampling.
Speeding Up Ecological and Evolutionary Computations in R; Essentials of High Performance Computing for Biologists

PubMed Central

Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke

2015-01-01

Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842
AutoCNet: A Python library for sparse multi-image correspondence identification for planetary data

NASA Astrophysics Data System (ADS)

Laura, Jason; Rodriguez, Kelvin; Paquette, Adam C.; Dunn, Evin

2018-01-01

In this work we describe the AutoCNet library, written in Python, to support the application of computer vision techniques for n-image correspondence identification in remotely sensed planetary images and subsequent bundle adjustment. The library is designed to support exploratory data analysis, algorithm and processing pipeline development, and application at scale in High Performance Computing (HPC) environments for processing large data sets and generating foundational data products. We also present a brief case study illustrating high level usage for the Apollo 15 Metric camera.
Particle-mesh techniques

NASA Technical Reports Server (NTRS)

Macneice, Peter

1995-01-01

This is an introduction to numerical Particle-Mesh techniques, which are commonly used to model plasmas, gravitational N-body systems, and both compressible and incompressible fluids. The theory behind this approach is presented, and its practical implementation, both for serial and parallel machines, is discussed. This document is based on a four-hour lecture course presented by the author at the NASA Summer School for High Performance Computational Physics, held at Goddard Space Flight Center.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

NASA Astrophysics Data System (ADS)

Griffiths, M. K.; Fedun, V.; Erdélyi, R.

2015-03-01

Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
A survey of GPU-based acceleration techniques in MRI reconstructions

PubMed Central

Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou

2018-01-01

Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community. PMID:29675361
A survey of GPU-based acceleration techniques in MRI reconstructions.

PubMed

Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong

2018-03-01

Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community.
NCI's High Performance Computing (HPC) and High Performance Data (HPD) Computing Platform for Environmental and Earth System Data Science

NASA Astrophysics Data System (ADS)

Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley

2015-04-01

The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.
Colovesical fistula causing an uncommon reason for failure of computed tomography colonography: a case report.

PubMed

Neroladaki, Angeliki; Breguet, Romain; Botsikas, Diomidis; Terraz, Sylvain; Becker, Christoph D; Montet, Xavier

2012-07-23

Computed tomography colonography, or virtual colonoscopy, is a good alternative to optical colonoscopy. However, suboptimal patient preparation or colon distension may reduce the diagnostic accuracy of this imaging technique. We report the case of an 83-year-old Caucasian woman who presented with a five-month history of pneumaturia and fecaluria and an acute episode of macrohematuria, leading to a high clinical suspicion of a colovesical fistula. The fistula was confirmed by standard contrast-enhanced computed tomography. Optical colonoscopy was performed to exclude the presence of an underlying colonic neoplasm. Since optical colonoscopy was incomplete, computed tomography colonography was performed, but also failed due to inadequate colon distension. The insufflated air directly accumulated within the bladder via the large fistula. Clinicians should consider colovesical fistula as a potential reason for computed tomography colonography failure.
High Performance GPU-Based Fourier Volume Rendering.

PubMed

Abdellah, Marwan; Eldeib, Ayman; Sharawi, Amr

2015-01-01

Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its (N (2)log⁡N) time complexity, it provides a faster alternative to spatial domain volume rendering algorithms that are (N (3)) computationally complex. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive competent platform that can deliver giant computational raw power compared to the central processing unit (CPU) on a per-dollar-basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly-parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. This proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together by taking advantage of executing the rendering pipeline entirely on recent GPU architectures.
Impedance computations and beam-based measurements: A problem of discrepancy

NASA Astrophysics Data System (ADS)

Smaluk, Victor

2018-04-01

High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. Three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.
D-region blunt probe data analysis using hybrid computer techniques

NASA Technical Reports Server (NTRS)

Burkhard, W. J.

1973-01-01

The feasibility of performing data reduction techniques with a hybrid computer was studied. The data was obtained from the flight of a parachute born probe through the D-region of the ionosphere. A presentation of the theory of blunt probe operation is included with emphasis on the equations necessary to perform the analysis. This is followed by a discussion of computer program development. Included in this discussion is a comparison of computer and hand reduction results for the blunt probe launched on 31 January 1972. The comparison showed that it was both feasible and desirable to use the computer for data reduction. The results of computer data reduction performed on flight data acquired from five blunt probes are also presented.
Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

DTIC Science & Technology

2015-06-01

system accuracy. The AnRAD system was also generalized for the additional application of network intrusion detection . A self-structuring technique...to Host- based Intrusion Detection Systems using Contiguous and Discontiguous System Call Patterns,” IEEE Transactions on Computer, 63(4), pp. 807...square kilometer areas. The anomaly recognition and detection (AnRAD) system was built as a cogent confabulation network . It represented road
Active Nodal Task Seeking for High-Performance, Ultra-Dependable Computing

DTIC Science & Technology

1994-07-01

implementation. Figure 1 shows a hardware organization of ANTS: stand-alone computing nodes inter - connected by buses. 2.1 Run Time Partitioning The...nodes in 14 respond to changing loads [27] or system reconfiguration [26]. Existing techniques are all source-initiated or server-initiated [27]. 5.1...short-running task segments. The task segments must be short-running in order that processors will become avalable often enough to satisfy changing
Network survivability performance (computer diskette)

NASA Astrophysics Data System (ADS)

1993-11-01

File characteristics: Data file; 1 file. Physical description: 1 computer diskette; 3 1/2 in.; high density; 2.0MB. System requirements: Mac; Word. This technical report has been developed to address the survivability of telecommunications networks including services. It responds to the need for a common understanding of, and assessment techniques for network survivability, availability, integrity, and reliability. It provides a basis for designing and operating telecommunication networks to user expectations for network survivability.
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations

NASA Astrophysics Data System (ADS)

Smith, Luke; Liang, Qiuhua

2015-04-01

Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed on a 2m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for a 31km2 of the city centre in less than an hour on a 2m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.
Use of nonlinear design optimization techniques in the comparison of battery discharger topologies for the space platform

NASA Technical Reports Server (NTRS)

Sable, Dan M.; Cho, Bo H.; Lee, Fred C.

1990-01-01

A detailed comparison of a boost converter, a voltage-fed, autotransformer converter, and a multimodule boost converter, designed specifically for the space platform battery discharger, is performed. Computer-based nonlinear optimization techniques are used to facilitate an objective comparison. The multimodule boost converter is shown to be the optimum topology at all efficiencies. The margin is greatest at 97 percent efficiency. The multimodule, multiphase boost converter combines the advantages of high efficiency, light weight, and ample margin on the component stresses, thus ensuring high reliability.
Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

NASA Technical Reports Server (NTRS)

Scheper, C.; Baker, R.; Frank, G.; Yalamanchili, S.; Gray, G.

1992-01-01

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified.
AI tools in computer based problem solving

NASA Technical Reports Server (NTRS)

Beane, Arthur J.

1988-01-01

The use of computers to solve value oriented, deterministic, algorithmic problems, has evolved a structured life cycle model of the software process. The symbolic processing techniques used, primarily in research, for solving nondeterministic problems, and those for which an algorithmic solution is unknown, have evolved a different model, much less structured. Traditionally, the two approaches have been used completely independently. With the advent of low cost, high performance 32 bit workstations executing identical software with large minicomputers and mainframes, it became possible to begin to merge both models into a single extended model of computer problem solving. The implementation of such an extended model on a VAX family of micro/mini/mainframe systems is described. Examples in both development and deployment of applications involving a blending of AI and traditional techniques are given.

Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2014-07-18

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time time-consuming, which greatly limits application’s performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, NVIDIA Fermi C2070 GPU, NVIDIA Kepler K20x GPU, and the Intel Xeon Phi Co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels.more » For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best.« less
Acceleration of atmospheric Cherenkov telescope signal processing to real-time speed with the Auto-Pipe design system

NASA Astrophysics Data System (ADS)

Tyson, Eric J.; Buckley, James; Franklin, Mark A.; Chamberlain, Roger D.

2008-10-01

The imaging atmospheric Cherenkov technique for high-energy gamma-ray astronomy is emerging as an important new technique for studying the high energy universe. Current experiments have data rates of ≈20TB/year and duty cycles of about 10%. In the future, more sensitive experiments may produce up to 1000 TB/year. The data analysis task for these experiments requires keeping up with this data rate in close to real-time. Such data analysis is a classic example of a streaming application with very high performance requirements. This class of application often benefits greatly from the use of non-traditional approaches for computation including using special purpose hardware (FPGAs and ASICs), or sophisticated parallel processing techniques. However, designing, debugging, and deploying to these architectures is difficult and thus they are not widely used by the astrophysics community. This paper presents the Auto-Pipe design toolset that has been developed to address many of the difficulties in taking advantage of complex streaming computer architectures for such applications. Auto-Pipe incorporates a high-level coordination language, functional and performance simulation tools, and the ability to deploy applications to sophisticated architectures. Using the Auto-Pipe toolset, we have implemented the front-end portion of an imaging Cherenkov data analysis application, suitable for real-time or offline analysis. The application operates on data from the VERITAS experiment, and shows how Auto-Pipe can greatly ease performance optimization and application deployment of a wide variety of platforms. We demonstrate a performance improvement over a traditional software approach of 32x using an FPGA solution and 3.6x using a multiprocessor based solution.
High-Performance Computing and Visualization | Energy Systems Integration

Science.gov Websites

Facility | NREL High-Performance Computing and Visualization High-Performance Computing and Visualization High-performance computing (HPC) and visualization at NREL propel technology innovation as a . Capabilities High-Performance Computing NREL is home to Peregrine-the largest high-performance computing system
Computer-assisted preoperative simulation for positioning and fixation of plate in 2-stage procedure combining maxillary advancement by distraction technique and mandibular setback surgery.

PubMed

Suenaga, Hideyuki; Taniguchi, Asako; Yonenaga, Kazumichi; Hoshi, Kazuto; Takato, Tsuyoshi

2016-01-01

Computer-assisted preoperative simulation surgery is employed to plan and interact with the 3D images during the orthognathic procedure. It is useful for positioning and fixation of maxilla by a plate. We report a case of maxillary retrusion by a bilateral cleft lip and palate, in which a 2-stage orthognathic procedure (maxillary advancement by distraction technique and mandibular setback surgery) was performed following a computer-assisted preoperative simulation planning to achieve the positioning and fixation of the plate. A high accuracy was achieved in the present case. A 21-year-old male patient presented to our department with a complaint of maxillary retrusion following bilateral cleft lip and palate. Computer-assisted preoperative simulation with 2-stage orthognathic procedure using distraction technique and mandibular setback surgery was planned. The preoperative planning of the procedure resulted in good aesthetic outcomes. The error of the maxillary position was less than 1mm. The implementation of the computer-assisted preoperative simulation for the positioning and fixation of plate in 2-stage orthognathic procedure using distraction technique and mandibular setback surgery yielded good results. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Statistical approach for selection of biologically informative genes.

PubMed

Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N

2018-05-20

Selection of informative genes from high dimensional gene expression data has emerged as an important research area in genomics. Many gene selection techniques have been proposed so far are either based on relevancy or redundancy measure. Further, the performance of these techniques has been adjudged through post selection classification accuracy computed through a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, i.e. Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficient criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique with 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biological relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes which are more biologically relevant. The proposed technique is also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R Package, i.e. BootMRMR has been developed and available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to select statistical techniques for selecting informative genes from high dimensional expression data for breeding and system biology studies. Published by Elsevier B.V.
A Survey of Computational Intelligence Techniques in Protein Function Prediction

PubMed Central

Tiwari, Arvind Kumar; Srivastava, Rajeev

2014-01-01

During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395
Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele

2014-11-11

has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56more » virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).« less
FPGA-based architecture for motion recovering in real-time

NASA Astrophysics Data System (ADS)

Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar

2002-03-01

A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher level goals in different applications. Motion estimation algorithms pose a significant computational load for the sequential processors limiting its use in practical applications. In this work we propose a hardware architecture for motion estimation in real time based on FPGA technology. The technique used for motion estimation is Optical Flow due to its accuracy, and the density of velocity estimation, however other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results will be presented and the real-time performance will be discussed and analyzed. The architecture is prototyped in an FPGA board with a Virtex device interfaced to a digital imager.
Implementation of ADI: Schemes on MIMD parallel computers

NASA Technical Reports Server (NTRS)

Vanderwijngaart, Rob F.

1993-01-01

In order to simulate the effects of the impingement of hot exhaust jets of High Performance Aircraft on landing surfaces a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPU's) by having many off-the-shelf CPU's work independently on their own data, and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI-algorithm for the three dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPU's in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
Kruskal-Wallis-based computationally efficient feature selection for face recognition.

PubMed

Ali Khan, Sajid; Hussain, Ayyaz; Basit, Abdul; Akram, Sheeraz

2014-01-01

Face recognition in today's technological world, and face recognition applications attain much more importance. Most of the existing work used frontal face images to classify face image. However these techniques fail when applied on real world face images. The proposed technique effectively extracts the prominent facial features. Most of the features are redundant and do not contribute to representing face. In order to eliminate those redundant features, computationally efficient algorithm is used to select the more discriminative face features. Extracted features are then passed to classification step. In the classification step, different classifiers are ensemble to enhance the recognition accuracy rate as single classifier is unable to achieve the high accuracy. Experiments are performed on standard face database images and results are compared with existing techniques.
Load Balancing Strategies for Multi-Block Overset Grid Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)

2002-01-01

The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal

Open Computing Language (OpenCL) is a high-level language that enables software programmers to explore Field Programmable Gate Arrays (FPGAs) for application acceleration. The Intel FPGA software development kit (SDK) for OpenCL allows a user to specify applications at a high level and explore the performance of low-level hardware acceleration. In this report, we present the FPGA performance and power consumption results of the single-precision floating-point vector add OpenCL kernel using the Intel FPGA SDK for OpenCL on the Nallatech 385A FPGA board. The board features an Arria 10 FPGA. We evaluate the FPGA implementations using the compute unit duplication andmore » kernel vectorization optimization techniques. On the Nallatech 385A FPGA board, the maximum compute kernel bandwidth we achieve is 25.8 GB/s, approximately 76% of the peak memory bandwidth. The power consumption of the FPGA device when running the kernels ranges from 29W to 42W.« less
Models and techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1977-01-01

Models, measures and techniques were developed for evaluating the effectiveness of aircraft computing systems. The concept of effectiveness involves aspects of system performance, reliability and worth. Specifically done was a detailed development of model hierarchy at mission, functional task, and computational task levels. An appropriate class of stochastic models was investigated which served as bottom level models in the hierarchial scheme. A unified measure of effectiveness called 'performability' was defined and formulated.
A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

NASA Astrophysics Data System (ADS)

Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

2016-05-01

In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Schaeffel, J.A.; Mullinix, B.R.; Ranson, W.F.

An experimental technique to simulate and evaluate the effects of high concentrations of x-rays resulting from a nuclear detonation on missile structures is presented. Data from 34 tests are included to demonstrate the technique. The effects of variations in the foil thickness, capacitor voltage, and plate thickness on the total impulse and maximum strain in the structure were determined. The experimental technique utilizes a high energy capacitor discharge unit to explode an aluminum foil on the surface of the structure. The structural response is evaluated by optical methods using the grid slope deflection method. The fringe patterns were recorded usingmore » a high-speed framing camera. The data were digitized using an optical comparator with an x-y table. The analysis was performed on a CDC 6600 computer.« less
Digital receiver study and implementation

NASA Technical Reports Server (NTRS)

Fogle, D. A.; Lee, G. M.; Massey, J. C.

1972-01-01

Computer software was developed which makes it possible to use any general purpose computer with A/D conversion capability as a PSK receiver for low data rate telemetry processing. Carrier tracking, bit synchronization, and matched filter detection are all performed digitally. To aid in the implementation of optimum computer processors, a study of general digital processing techniques was performed which emphasized various techniques for digitizing general analog systems. In particular, the phase-locked loop was extensively analyzed as a typical non-linear communication element. Bayesian estimation techniques for PSK demodulation were studied. A hardware implementation of the digital Costas loop was developed.
Rotordynamic Instability Problems in High-Performance Turbomachinery, 1990

NASA Technical Reports Server (NTRS)

1991-01-01

The present workshop continues to report field experience and experimental results, and it expands the use of computational and control techniques with the integration of damper, bearing, and eccentric seal operation results. The intent of the workshop was to provide a continuing impetus for an understanding and resolution of these problems.
Wind Farm Flow Modeling using an Input-Output Reduced-Order Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Annoni, Jennifer; Gebraad, Pieter; Seiler, Peter

Wind turbines in a wind farm operate individually to maximize their own power regardless of the impact of aerodynamic interactions on neighboring turbines. There is the potential to increase power and reduce overall structural loads by properly coordinating turbines. To perform control design and analysis, a model needs to be of low computational cost, but retains the necessary dynamics seen in high-fidelity models. The objective of this work is to obtain a reduced-order model that represents the full-order flow computed using a high-fidelity model. A variety of methods, including proper orthogonal decomposition and dynamic mode decomposition, can be used tomore » extract the dominant flow structures and obtain a reduced-order model. In this paper, we combine proper orthogonal decomposition with a system identification technique to produce an input-output reduced-order model. This technique is used to construct a reduced-order model of the flow within a two-turbine array computed using a large-eddy simulation.« less
Electro-optic Mach-Zehnder Interferometer based Optical Digital Magnitude Comparator and 1's Complement Calculator

NASA Astrophysics Data System (ADS)

Kumar, Ajay; Raghuwanshi, Sanjeev Kumar

2016-06-01

The optical switching activity is one of the most essential phenomena in the optical domain. The electro-optic effect-based switching phenomena are applicable to generate some effective combinational and sequential logic circuits. The processing of digital computational technique in the optical domain includes some considerable advantages of optical communication technology, e.g. immunity to electro-magnetic interferences, compact size, signal security, parallel computing and larger bandwidth. The paper describes some efficient technique to implement single bit magnitude comparator and 1's complement calculator using the concepts of electro-optic effect. The proposed techniques are simulated on the MATLAB software. However, the suitability of the techniques is verified using the highly reliable Opti-BPM software. It is interesting to analyze the circuits in order to specify some optimized device parameter in order to optimize some performance affecting parameters, e.g. crosstalk, extinction ratio, signal losses through the curved and straight waveguide sections.
Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.

Modified signed-digit trinary addition using synthetic wavelet filter

NASA Astrophysics Data System (ADS)

Iftekharuddin, K. M.; Razzaque, M. A.

2000-09-01

The modified signed-digit (MSD) number system has been a topic of interest as it allows for parallel carry-free addition of two numbers for digital optical computing. In this paper, harmonic wavelet joint transform (HWJT)-based correlation technique is introduced for optical implementation of MSD trinary adder implementation. The realization of the carry-propagation-free addition of MSD trinary numerals is demonstrated using synthetic HWJT correlator model. It is also shown that the proposed synthetic wavelet filter-based correlator shows high performance in logic processing. Simulation results are presented to validate the performance of the proposed technique.
Generalized dynamic engine simulation techniques for the digital computer

NASA Technical Reports Server (NTRS)

Sellers, J.; Teren, F.

1974-01-01

Recently advanced simulation techniques have been developed for the digital computer and used as the basis for development of a generalized dynamic engine simulation computer program, called DYNGEN. This computer program can analyze the steady state and dynamic performance of many kinds of aircraft gas turbine engines. Without changes to the basic program, DYNGEN can analyze one- or two-spool turbofan engines. The user must supply appropriate component performance maps and design-point information. Examples are presented to illustrate the capabilities of DYNGEN in the steady state and dynamic modes of operation. The analytical techniques used in DYNGEN are briefly discussed, and its accuracy is compared with a comparable simulation using the hybrid computer. The impact of DYNGEN and similar all-digital programs on future engine simulation philosophy is also discussed.
Generalized dynamic engine simulation techniques for the digital computer

NASA Technical Reports Server (NTRS)

Sellers, J.; Teren, F.

1974-01-01

Recently advanced simulation techniques have been developed for the digital computer and used as the basis for development of a generalized dynamic engine simulation computer program, called DYNGEN. This computer program can analyze the steady state and dynamic performance of many kinds of aircraft gas turbine engines. Without changes to the basic program DYNGEN can analyze one- or two-spool turbofan engines. The user must supply appropriate component performance maps and design-point information. Examples are presented to illustrate the capabilities of DYNGEN in the steady state and dynamic modes of operation. The analytical techniques used in DYNGEN are briefly discussed, and its accuracy is compared with a comparable simulation using the hybrid computer. The impact of DYNGEN and similar all-digital programs on future engine simulation philosophy is also discussed.
Generalized dynamic engine simulation techniques for the digital computers

NASA Technical Reports Server (NTRS)

Sellers, J.; Teren, F.

1975-01-01

Recently advanced simulation techniques have been developed for the digital computer and used as the basis for development of a generalized dynamic engine simulation computer program, called DYNGEN. This computer program can analyze the steady state and dynamic performance of many kinds of aircraft gas turbine engines. Without changes to the basic program, DYNGEN can analyze one- or two-spool turbofan engines. The user must supply appropriate component performance maps and design point information. Examples are presented to illustrate the capabilities of DYNGEN in the steady state and dynamic modes of operation. The analytical techniques used in DYNGEN are briefly discussed, and its accuracy is compared with a comparable simulation using the hybrid computer. The impact of DYNGEN and similar digital programs on future engine simulation philosophy is also discussed.
Performance Analysis and Design Synthesis (PADS) computer program. Volume 2: Program description, part 2

NASA Technical Reports Server (NTRS)

1972-01-01

The QL module of the Performance Analysis and Design Synthesis (PADS) computer program is described. Execution of this module is initiated when and if subroutine PADSI calls subroutine GROPE. Subroutine GROPE controls the high level logical flow of the QL module. The purpose of the module is to determine a trajectory that satisfies the necessary variational conditions for optimal performance. The module achieves this by solving a nonlinear multi-point boundary value problem. The numerical method employed is described. It is an iterative technique that converges quadratically when it does converge. The three basic steps of the module are: (1) initialization, (2) iteration, and (3) culmination. For Volume 1 see N73-13199.
Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications

NASA Technical Reports Server (NTRS)

Edwards, David E.; Haimes, Robert

1999-01-01

An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.
Computational fluid dynamics analysis of cyclist aerodynamics: performance of different turbulence-modelling and boundary-layer modelling approaches.

PubMed

Defraeye, Thijs; Blocken, Bert; Koninckx, Erwin; Hespel, Peter; Carmeliet, Jan

2010-08-26

This study aims at assessing the accuracy of computational fluid dynamics (CFD) for applications in sports aerodynamics, for example for drag predictions of swimmers, cyclists or skiers, by evaluating the applied numerical modelling techniques by means of detailed validation experiments. In this study, a wind-tunnel experiment on a scale model of a cyclist (scale 1:2) is presented. Apart from three-component forces and moments, also high-resolution surface pressure measurements on the scale model's surface, i.e. at 115 locations, are performed to provide detailed information on the flow field. These data are used to compare the performance of different turbulence-modelling techniques, such as steady Reynolds-averaged Navier-Stokes (RANS), with several k-epsilon and k-omega turbulence models, and unsteady large-eddy simulation (LES), and also boundary-layer modelling techniques, namely wall functions and low-Reynolds number modelling (LRNM). The commercial CFD code Fluent 6.3 is used for the simulations. The RANS shear-stress transport (SST) k-omega model shows the best overall performance, followed by the more computationally expensive LES. Furthermore, LRNM is clearly preferred over wall functions to model the boundary layer. This study showed that there are more accurate alternatives for evaluating flow around bluff bodies with CFD than the standard k-epsilon model combined with wall functions, which is often used in CFD studies in sports. 2010 Elsevier Ltd. All rights reserved.
An application of high authority/low authority control and positivity

NASA Technical Reports Server (NTRS)

Seltzer, S. M.; Irwin, D.; Tollison, D.; Waites, H. B.

1988-01-01

Control Dynamics Company (CDy), in conjunction with NASA Marshall Space Flight Center (MSFC), has supported the U.S. Air Force Wright Aeronautical Laboratory (AFWAL) in conducting an investigation of the implementation of several DOD controls techniques. These techniques are to provide vibration suppression and precise attitude control for flexible space structures. AFWAL issued a contract to Control Dynamics to perform this work under the Active Control Technique Evaluation for Spacecraft (ACES) Program. The High Authority Control/Low Authority Control (HAC/LAC) and Positivity controls techniques, which were cultivated under the DARPA Active Control of Space Structures (ACOSS) Program, were applied to a structural model of the NASA/MSFC Ground Test Facility ACES configuration. The control systems design were accomplished and linear post-analyses of the closed-loop systems are provided. The control system designs take into account effects of sampling and delay in the control computer. Nonlinear simulation runs were used to verify the control system designs and implementations in the facility control computers. Finally, test results are given to verify operations of the control systems in the test facility.
The TeraShake Computational Platform for Large-Scale Earthquake Simulations

NASA Astrophysics Data System (ADS)

Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes for larger meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, wellvalidated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulations phases including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
High-efficiency photorealistic computer-generated holograms based on the backward ray-tracing technique

NASA Astrophysics Data System (ADS)

Wang, Yuan; Chen, Zhidong; Sang, Xinzhu; Li, Hui; Zhao, Linmin

2018-03-01

Holographic displays can provide the complete optical wave field of a three-dimensional (3D) scene, including the depth perception. However, it often takes a long computation time to produce traditional computer-generated holograms (CGHs) without more complex and photorealistic rendering. The backward ray-tracing technique is able to render photorealistic high-quality images, which noticeably reduce the computation time achieved from the high-degree parallelism. Here, a high-efficiency photorealistic computer-generated hologram method is presented based on the ray-tracing technique. Rays are parallelly launched and traced under different illuminations and circumstances. Experimental results demonstrate the effectiveness of the proposed method. Compared with the traditional point cloud CGH, the computation time is decreased to 24 s to reconstruct a 3D object of 100 ×100 rays with continuous depth change.
Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Yier

As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified data, are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect its computing infrastructure. The research outcomes from thismore » project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.« less
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.

PubMed

Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin

2014-10-01

High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems

PubMed Central

Andrade, G.; Ferreira, R.; Teodoro, George; Rocha, Leonardo; Saltz, Joel H.; Kurc, Tahsin

2015-01-01

High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales. PMID:26640423
Application of computational aero-acoustics to real world problems

NASA Technical Reports Server (NTRS)

Hardin, Jay C.

1996-01-01

The application of computational aeroacoustics (CAA) to real problems is discussed in relation to the analysis performed with the aim of assessing the application of the various techniques. It is considered that the applications are limited by the inability of the computational resources to resolve the large range of scales involved in high Reynolds number flows. Possible simplifications are discussed. It is considered that problems remain to be solved in relation to the efficient use of the power of parallel computers and in the development of turbulent modeling schemes. The goal of CAA is stated as being the implementation of acoustic design studies on a computer terminal with reasonable run times.
Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

PubMed Central

Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

2014-01-01

Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

PubMed

Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

2014-07-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
A Survey of Architectural Techniques for Near-Threshold Computing

DOE PAGES

Mittal, Sparsh

2015-12-28

Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. In low-voltage computing and specifically, near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also presents several challenges such as increased parametric variation, failure rate and performance loss etc. Our paper surveys several re- cent techniques which aim to offset these challenges for fully leveraging the potential of NTC. By classifying these techniques along several dimensions, we also highlightmore » their similarities and differences. Ultimately, we hope that this paper will provide insights into state-of-art NTC techniques to researchers and system-designers and inspire further research in this field.« less
A Survey of Architectural Techniques for Near-Threshold Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. In low-voltage computing and specifically, near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also presents several challenges such as increased parametric variation, failure rate and performance loss etc. Our paper surveys several re- cent techniques which aim to offset these challenges for fully leveraging the potential of NTC. By classifying these techniques along several dimensions, we also highlightmore » their similarities and differences. Ultimately, we hope that this paper will provide insights into state-of-art NTC techniques to researchers and system-designers and inspire further research in this field.« less
Determination of high temperature strains using a PC based vision system

NASA Astrophysics Data System (ADS)

McNeill, Stephen R.; Sutton, Michael A.; Russell, Samuel S.

1992-09-01

With the widespread availability of video digitizers and cheap personal computers, the use of computer vision as an experimental tool is becoming common place. These systems are being used to make a wide variety of measurements that range from simple surface characterization to velocity profiles. The Sub-Pixel Digital Image Correlation technique has been developed to measure full field displacement and gradients of the surface of an object subjected to a driving force. The technique has shown its utility by measuring the deformation and movement of objects that range from simple translation to fluid velocity profiles to crack tip deformation of solid rocket fuel. This technique has recently been improved and used to measure the surface displacement field of an object at high temperature. The development of a PC based Sub-Pixel Digital Image Correlation system has yielded an accurate and easy to use system for measuring surface displacements and gradients. Experiments have been performed to show the system is viable for measuring thermal strain.
Geopotential Error Analysis from Satellite Gradiometer and Global Positioning System Observables on Parallel Architecture

NASA Technical Reports Server (NTRS)

Schutz, Bob E.; Baker, Gregory A.

1997-01-01

The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.

Geopotential error analysis from satellite gradiometer and global positioning system observables on parallel architectures

NASA Astrophysics Data System (ADS)

Baker, Gregory Allen

The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.
Techniques for the rapid display and manipulation of 3-D biomedical data.

PubMed

Goldwasser, S M; Reynolds, R A; Talton, D A; Walsh, E S

1988-01-01

The use of fully interactive 3-D workstations with true real-time performance will become increasingly common as technology matures and economical commercial systems become available. This paper provides a comprehensive introduction to high speed approaches to the display and manipulation of 3-D medical objects obtained from tomographic data acquisition systems such as CT, MR, and PET. A variety of techniques are outlined including the use of software on conventional minicomputers, hardware assist devices such as array processors and programmable frame buffers, and special purpose computer architecture for dedicated high performance systems. While both algorithms and architectures are addressed, the major theme centers around the utilization of hardware-based approaches including parallel processors for the implementation of true real-time systems.
Second-order shaped pulsed for solid-state quantum computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sengupta, Pinaki

2008-01-01

We present the construction and detailed analysis of highly optimized self-refocusing pulse shapes for several rotation angles. We characterize the constructed pulses by the coefficients appearing in the Magnus expansion up to second order. This allows a semianalytical analysis of the performance of the constructed shapes in sequences and composite pulses by computing the corresponding leading-order error operators. Higher orders can be analyzed with the numerical technique suggested by us previously. We illustrate the technique by analyzing several composite pulses designed to protect against pulse amplitude errors, and on decoupling sequences for potentially long chains of qubits with on-site andmore » nearest-neighbor couplings.« less
Computer system performance measurement techniques for ARTS III computer systems.

DOT National Transportation Integrated Search

1973-12-01

Direct measurement of computer systems is of vital importance in: a) developing an intelligent grasp of the variables which affect overall performance; b)tuning the system for optimum benefit; c)determining under what conditions saturation thresholds...
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the loop unrolling technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
Development of Multiobjective Optimization Techniques for Sonic Boom Minimization

NASA Technical Reports Server (NTRS)

Chattopadhyay, Aditi; Rajadas, John Narayan; Pagaldipti, Naryanan S.

1996-01-01

A discrete, semi-analytical sensitivity analysis procedure has been developed for calculating aerodynamic design sensitivities. The sensitivities of the flow variables and the grid coordinates are numerically calculated using direct differentiation of the respective discretized governing equations. The sensitivity analysis techniques are adapted within a parabolized Navier Stokes equations solver. Aerodynamic design sensitivities for high speed wing-body configurations are calculated using the semi-analytical sensitivity analysis procedures. Representative results obtained compare well with those obtained using the finite difference approach and establish the computational efficiency and accuracy of the semi-analytical procedures. Multidisciplinary design optimization procedures have been developed for aerospace applications namely, gas turbine blades and high speed wing-body configurations. In complex applications, the coupled optimization problems are decomposed into sublevels using multilevel decomposition techniques. In cases with multiple objective functions, formal multiobjective formulation such as the Kreisselmeier-Steinhauser function approach and the modified global criteria approach have been used. Nonlinear programming techniques for continuous design variables and a hybrid optimization technique, based on a simulated annealing algorithm, for discrete design variables have been used for solving the optimization problems. The optimization procedure for gas turbine blades improves the aerodynamic and heat transfer characteristics of the blades. The two-dimensional, blade-to-blade aerodynamic analysis is performed using a panel code. The blade heat transfer analysis is performed using an in-house developed finite element procedure. The optimization procedure yields blade shapes with significantly improved velocity and temperature distributions. The multidisciplinary design optimization procedures for high speed wing-body configurations simultaneously improve the aerodynamic, the sonic boom and the structural characteristics of the aircraft. The flow solution is obtained using a comprehensive parabolized Navier Stokes solver. Sonic boom analysis is performed using an extrapolation procedure. The aircraft wing load carrying member is modeled as either an isotropic or a composite box beam. The isotropic box beam is analyzed using thin wall theory. The composite box beam is analyzed using a finite element procedure. The developed optimization procedures yield significant improvements in all the performance criteria and provide interesting design trade-offs. The semi-analytical sensitivity analysis techniques offer significant computational savings and allow the use of comprehensive analysis procedures within design optimization studies.
Comparing the Robustness of High-Frequency Traveling-Wave Tube Slow-Wave Circuits

NASA Technical Reports Server (NTRS)

Chevalier, Christine T.; Wilson, Jeffrey D.; Kory, Carol L.

2007-01-01

A three-dimensional electromagnetic field simulation software package was used to compute the cold-test parameters, phase velocity, on-axis interaction impedance, and attenuation, for several high-frequency traveling-wave tube slow-wave circuit geometries. This research effort determined the effects of variations in circuit dimensions on cold-test performance. The parameter variations were based on the tolerances of conventional micromachining techniques.
A Survey of Techniques for Approximate Computing

DOE PAGES

Mittal, Sparsh

2016-03-18

Approximate computing trades off computation quality with the effort expended and as rising performance demands confront with plateauing resource budgets, approximate computing has become, not merely attractive, but even imperative. Here, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units (e.g., CPU, GPU and FPGA), processor components, memory technologies etc., and programming frameworks for AC. Moreover, we classify these techniques based on several key characteristics to emphasize their similarities and differences. Finally, the aim of this paper is tomore » provide insights to researchers into working of AC techniques and inspire more efforts in this area to make AC the mainstream computing approach in future systems.« less
Space Shuttle Underside Astronaut Communications Performance Evaluation

NASA Technical Reports Server (NTRS)

Hwu, Shian U.; Dobbins, Justin A.; Loh, Yin-Chung; Kroll, Quin D.; Sham, Catherine C.

2005-01-01

The Space Shuttle Ultra High Frequency (UHF) communications system is planned to provide Radio Frequency (RF) coverage for astronauts working underside of the Space Shuttle Orbiter (SSO) for thermal tile inspection and repairing. This study is to assess the Space Shuttle UHF communication performance for astronauts in the shadow region without line-of-sight (LOS) to the Space Shuttle and Space Station UHF antennas. To insure the RF coverage performance at anticipated astronaut worksites, the link margin between the UHF antennas and Extravehicular Activity (EVA) Astronauts with significant vehicle structure blockage was analyzed. A series of near-field measurements were performed using the NASA/JSC Anechoic Chamber Antenna test facilities. Computational investigations were also performed using the electromagnetic modeling techniques. The computer simulation tool based on the Geometrical Theory of Diffraction (GTD) was used to compute the signal strengths. The signal strength was obtained by computing the reflected and diffracted fields along the propagation paths between the transmitting and receiving antennas. Based on the results obtained in this study, RF coverage for UHF communication links was determined for the anticipated astronaut worksite in the shadow region underneath the Space Shuttle.
High performance transcription factor-DNA docking with GPU computing

PubMed Central

2012-01-01

Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Understanding and Improving High-Performance I/O Subsystems

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Frieder, Gideon; Clark, A. James

1996-01-01

This research program has been conducted in the framework of the NASA Earth and Space Science (ESS) evaluations led by Dr. Thomas Sterling. In addition to the many important research findings for NASA and the prestigious publications, the program has helped orienting the doctoral research program of two students towards parallel input/output in high-performance computing. Further, the experimental results in the case of the MasPar were very useful and helpful to MasPar with which the P.I. has had many interactions with the technical management. The contributions of this program are drawn from three experimental studies conducted on different high-performance computing testbeds/platforms, and therefore presented in 3 different segments as follows: 1. Evaluating the parallel input/output subsystem of a NASA high-performance computing testbeds, namely the MasPar MP- 1 and MP-2; 2. Characterizing the physical input/output request patterns for NASA ESS applications, which used the Beowulf platform; and 3. Dynamic scheduling techniques for hiding I/O latency in parallel applications such as sparse matrix computations. This study also has been conducted on the Intel Paragon and has also provided an experimental evaluation for the Parallel File System (PFS) and parallel input/output on the Paragon. This report is organized as follows. The summary of findings discusses the results of each of the aforementioned 3 studies. Three appendices, each containing a key scholarly research paper that details the work in one of the studies are included.
Approximate Computing Techniques for Iterative Graph Algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh

Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with lowmore » impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.« less
Performance evaluation of digital phase-locked loops for advanced deep space transponders

NASA Technical Reports Server (NTRS)

Nguyen, T. M.; Hinedi, S. M.; Yeh, H.-G.; Kyriacou, C.

1994-01-01

The performances of the digital phase-locked loops (DPLL's) for the advanced deep-space transponders (ADT's) are investigated. DPLL's considered in this article are derived from the analog phase-locked loop, which is currently employed by the NASA standard deep space transponder, using S-domain to Z-domain mapping techniques. Three mappings are used to develop digital approximations of the standard deep space analog phase-locked loop, namely the bilinear transformation (BT), impulse invariant transformation (IIT), and step invariant transformation (SIT) techniques. The performance in terms of the closed loop phase and magnitude responses, carrier tracking jitter, and response of the loop to the phase offset (the difference between in incoming phase and reference phase) is evaluated for each digital approximation. Theoretical results of the carrier tracking jitter for command-on and command-off cases are then validated by computer simulation. Both theoretical and computer simulation results show that at high sampling frequency, the DPLL's approximated by all three transformations have the same tracking jitter. However, at low sampling frequency, the digital approximation using BT outperforms the others. The minimum sampling frequency for adequate tracking performance is determined for each digital approximation of the analog loop. In addition, computer simulation shows that the DPLL developed by BT provides faster response to the phase offset than IIT and SIT.
GPU-accelerated regularized iterative reconstruction for few-view cone beam CT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Matenine, Dmitri, E-mail: dmitri.matenine.1@ulaval.ca; Goussard, Yves, E-mail: yves.goussard@polymtl.ca; Després, Philippe, E-mail: philippe.despres@phy.ulaval.ca

2015-04-15

Purpose: The present work proposes an iterative reconstruction technique designed for x-ray transmission computed tomography (CT). The main objective is to provide a model-based solution to the cone-beam CT reconstruction problem, yielding accurate low-dose images via few-views acquisitions in clinically acceptable time frames. Methods: The proposed technique combines a modified ordered subsets convex (OSC) algorithm and the total variation minimization (TV) regularization technique and is called OSC-TV. The number of subsets of each OSC iteration follows a reduction pattern in order to ensure the best performance of the regularization method. Considering the high computational cost of the algorithm, it ismore » implemented on a graphics processing unit, using parallelization to accelerate computations. Results: The reconstructions were performed on computer-simulated as well as human pelvic cone-beam CT projection data and image quality was assessed. In terms of convergence and image quality, OSC-TV performs well in reconstruction of low-dose cone-beam CT data obtained via a few-view acquisition protocol. It compares favorably to the few-view TV-regularized projections onto convex sets (POCS-TV) algorithm. It also appears to be a viable alternative to full-dataset filtered backprojection. Execution times are of 1–2 min and are compatible with the typical clinical workflow for nonreal-time applications. Conclusions: Considering the image quality and execution times, this method may be useful for reconstruction of low-dose clinical acquisitions. It may be of particular benefit to patients who undergo multiple acquisitions by reducing the overall imaging radiation dose and associated risks.« less
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE PAGES

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

2017-01-28

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

NASA Astrophysics Data System (ADS)

Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

2012-09-01

By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
Peak-to-average power ratio reduction in orthogonal frequency division multiplexing-based visible light communication systems using a modified partial transmit sequence technique

NASA Astrophysics Data System (ADS)

Liu, Yan; Deng, Honggui; Ren, Shuang; Tang, Chengying; Qian, Xuewen

2018-01-01

We propose an efficient partial transmit sequence technique based on genetic algorithm and peak-value optimization algorithm (GAPOA) to reduce high peak-to-average power ratio (PAPR) in visible light communication systems based on orthogonal frequency division multiplexing (VLC-OFDM). By analysis of hill-climbing algorithm's pros and cons, we propose the POA with excellent local search ability to further process the signals whose PAPR is still over the threshold after processed by genetic algorithm (GA). To verify the effectiveness of the proposed technique and algorithm, we evaluate the PAPR performance and the bit error rate (BER) performance and compare them with partial transmit sequence (PTS) technique based on GA (GA-PTS), PTS technique based on genetic and hill-climbing algorithm (GH-PTS), and PTS based on shuffled frog leaping algorithm and hill-climbing algorithm (SFLAHC-PTS). The results show that our technique and algorithm have not only better PAPR performance but also lower computational complexity and BER than GA-PTS, GH-PTS, and SFLAHC-PTS technique.
Comparison of Implicit Collocation Methods for the Heat Equation

NASA Technical Reports Server (NTRS)

Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)

2001-01-01

We combine a high-order compact finite difference scheme to approximate spatial derivatives arid collocation techniques for the time component to numerically solve the two dimensional heat equation. We use two approaches to implement the collocation methods. The first one is based on an explicit computation of the coefficients of polynomials and the second one relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.

Speckle imaging techniques of the turbulence degraded images

NASA Astrophysics Data System (ADS)

Liu, Jin; Huang, Zongfu; Mao, Hongjun; Liang, Yonghui

2018-03-01

We propose a speckle imaging algorithm in which we use the improved form of spectral ratio to obtain the Fried parameter, we also use a filter to reduce the high frequency noise effects. Our algorithm makes an improvement in the quality of the reconstructed images. The performance is illustrated by computer simulations.
Developments in Cylindrical Shell Stability Analysis

NASA Technical Reports Server (NTRS)

Knight, Norman F., Jr.; Starnes, James H., Jr.

1998-01-01

Today high-performance computing systems and new analytical and numerical techniques enable engineers to explore the use of advanced materials for shell design. This paper reviews some of the historical developments of shell buckling analysis and design. The paper concludes by identifying key research directions for reliable and robust methods development in shell stability analysis and design.
Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

NASA Technical Reports Server (NTRS)

Krasteva, Denitza T.

1998-01-01

Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
Computational efficiency for the surface renewal method

NASA Astrophysics Data System (ADS)

Kelley, Jason; Higgins, Chad

2018-04-01

Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that tested for sensitivity to length of flux averaging period, ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. Increased speed of computation time grants flexibility to implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
Unlocking the spatial inversion of large scanning magnetic microscopy datasets

NASA Astrophysics Data System (ADS)

Myre, J. M.; Lascu, I.; Andrade Lima, E.; Feinberg, J. M.; Saar, M. O.; Weiss, B. P.

2013-12-01

Modern scanning magnetic microscopy provides the ability to perform high-resolution, ultra-high sensitivity moment magnetometry, with spatial resolutions better than 10^-4 m and magnetic moments as weak as 10^-16 Am^2. These microscopy capabilities have enhanced numerous magnetic studies, including investigations of the paleointensity of the Earth's magnetic field, shock magnetization and demagnetization of impacts, magnetostratigraphy, the magnetic record in speleothems, and the records of ancient core dynamos of planetary bodies. A common component among many studies utilizing scanning magnetic microscopy is solving an inverse problem to determine the non-negative magnitude of the magnetic moments that produce the measured component of the magnetic field. The two most frequently used methods to solve this inverse problem are classic fast Fourier techniques in the frequency domain and non-negative least squares (NNLS) methods in the spatial domain. Although Fourier techniques are extremely fast, they typically violate non-negativity and it is difficult to implement constraints associated with the space domain. NNLS methods do not violate non-negativity, but have typically been computation time prohibitive for samples of practical size or resolution. Existing NNLS methods use multiple techniques to attain tractable computation. To reduce computation time in the past, typically sample size or scan resolution would have to be reduced. Similarly, multiple inversions of smaller sample subdivisions can be performed, although this frequently results in undesirable artifacts at subdivision boundaries. Dipole interactions can also be filtered to only compute interactions above a threshold which enables the use of sparse methods through artificial sparsity. To improve upon existing spatial domain techniques, we present the application of the TNT algorithm, named TNT as it is a "dynamite" non-negative least squares algorithm which enhances the performance and accuracy of spatial domain inversions. We show that the TNT algorithm reduces the execution time of spatial domain inversions from months to hours and that inverse solution accuracy is improved as the TNT algorithm naturally produces solutions with small norms. Using sIRM and NRM measures of multiple synthetic and natural samples we show that the capabilities of the TNT algorithm allow very large samples to be inverted without the need for alternative techniques to make the problems tractable. Ultimately, the TNT algorithm enables accurate spatial domain analysis of scanning magnetic microscopy data on an accelerated time scale that renders spatial domain analyses tractable for numerous studies, including searches for the best fit of unidirectional magnetization direction and high-resolution step-wise magnetization and demagnetization.
High-Performance Computing Data Center | Energy Systems Integration

Science.gov Websites

Facility | NREL High-Performance Computing Data Center High-Performance Computing Data Center The Energy Systems Integration Facility's High-Performance Computing Data Center is home to Peregrine -the largest high-performance computing system in the world exclusively dedicated to advancing
Impedance computations and beam-based measurements: A problem of discrepancy

DOE PAGES

Smaluk, Victor

2018-04-21

High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictionsmore » based on the computed impedance budgets show a significant discrepancy. For this article, three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.« less
Impedance computations and beam-based measurements: A problem of discrepancy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smaluk, Victor

High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictionsmore » based on the computed impedance budgets show a significant discrepancy. For this article, three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.« less
Final report for “Extreme-scale Algorithms and Solver Resilience”

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, William Douglas

2017-06-30

This is a joint project with principal investigators at Oak Ridge National Laboratory, Sandia National Laboratories, the University of California at Berkeley, and the University of Tennessee. Our part of the project involves developing performance models for highly scalable algorithms and the development of latency tolerant iterative methods. During this project, we extended our performance models for the Multigrid method for solving large systems of linear equations and conducted experiments with highly scalable variants of conjugate gradient methods that avoid blocking synchronization. In addition, we worked with the other members of the project on alternative techniques for resilience and reproducibility.more » We also presented an alternative approach for reproducible dot-products in parallel computations that performs almost as well as the conventional approach by separating the order of computation from the details of the decomposition of vectors across the processes.« less
High order discretization techniques for real-space ab initio simulations

NASA Astrophysics Data System (ADS)

Anderson, Christopher R.

2018-03-01

In this paper, we present discretization techniques to address numerical problems that arise when constructing ab initio approximations that use real-space computational grids. We present techniques to accommodate the singular nature of idealized nuclear and idealized electronic potentials, and we demonstrate the utility of using high order accurate grid based approximations to Poisson's equation in unbounded domains. To demonstrate the accuracy of these techniques, we present results for a Full Configuration Interaction computation of the dissociation of H2 using a computed, configuration dependent, orbital basis set.
Versatile analog pulse height computer performs real-time arithmetic operations

NASA Technical Reports Server (NTRS)

Brenner, R.; Strauss, M. G.

1967-01-01

Multipurpose analog pulse height computer performs real-time arithmetic operations on relatively fast pulses. This computer can be used for identification of charged particles, pulse shape discrimination, division of signals from position sensitive detectors, and other on-line data reduction techniques.
High-energy physics software parallelization using database techniques

NASA Astrophysics Data System (ADS)

Argante, E.; van der Stok, P. D. V.; Willers, I.

1997-02-01

A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradimg, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI.
An extended Kalman filter approach to non-stationary Bayesian estimation of reduced-order vocal fold model parameters.

PubMed

Hadwin, Paul J; Peterson, Sean D

2017-04-01

The Bayesian framework for parameter inference provides a basis from which subject-specific reduced-order vocal fold models can be generated. Previously, it has been shown that a particle filter technique is capable of producing estimates and associated credibility intervals of time-varying reduced-order vocal fold model parameters. However, the particle filter approach is difficult to implement and has a high computational cost, which can be barriers to clinical adoption. This work presents an alternative estimation strategy based upon Kalman filtering aimed at reducing the computational cost of subject-specific model development. The robustness of this approach to Gaussian and non-Gaussian noise is discussed. The extended Kalman filter (EKF) approach is found to perform very well in comparison with the particle filter technique at dramatically lower computational cost. Based upon the test cases explored, the EKF is comparable in terms of accuracy to the particle filter technique when greater than 6000 particles are employed; if less particles are employed, the EKF actually performs better. For comparable levels of accuracy, the solution time is reduced by 2 orders of magnitude when employing the EKF. By virtue of the approximations used in the EKF, however, the credibility intervals tend to be slightly underpredicted.
A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

NASA Astrophysics Data System (ADS)

Lin, Y.; O'Malley, D.; Vesselinov, V. V.

2015-12-01

Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
High-contrast differentiation resolution 3D imaging of rodent brain by X-ray computed microtomography

NASA Astrophysics Data System (ADS)

Zikmund, T.; Novotná, M.; Kavková, M.; Tesařová, M.; Kaucká, M.; Szarowská, B.; Adameyko, I.; Hrubá, E.; Buchtová, M.; Dražanová, E.; Starčuk, Z.; Kaiser, J.

2018-02-01

The biomedically focused brain research is largely performed on laboratory mice considering a high homology between the human and mouse genomes. A brain has an intricate and highly complex geometrical structure that is hard to display and analyse using only 2D methods. Applying some fast and efficient methods of brain visualization in 3D will be crucial for the neurobiology in the future. A post-mortem analysis of experimental animals' brains usually involves techniques such as magnetic resonance and computed tomography. These techniques are employed to visualize abnormalities in the brains' morphology or reparation processes. The X-ray computed microtomography (micro CT) plays an important role in the 3D imaging of internal structures of a large variety of soft and hard tissues. This non-destructive technique is applied in biological studies because the lab-based CT devices enable to obtain a several-micrometer resolution. However, this technique is always used along with some visualization methods, which are based on the tissue staining and thus differentiate soft tissues in biological samples. Here, a modified chemical contrasting protocol of tissues for a micro CT usage is introduced as the best tool for ex vivo 3D imaging of a post-mortem mouse brain. This way, the micro CT provides a high spatial resolution of the brain microscopic anatomy together with a high tissue differentiation contrast enabling to identify more anatomical details in the brain. As the micro CT allows a consequent reconstruction of the brain structures into a coherent 3D model, some small morphological changes can be given into context of their mutual spatial relationships.
Computer-Assisted Decision Support for Student Admissions Based on Their Predicted Academic Performance.

PubMed

Muratov, Eugene; Lewis, Margaret; Fourches, Denis; Tropsha, Alexander; Cox, Wendy C

2017-04-01

Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools. Methods. All PharmD candidates over three admission cycles were divided into two groups: those who completed the PharmD program with a GPA ≥ 3; and the remaining candidates. Random Forest machine learning technique was used to develop a binary classification model based on 11 pre-admission parameters. Results. Robust and externally predictive models were developed that had particularly high overall accuracy of 77% for candidates with high or low academic performance. These multivariate models were highly accurate in predicting these groups to those obtained using undergraduate GPA and composite PCAT scores only. Conclusion. The models developed in this study can be used to improve the admission process as preliminary filters and thus quickly identify candidates who are likely to be successful in the PharmD curriculum.
IMPROVED BIOMASS UTILIZATION THROUGH REMOTE FLOW SENSING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Washington University- St. Louis: Muthanna Al-Dahhan

The growth of the livestock industry provides a valuable source of affordable, sustainable, and renewable bioenergy, while also requiring the safe disposal of the large quantities of animal wastes (manure) generated at dairy, swine, and poultry farms. If these biomass resources are mishandled and underutilized, major environmental problems will be created, such as surface and ground water contamination, odors, dust, ammonia leaching, and methane emission. Anaerobic digestion of animal wastes, in which microorganisms break down organic materials in the absence of oxygen, is one of the most promising waste treatment technologies. This process produces biogas typically containing {approx}65% methane andmore » {approx}35% carbon dioxide. The production of biogas through anaerobic digestion from animal wastes, landfills, and municipal waste water treatment plants represents a large source of renewable and sustainable bio-fuel. Such bio-fuel can be combusted directly, used in internal combustion engines, converted into methanol, or partially oxidized to produce synthesis gas (a mixture of hydrogen and carbon monoxide) that can be converted to clean liquid fuels and chemicals via Fischer-Tropsch synthesis. Different design and mixing configurations of anaerobic digesters for treating cow manure have been utilized commercially and/or tested on a laboratory scale. These digesters include mechanically mixed, gas recirculation mixed, and slurry recirculation mixed designs, as well as covered lagoon digesters. Mixing is an important parameter for successful performance of anaerobic digesters. It enhances substrate contact with the microbial community; improves pH, temperature and substrate/microorganism uniformity; prevents stratification and scum accumulation; facilitates the removal of biogas from the digester; reduces or eliminates the formation of inactive zones (dead zones); prevents settling of biomass and inert solids; and aids in particle size reduction. Unfortunately, information and findings in the literature on the effect of mixing on anaerobic digestion are contradictory. One reason is the lack of measurement techniques for opaque systems such as digesters. Better understanding of the mixing and hydrodynamics of digesters will result in appropriate design, configuration selection, scale-up, and performance, which will ultimately enable avoiding digester failures. Accordingly, this project sought to advance the fundamental knowledge and understanding of the design, scale up, operation, and performance of cow manure anaerobic digesters with high solids loading. The project systematically studied parameters affecting cow manure anaerobic digestion performance, in different configurations and sizes by implementing computer automated radioactive particle tracking (CARPT), computed tomography (CT), and computational fluid dynamics (CFD), and by developing novel multiple-particle CARPT (MP-CARPT) and dual source CT (DSCT) techniques. The accomplishments of the project were achieved in a collaborative effort among Washington University, the Oak Ridge National Laboratory, and the Iowa Energy Center teams. The following investigations and achievements were accomplished: Systematic studies of anaerobic digesters performance and kinetics using various configurations, modes of mixing, and scales (laboratory, pilot plant, and commercial sizes) were conducted and are discussed in Chapter 2. It was found that mixing significantly affected the performance of the pilot plant scale digester ({approx}97 liter). The detailed mixing and hydrodynamics were investigated using computer automated radioactive particle tracking (CARPT) techniques, and are discussed in Chapter 3. A novel multiple particle tracking technique (MP-CARPT) technique that can track simultaneously up to 8 particles was developed, tested, validated, and implemented. Phase distribution was investigated using gamma ray computer tomography (CT) techniques, which are discussed in Chapter 4. A novel dual source CT (DSCT) technique was developed to measure the phase distribution of dynamic three phase system such as digesters with high solids loading and other types of gas-liquid-solid fluidization systems. Evaluation and validation of the computational fluid dynamics (CFD) models and closures were conducted to model and simulate the hydrodynamics and mixing intensity of the anaerobic digesters (Chapter 5). It is strongly recommended that additional studies be conducted, both on hydrodynamics and performance, in large scale digesters. The studies should use advanced non-invasive measurement techniques, including the developed novel measurement techniques, to further understand their design, scale-up, performance, and operation to avoid any digester failure. The final goal is a system ready to be used by farmers on site for bioenergy production and for animal/farm waste treatment.« less
High performance cellular level agent-based simulation with FLAME for the GPU.

PubMed

Richmond, Paul; Walker, Dawn; Coakley, Simon; Romano, Daniela

2010-05-01

Driven by the availability of experimental data and ability to simulate a biological scale which is of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with 'bottom-up' simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has reported massive improvement in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.
3D Reconstruction of Chick Embryo Vascular Geometries Using Non-invasive High-Frequency Ultrasound for Computational Fluid Dynamics Studies.

PubMed

Tan, Germaine Xin Yi; Jamil, Muhammad; Tee, Nicole Gui Zhen; Zhong, Liang; Yap, Choon Hwai

2015-11-01

Recent animal studies have provided evidence that prenatal blood flow fluid mechanics may play a role in the pathogenesis of congenital cardiovascular malformations. To further these researches, it is important to have an imaging technique for small animal embryos with sufficient resolution to support computational fluid dynamics studies, and that is also non-invasive and non-destructive to allow for subject-specific, longitudinal studies. In the current study, we developed such a technique, based on ultrasound biomicroscopy scans on chick embryos. Our technique included a motion cancelation algorithm to negate embryonic body motion, a temporal averaging algorithm to differentiate blood spaces from tissue spaces, and 3D reconstruction of blood volumes in the embryo. The accuracy of the reconstructed models was validated with direct stereoscopic measurements. A computational fluid dynamics simulation was performed to model fluid flow in the generated construct of a Hamburger-Hamilton (HH) stage 27 embryo. Simulation results showed that there were divergent streamlines and a low shear region at the carotid duct, which may be linked to the carotid duct's eventual regression and disappearance by HH stage 34. We show that our technique has sufficient resolution to produce accurate geometries for computational fluid dynamics simulations to quantify embryonic cardiovascular fluid mechanics.
Comparison of computation time and image quality between full-parallax 4G-pixels CGHs calculated by the point cloud and polygon-based method

NASA Astrophysics Data System (ADS)

Nakatsuji, Noriaki; Matsushima, Kyoji

2017-03-01

Full-parallax high-definition CGHs composed of more than billion pixels were so far created only by the polygon-based method because of its high performance. However, GPUs recently allow us to generate CGHs much faster by the point cloud. In this paper, we measure computation time of object fields for full-parallax high-definition CGHs, which are composed of 4 billion pixels and reconstruct the same scene, by using the point cloud with GPU and the polygon-based method with CPU. In addition, we compare the optical and simulated reconstructions between CGHs created by these techniques to verify the image quality.

Experimental and Computational Investigation of a Translating-Throat Single-Expansion-Ramp Nozzle

NASA Technical Reports Server (NTRS)

Deere, Karen A.; Asbury, Scott C.

1999-01-01

An experimental and computational study was conducted on a high-speed, single-expansion-ramp nozzle (SERN) concept designed for efficient off-design performance. The translating-throat SERN concept adjusts the axial location of the throat to provide a variable expansion ratio and allow a more optimum jet exhaust expansion at various flight conditions in an effort to maximize nozzle performance. Three design points (throat locations) were investigated to simulate the operation of this concept at subsonic-transonic, low supersonic, and high supersonic flight conditions. The experimental study was conducted in the jet exit test facility at the Langley Research Center. Internal nozzle performance was obtained at nozzle pressure ratios (NPR's) up to 13 for six nozzles with design nozzle pressure ratios near 9, 42, and 102. Two expansion-ramp surfaces, one concave and one convex, were tested for each design point. Paint-oil flow and focusing schlieren flow visualization techniques were utilized to acquire additional flow data at selected NPR'S. The Navier-Stokes code, PAB3D, was used with a two-equation k-e turbulence model for the computational study. Nozzle performance characteristics were predicted at nozzle pressure ratios of 5, 9, and 13 for the concave ramp, low Mach number nozzle and at 10, 13, and 102 for the concave ramp, high Mach number nozzle.
Static Memory Deduplication for Performance Optimization in Cloud Computing.

PubMed

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-04-27

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing

PubMed Central

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-01-01

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Scalable nuclear density functional theory with Sky3D

NASA Astrophysics Data System (ADS)

Afibuzzaman, Md; Schuetrumpf, Bastian; Aktulga, Hasan Metin

2018-02-01

In nuclear astrophysics, quantum simulations of large inhomogeneous dense systems as they appear in the crusts of neutron stars present big challenges. The number of particles in a simulation with periodic boundary conditions is strongly limited due to the immense computational cost of the quantum methods. In this paper, we describe techniques for an efficient and scalable parallel implementation of Sky3D, a nuclear density functional theory solver that operates on an equidistant grid. Presented techniques allow Sky3D to achieve good scaling and high performance on a large number of cores, as demonstrated through detailed performance analysis on a Cray XC40 supercomputer.
A computer technique for detailed analysis of mission radius and maneuverability characteristics of fighter aircraft

NASA Technical Reports Server (NTRS)

Foss, W. E., Jr.

1981-01-01

A computer technique to determine the mission radius and maneuverability characteristics of combat aircraft was developed. The technique was used to determine critical operational requirements and the areas in which research programs would be expected to yield the most beneficial results. In turn, the results of research efforts were evaluated in terms of aircraft performance on selected mission segments and for complete mission profiles. Extensive use of the technique in evaluation studies indicates that the calculated performance is essentially the same as that obtained by the proprietary programs in use throughout the aircraft industry.
Near-Field Source Localization by Using Focusing Technique

NASA Astrophysics Data System (ADS)

He, Hongyang; Wang, Yide; Saillard, Joseph

2008-12-01

We discuss two fast algorithms to localize multiple sources in near field. The symmetry-based method proposed by Zhi and Chia (2007) is first improved by implementing a search-free procedure for the reduction of computation cost. We present then a focusing-based method which does not require symmetric array configuration. By using focusing technique, the near-field signal model is transformed into a model possessing the same structure as in the far-field situation, which allows the bearing estimation with the well-studied far-field methods. With the estimated bearing, the range estimation of each source is consequently obtained by using 1D MUSIC method without parameter pairing. The performance of the improved symmetry-based method and the proposed focusing-based method is compared by Monte Carlo simulations and with Crammer-Rao bound as well. Unlike other near-field algorithms, these two approaches require neither high-computation cost nor high-order statistics.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.

PubMed

Catarinucci, Luca; Tarricone, Luciano

2009-01-01

The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
An Integrated Development Environment for Adiabatic Quantum Programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humble, Travis S; McCaskey, Alex; Bennink, Ryan S

2014-01-01

Adiabatic quantum computing is a promising route to the computational power afforded by quantum information processing. The recent availability of adiabatic hardware raises the question of how well quantum programs perform. Benchmarking behavior is challenging since the multiple steps to synthesize an adiabatic quantum program are highly tunable. We present an adiabatic quantum programming environment called JADE that provides control over all the steps taken during program development. JADE captures the workflow needed to rigorously benchmark performance while also allowing a variety of problem types, programming techniques, and processor configurations. We have also integrated JADE with a quantum simulation enginemore » that enables program profiling using numerical calculation. The computational engine supports plug-ins for simulation methodologies tailored to various metrics and computing resources. We present the design, integration, and deployment of JADE and discuss its use for benchmarking adiabatic quantum programs.« less
Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

2005-01-01

The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.
Hybrid, experimental and computational, investigation of mechanical components

NASA Astrophysics Data System (ADS)

Furlong, Cosme; Pryputniewicz, Ryszard J.

1996-07-01

Computational and experimental methodologies have unique features for the analysis and solution of a wide variety of engineering problems. Computations provide results that depend on selection of input parameters such as geometry, material constants, and boundary conditions which, for correct modeling purposes, have to be appropriately chosen. In addition, it is relatively easy to modify the input parameters in order to computationally investigate different conditions. Experiments provide solutions which characterize the actual behavior of the object of interest subjected to specific operating conditions. However, it is impractical to experimentally perform parametric investigations. This paper discusses the use of a hybrid, computational and experimental, approach for study and optimization of mechanical components. Computational techniques are used for modeling the behavior of the object of interest while it is experimentally tested using noninvasive optical techniques. Comparisons are performed through a fringe predictor program used to facilitate the correlation between both techniques. In addition, experimentally obtained quantitative information, such as displacements and shape, can be applied in the computational model in order to improve this correlation. The result is a validated computational model that can be used for performing quantitative analyses and structural optimization. Practical application of the hybrid approach is illustrated with a representative example which demonstrates the viability of the approach as an engineering tool for structural analysis and optimization.
Development of an Active Flow Control Technique for an Airplane High-Lift Configuration

NASA Technical Reports Server (NTRS)

Shmilovich, Arvin; Yadlin, Yoram; Dickey, Eric D.; Hartwich, Peter M.; Khodadoust, Abdi

2017-01-01

This study focuses on Active Flow Control methods used in conjunction with airplane high-lift systems. The project is motivated by the simplified high-lift system, which offers enhanced airplane performance compared to conventional high-lift systems. Computational simulations are used to guide the implementation of preferred flow control methods, which require a fluidic supply. It is first demonstrated that flow control applied to a high-lift configuration that consists of simple hinge flaps is capable of attaining the performance of the conventional high-lift counterpart. A set of flow control techniques has been subsequently considered to identify promising candidates, where the central requirement is that the mass flow for actuation has to be within available resources onboard. The flow control methods are based on constant blowing, fluidic oscillators, and traverse actuation. The simulations indicate that the traverse actuation offers a substantial reduction in required mass flow, and it is especially effective when the frequency of actuation is consistent with the characteristic time scale of the flow.
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de

2016-11-15

We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need formore » matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10{sup 2} innermost eigenpairs of a topological insulator matrix with dimension 10{sup 9} derived from quantum physics applications.« less
Advanced large scale GaAs monolithic IF switch matrix subsystem

NASA Technical Reports Server (NTRS)

Ch'en, D. R.; Petersen, W. C.; Kiba, W. M.

1992-01-01

Attention is given to a novel chip design and packaging technique to overcome the limitations due to the high signal isolation requirements of advanced communications systems. A hermetically sealed 6 x 6 monolithic GaAs switch matrix subsystem with integral control electronics based on this technique is presented. An 0-dB insertion loss and 60-dB crosspoint isolation over a 3.5-to-6-GHz band were achieved. The internal controller portion of the switching subsystem provides crosspoint control via a standard RS-232 computer interface and can be synchronized with an external systems control computer. The measured performance of this advanced switching subsystem is fully compatible with relatively static 'switchboard' as well as dynamic TDMA modes of operation.
High-pitch dual-source CT angiography without ECG-gating for imaging the whole aorta: intraindividual comparison with standard pitch single-source technique without ECG-gating

PubMed Central

Manna, Carmelinda; Silva, Mario; Cobelli, Rocco; Poggesi, Sara; Rossi, Cristina; Sverzellati, Nicola

2017-01-01

PURPOSE We aimed to perform intraindividual comparison of computed tomography (CT) parameters, image quality, and radiation exposure between standard CT angiography (CTA) and high-pitch dual source (DS)-CTA, in subjects undergoing serial CTA of thoracoabdominal aorta. METHODS Eighteen subjects with thoracoabdominal CTA by standard technique and high-pitch DS-CTA technique within 6 months of each other were retrieved for intraindividual comparison of image quality in thoracic and abdominal aorta. Quantitative analysis was performed by comparison of mean aortic attenuation, noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR). Qualitative analysis was performed by visual assessment of motion artifacts and diagnostic confidence. Radiation exposure was quantified by effective dose. Image quality was apportioned to radiation exposure by means of figure of merit. RESULTS Mean aortic attenuation and noise were higher in high-pitch DS-CTA of thoracoabdominal aorta, whereas SNR and CNR were similar in thoracic aorta and significantly lower in high-pitch DS-CTA of abdominal aorta (P = 0.024 and P = 0.016). High-pitch DS-CTA was significantly better in the first segment of thoracic aorta. Effective dose was reduced by 72% in high-pitch DS-CTA. CONCLUSION High-pitch DS-CTA without electrocardiography-gating is an effective technique for imaging aorta with very low radiation exposure and with significant reduction of motion artifacts in ascending aorta; however, the overall quality of high-pitch DS-CTA in abdominal aorta is lower than standard CTA. PMID:28703104
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.

PubMed

Agulleiro, J I; Garzón, E M; García, I; Fernández, J J

2010-06-01

Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable and so, high performance computing techniques have been used traditionally. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors in combination to other single processor optimization techniques. This approach succeeds in producing full resolution tomograms with an important reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach is to be run on standard computers without the need of specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with processor's SIMD extensions in the field of 3D electron microscopy.
Advanced Techniques for Simulating the Behavior of Sand

NASA Astrophysics Data System (ADS)

Clothier, M.; Bailey, M.

2009-12-01

Computer graphics and visualization techniques continue to provide untapped research opportunities, particularly when working with earth science disciplines. Through collaboration with the Oregon Space Grant and IGERT Ecosystem Informatics programs we are developing new techniques for simulating sand. In addition, through collaboration with the Oregon Space Grant, we’ve been communicating with the Jet Propulsion Laboratory (JPL) to exchange ideas and gain feedback on our work. More specifically, JPL’s DARTS Laboratory specializes in planetary vehicle simulation, such as the Mars rovers. This simulation utilizes a virtual "sand box" to test how planetary rovers respond to different terrains while traversing them. Unfortunately, this simulation is unable to fully mimic the harsh, sandy environments of those found on Mars. Ideally, these simulations should allow a rover to interact with the sand beneath it, particularly for different sand granularities and densities. In particular, there may be situations where a rover may become stuck in sand due to lack of friction between the sand and wheels. In fact, in May 2009, the Spirit rover became stuck in the Martian sand and has provided additional motivation for this research. In order to develop a new sand simulation model, high performance computing will play a very important role in this work. More specifically, graphics processing units (GPUs) are useful due to their ability to run general purpose algorithms and ability to perform massively parallel computations. In prior research, simulating vast quantities of sand has been difficult to compute in real-time due to the computational complexity of many colliding particles. With the use of GPUs however, each particle collision will be parallelized, allowing for a dramatic performance increase. In addition, spatial partitioning will also provide a speed boost as this will help limit the number of particle collision calculations. However, since the goal of this research is to simulate the look and behavior of sand, this work will go beyond simple particle collision. In particular, we can continue to use our parallel algorithms not only on single particles but on particle “clumps” that consist of multiple combined particles. Since sand is typically not spherical in nature, these particle “clumps” help to simulate the coarse nature of sand. In a simulation environment, multiple combined particles could be used to simulate the polygonal and granular nature of sand grains. Thus, a diversity of sand particles can be generated. The interaction between these particles can then be parallelized using GPU hardware. As such, this research will investigate different graphics and physics techniques and determine the tradeoffs in performance and visual quality for sand simulation. An enhanced sand model through the use of high performance computing and GPUs has great potential to impact research for both earth and space scientists. Interaction with JPL has provided an opportunity for us to refine our simulation techniques that can ultimately be used for their vehicle simulator. As an added benefit of this work, advancements in simulating sand can also benefit scientists here on earth, especially in regard to understanding landslides and debris flows.
Proceedings: Sisal `93

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J.T.

1993-10-01

This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less
High-performance computing in image registration

NASA Astrophysics Data System (ADS)

Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro

2012-10-01

Thanks to the recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amount of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. The feature extraction and matching are ones of the most computationally demanding operations in the processing chain thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented.
CESDIS

NASA Technical Reports Server (NTRS)

1994-01-01

CESDIS, the Center of Excellence in Space Data and Information Sciences was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U. S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.

An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures

NASA Technical Reports Server (NTRS)

Cohl, H.

1994-01-01

We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving Elliptic PDE's, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
Large Scale Laser Crystallization of Solution-based Alumina-doped Zinc Oxide (AZO) Nanoinks for Highly Transparent Conductive Electrode

PubMed Central

Nian, Qiong; Callahan, Michael; Saei, Mojib; Look, David; Efstathiadis, Harry; Bailey, John; Cheng, Gary J.

2015-01-01

A new method combining aqueous solution printing with UV Laser crystallization (UVLC) and post annealing is developed to deposit highly transparent and conductive Aluminum doped Zinc Oxide (AZO) films. This technique is able to rapidly produce large area AZO films with better structural and optoelectronic properties than most high vacuum deposition, suggesting a potential large-scale manufacturing technique. The optoelectronic performance improvement attributes to UVLC and forming gas annealing (FMG) induced grain boundary density decrease and electron traps passivation at grain boundaries. The physical model and computational simulation developed in this work could be applied to thermal treatment of many other metal oxide films. PMID:26515670
Image feature detection and extraction techniques performance evaluation for development of panorama under different light conditions

NASA Astrophysics Data System (ADS)

Patil, Venkat P.; Gohatre, Umakant B.

2018-04-01

The technique of obtaining a wider field-of-view of an image to get high resolution integrated image is normally required for development of panorama of a photographic images or scene from a sequence of part of multiple views. There are various image stitching methods developed recently. For image stitching five basic steps are adopted stitching which are Feature detection and extraction, Image registration, computing homography, image warping and Blending. This paper provides review of some of the existing available image feature detection and extraction techniques and image stitching algorithms by categorizing them into several methods. For each category, the basic concepts are first described and later on the necessary modifications made to the fundamental concepts by different researchers are elaborated. This paper also highlights about the some of the fundamental techniques for the process of photographic image feature detection and extraction methods under various illumination conditions. The Importance of Image stitching is applicable in the various fields such as medical imaging, astrophotography and computer vision. For comparing performance evaluation of the techniques used for image features detection three methods are considered i.e. ORB, SURF, HESSIAN and time required for input images feature detection is measured. Results obtained finally concludes that for daylight condition, ORB algorithm found better due to the fact that less tome is required for more features extracted where as for images under night light condition it shows that SURF detector performs better than ORB/HESSIAN detectors.
Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks

PubMed Central

Naveros, Francisco; Garrido, Jesus A.; Carrillo, Richard R.; Ros, Eduardo; Luque, Niceto R.

2017-01-01

Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under increasing levels of neural complexity. PMID:28223930
High-Performance Computing Systems and Operations | Computational Science |

Science.gov Websites

NREL Systems and Operations High-Performance Computing Systems and Operations NREL operates high-performance computing (HPC) systems dedicated to advancing energy efficiency and renewable energy technologies. Capabilities NREL's HPC capabilities include: High-Performance Computing Systems We operate
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

NASA Technical Reports Server (NTRS)

Sterling, Thomas; Bergman, Larry

2000-01-01

Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes are or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic storage photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing the much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Porting Ordinary Applications to Blue Gene/Q Supercomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy

2015-08-31

Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques-sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt'smore » sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.« less
Energy and Quality-Aware Multimedia Signal Processing

NASA Astrophysics Data System (ADS)

Emre, Yunus

Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture level and algorithm-level techniques that reduce energy consumption of these devices with minimal impact on system quality. First, we present novel techniques to mitigate the effects of SRAM memory failures in JPEG2000 implementations operating in scaled voltages. We investigate error control coding schemes and propose an unequal error protection scheme tailored for JPEG2000 that reduces overhead without affecting the performance. Furthermore, we propose algorithm-specific techniques for error compensation that exploit the fact that in JPEG2000 the discrete wavelet transform outputs have larger values for low frequency subband coefficients and smaller values for high frequency subband coefficients. Next, we present use of voltage overscaling to reduce the data-path power consumption of JPEG codecs. We propose an algorithm-specific technique which exploits the characteristics of the quantized coefficients after zig-zag scan to mitigate errors introduced by aggressive voltage scaling. Third, we investigate the effect of reducing dynamic range for datapath energy reduction. We analyze the effect of truncation error and propose a scheme that estimates the mean value of the truncation error during the pre-computation stage and compensates for this error. Such a scheme is very effective for reducing the noise power in applications that are dominated by additions and multiplications such as FIR filter and transform computation. We also present a novel sum of absolute difference (SAD) scheme that is based on most significant bit truncation. The proposed scheme exploits the fact that most of the absolute difference (AD) calculations result in small values, and most of the large AD values do not contribute to the SAD values of the blocks that are selected. Such a scheme is highly effective in reducing the energy consumption of motion estimation and intra-prediction kernels in video codecs. Finally, we present several hybrid energy-saving techniques based on combination of voltage scaling, computation reduction and dynamic range reduction that further reduce the energy consumption while keeping the performance degradation very low. For instance, a combination of computation reduction and dynamic range reduction for Discrete Cosine Transform shows on average, 33% to 46% reduction in energy consumption while incurring only 0.5dB to 1.5dB loss in PSNR.
Final Report. Institute for Ultralscale Visualization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Kwan-Liu; Galli, Giulia; Gygi, Francois

The SciDAC Institute for Ultrascale Visualization brought together leading experts from visualization, high-performance computing, and science application areas to make advanced visualization solutions for SciDAC scientists and the broader community. Over the five-year project, the Institute introduced many new enabling visualization techniques, which have significantly enhanced scientists’ ability to validate their simulations, interpret their data, and communicate with others about their work and findings. This Institute project involved a large number of junior and student researchers, who received the opportunities to work on some of the most challenging science applications and gain access to the most powerful high-performance computing facilitiesmore » in the world. They were readily trained and prepared for facing the greater challenges presented by extreme-scale computing. The Institute’s outreach efforts, through publications, workshops and tutorials, successfully disseminated the new knowledge and technologies to the SciDAC and the broader scientific communities. The scientific findings and experience of the Institute team helped plan the SciDAC3 program.« less
The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

NASA Astrophysics Data System (ADS)

Evans, B. J. K.; Pugh, T.; Wyborn, L. A.; Porter, D.; Allen, C.; Smillie, J.; Antony, J.; Trenham, C.; Evans, B. J.; Beckett, D.; Erwin, T.; King, E.; Hodge, J.; Woodcock, R.; Fraser, R.; Lescinsky, D. T.

2014-12-01

The National Computational Infrastructure (NCI) has co-located a priority set of national data assets within a HPC research platform. This powerful in-situ computational platform has been created to help serve and analyse the massive amounts of data across the spectrum of environmental collections - in particular the climate, observational data and geoscientific domains. This paper examines the infrastructure, innovation and opportunity for this significant research platform. NCI currently manages nationally significant data collections (10+ PB) categorised as 1) earth system sciences, climate and weather model data assets and products, 2) earth and marine observations and products, 3) geosciences, 4) terrestrial ecosystem, 5) water management and hydrology, and 6) astronomy, social science and biosciences. The data is largely sourced from the NCI partners (who include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. By co-locating these large valuable data assets, new opportunities have arisen by harmonising the data collections, making a powerful transdisciplinary research platformThe data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. New scientific software, cloud-scale techniques, server-side visualisation and data services have been harnessed and integrated into the platform, so that analysis is performed seamlessly across the traditional boundaries of the underlying data domains. Characterisation of the techniques along with performance profiling ensures scalability of each software component, all of which can either be enhanced or replaced through future improvements. A Development-to-Operations (DevOps) framework has also been implemented to manage the scale of the software complexity alone. This ensures that software is both upgradable and maintainable, and can be readily reused with complexly integrated systems and become part of the growing global trusted community tools for cross-disciplinary research.
IVF: exploiting intensity variation function for high-performance pedestrian tracking in forward-looking infrared imagery

NASA Astrophysics Data System (ADS)

Lamberti, Fabrizio; Sanna, Andrea; Paravati, Gianluca; Belluccini, Luca

2014-02-01

Tracking pedestrian targets in forward-looking infrared video sequences is a crucial component of a growing number of applications. At the same time, it is particularly challenging, since image resolution and signal-to-noise ratio are generally very low, while the nonrigidity of the human body produces highly variable target shapes. Moreover, motion can be quite chaotic with frequent target-to-target and target-to-scene occlusions. Hence, the trend is to design ever more sophisticated techniques, able to ensure rather accurate tracking results at the cost of a generally higher complexity. However, many of such techniques might not be suitable for real-time tracking in limited-resource environments. This work presents a technique that extends an extremely computationally efficient tracking method based on target intensity variation and template matching originally designed for targets with a marked and stable hot spot by adapting it to deal with much more complex thermal signatures and by removing the native dependency on configuration choices. Experimental tests demonstrated that, by working on multiple hot spots, the designed technique is able to achieve the robustness of other common approaches by limiting drifts and preserving the low-computational footprint of the reference method.
Automation of checkout for the shuttle operations era

NASA Technical Reports Server (NTRS)

Anderson, J. A.; Hendrickson, K. O.

1985-01-01

The Space Shuttle checkout is different from its Apollo predecessor. The complexity of the hardware, the shortened turnaround time, and the software that performs ground checkout are outlined. Generating new techniques and standards for software development and the management structure to control it are implemented. The utilization of computer systems for vehicle testing is high lighted.
Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE.

PubMed

Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

2010-01-01

In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier's AUC performance. In the large U.S. data set, sample high performance results include, AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
Computer-aided classification of lung nodules on computed tomography images via deep learning technique

PubMed Central

Hua, Kai-Lung; Hsu, Che-Hao; Hidayati, Shintami Chusnul; Cheng, Wen-Huang; Chen, Yu-Jen

2015-01-01

Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain. PMID:26346558
Computer-aided classification of lung nodules on computed tomography images via deep learning technique.

PubMed

Hua, Kai-Lung; Hsu, Che-Hao; Hidayati, Shintami Chusnul; Cheng, Wen-Huang; Chen, Yu-Jen

2015-01-01

Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain.
Analytic double product integrals for all-frequency relighting.

PubMed

Wang, Rui; Pan, Minghao; Chen, Weifeng; Ren, Zhong; Zhou, Kun; Hua, Wei; Bao, Hujun

2013-07-01

This paper presents a new technique for real-time relighting of static scenes with all-frequency shadows from complex lighting and highly specular reflections from spatially varying BRDFs. The key idea is to depict the boundaries of visible regions using piecewise linear functions, and convert the shading computation into double product integrals—the integral of the product of lighting and BRDF on visible regions. By representing lighting and BRDF with spherical Gaussians and approximating their product using Legendre polynomials locally in visible regions, we show that such double product integrals can be evaluated in an analytic form. Given the precomputed visibility, our technique computes the visibility boundaries on the fly at each shading point, and performs the analytic integral to evaluate the shading color. The result is a real-time all-frequency relighting technique for static scenes with dynamic, spatially varying BRDFs, which can generate more accurate shadows than the state-of-the-art real-time PRT methods.
Computer-aided detection systems to improve lung cancer early diagnosis: state-of-the-art and challenges

NASA Astrophysics Data System (ADS)

Traverso, A.; Lopez Torres, E.; Fantacci, M. E.; Cerello, P.

2017-05-01

Lung cancer is one of the most lethal types of cancer, because its early diagnosis is not good enough. In fact, the detection of pulmonary nodule, potential lung cancers, in Computed Tomography scans is a very challenging and time-consuming task for radiologists. To support radiologists, researchers have developed Computer-Aided Diagnosis (CAD) systems for the automated detection of pulmonary nodules in chest Computed Tomography scans. Despite the high level of technological developments and the proved benefits on the overall detection performance, the usage of Computer-Aided Diagnosis in clinical practice is far from being a common procedure. In this paper we investigate the causes underlying this discrepancy and present a solution to tackle it: the M5L WEB- and Cloud-based on-demand Computer-Aided Diagnosis. In addition, we prove how the combination of traditional imaging processing techniques with state-of-art advanced classification algorithms allows to build a system whose performance could be much larger than any Computer-Aided Diagnosis developed so far. This outcome opens the possibility to use the CAD as clinical decision support for radiologists.
Advances in computational design and analysis of airbreathing propulsion systems

NASA Technical Reports Server (NTRS)

Klineberg, John M.

1989-01-01

The development of commercial and military aircraft depends, to a large extent, on engine manufacturers being able to achieve significant increases in propulsion capability through improved component aerodynamics, materials, and structures. The recent history of propulsion has been marked by efforts to develop computational techniques that can speed up the propulsion design process and produce superior designs. The availability of powerful supercomputers, such as the NASA Numerical Aerodynamic Simulator, and the potential for even higher performance offered by parallel computer architectures, have opened the door to the use of multi-dimensional simulations to study complex physical phenomena in propulsion systems that have previously defied analysis or experimental observation. An overview of several NASA Lewis research efforts is provided that are contributing toward the long-range goal of a numerical test-cell for the integrated, multidisciplinary design, analysis, and optimization of propulsion systems. Specific examples in Internal Computational Fluid Mechanics, Computational Structural Mechanics, Computational Materials Science, and High Performance Computing are cited and described in terms of current capabilities, technical challenges, and future research directions.
High-latitude filtering in a global grid-point model using model normal modes. [Fourier filters for synoptic weather forecasting

NASA Technical Reports Server (NTRS)

Takacs, L. L.; Kalnay, E.; Navon, I. M.

1985-01-01

A normal modes expansion technique is applied to perform high latitude filtering in the GLAS fourth order global shallow water model with orography. The maximum permissible time step in the solution code is controlled by the frequency of the fastest propagating mode, which can be a gravity wave. Numerical methods are defined for filtering the data to identify the number of gravity modes to be included in the computations in order to obtain the appropriate zonal wavenumbers. The performances of the model with and without the filter, and with a time tendency and a prognostic field filter are tested with simulations of the Northern Hemisphere winter. The normal modes expansion technique is shown to leave the Rossby modes intact and permit 3-5 day predictions, a range not possible with the other high-latitude filters.
A human visual based binarization technique for histological images

NASA Astrophysics Data System (ADS)

Shreyas, Kamath K. M.; Rajendran, Rahul; Panetta, Karen; Agaian, Sos

2017-05-01

In the field of vision-based systems for object detection and classification, thresholding is a key pre-processing step. Thresholding is a well-known technique for image segmentation. Segmentation of medical images, such as Computed Axial Tomography (CAT), Magnetic Resonance Imaging (MRI), X-Ray, Phase Contrast Microscopy, and Histological images, present problems like high variability in terms of the human anatomy and variation in modalities. Recent advances made in computer-aided diagnosis of histological images help facilitate detection and classification of diseases. Since most pathology diagnosis depends on the expertise and ability of the pathologist, there is clearly a need for an automated assessment system. Histological images are stained to a specific color to differentiate each component in the tissue. Segmentation and analysis of such images is problematic, as they present high variability in terms of color and cell clusters. This paper presents an adaptive thresholding technique that aims at segmenting cell structures from Haematoxylin and Eosin stained images. The thresholded result can further be used by pathologists to perform effective diagnosis. The effectiveness of the proposed method is analyzed by visually comparing the results to the state of art thresholding methods such as Otsu, Niblack, Sauvola, Bernsen, and Wolf. Computer simulations demonstrate the efficiency of the proposed method in segmenting critical information.

X-ray Micro-Tomography of Ablative Heat Shield Materials

NASA Technical Reports Server (NTRS)

Panerai, Francesco; Ferguson, Joseph; Borner, Arnaud; Mansour, Nagi N.; Barnard, Harold S.; MacDowell, Alastair A.; Parkinson, Dilworth Y.

2016-01-01

X-ray micro-tomography is a non-destructive characterization technique that allows imaging of materials structures with voxel sizes in the micrometer range. This level of resolution makes the technique very attractive for imaging porous ablators used in hypersonic entry systems. Besides providing a high fidelity description of the material architecture, micro-tomography enables computations of bulk material properties and simulations of micro-scale phenomena. This presentation provides an overview of a collaborative effort between NASA Ames Research Center and Lawrence Berkeley National Laboratory, aimed at developing micro-tomography experiments and simulations for porous ablative materials. Measurements are carried using x-rays from the Advanced Light Source at Berkeley Lab on different classes of ablative materials used in NASA entry systems. Challenges, strengths and limitations of the technique for imaging materials such as lightweight carbon-phenolic systems and woven textiles are discussed. Computational tools developed to perform numerical simulations based on micro-tomography are described. These enable computations of material properties such as permeability, thermal and radiative conductivity, tortuosity and other parameters that are used in ablator response models. Finally, we present the design of environmental cells that enable imaging materials under simulated operational conditions, such as high temperature, mechanical loads and oxidizing atmospheres.Keywords: Micro-tomography, Porous media, Ablation
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

PubMed Central

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.

2011-01-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.

PubMed

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W

2012-06-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Computation of physiological human vocal fold parameters by mathematical optimization of a biomechanical model

PubMed Central

Yang, Anxiong; Stingl, Michael; Berry, David A.; Lohscheller, Jörg; Voigt, Daniel; Eysholdt, Ulrich; Döllinger, Michael

2011-01-01

With the use of an endoscopic, high-speed camera, vocal fold dynamics may be observed clinically during phonation. However, observation and subjective judgment alone may be insufficient for clinical diagnosis and documentation of improved vocal function, especially when the laryngeal disease lacks any clear morphological presentation. In this study, biomechanical parameters of the vocal folds are computed by adjusting the corresponding parameters of a three-dimensional model until the dynamics of both systems are similar. First, a mathematical optimization method is presented. Next, model parameters (such as pressure, tension and masses) are adjusted to reproduce vocal fold dynamics, and the deduced parameters are physiologically interpreted. Various combinations of global and local optimization techniques are attempted. Evaluation of the optimization procedure is performed using 50 synthetically generated data sets. The results show sufficient reliability, including 0.07 normalized error, 96% correlation, and 91% accuracy. The technique is also demonstrated on data from human hemilarynx experiments, in which a low normalized error (0.16) and high correlation (84%) values were achieved. In the future, this technique may be applied to clinical high-speed images, yielding objective measures with which to document improved vocal function of patients with voice disorders. PMID:21877808
Computational Environments and Analysis methods available on the NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform

NASA Astrophysics Data System (ADS)

Evans, B. J. K.; Foster, C.; Minchin, S. A.; Pugh, T.; Lewis, A.; Wyborn, L. A.; Evans, B. J.; Uhlherr, A.

2014-12-01

The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high performance computing and data-intensive science across a wide spectrum of national environmental data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress in addressing harmonisation of the underlying data collections for future transdisciplinary research that enable accurate climate projections. NCI makes available 10+ PB major data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. The data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of data available and not be constrained to work within artificial disciplinary boundaries. Future challenges will involve the further integration and analysis of this data across the social sciences to facilitate the impacts across the societal domain, including timely analysis to more accurately predict and forecast future climate and environmental state.
Fingerprinting Communication and Computation on HPC Machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peisert, Sean

2010-06-02

How do we identify what is actually running on high-performance computing systems? Names of binaries, dynamic libraries loaded, or other elements in a submission to a batch queue can give clues, but binary names can be changed, and libraries provide limited insight and resolution on the code being run. In this paper, we present a method for"fingerprinting" code running on HPC machines using elements of communication and computation. We then discuss how that fingerprint can be used to determine if the code is consistent with certain other types of codes, what a user usually runs, or what the user requestedmore » an allocation to do. In some cases, our techniques enable us to fingerprint HPC codes using runtime MPI data with a high degree of accuracy.« less
Application of X-Ray Computer Tomography for Observing the Central Void Formations and the Fuel Pin Deformations of Irradiated FBR Fuel Assemblies

NASA Astrophysics Data System (ADS)

Katsuyama, Kozo; Nagamine, Tsuyoshi; Furuya, Hirotaka

2010-10-01

In order to observe the structural change in the interior of irradiated fuel assemblies, a non-destructive post-irradiation examination (PIE) technique using X-ray computer tomography (X-ray CT) was developed. This X-ray CT technique was applied to observe the central void formations and fuel pin deformations of fuel assemblies which had been irradiated at high linear heat rating. The central void sizes in all fuel pins were measured on five cross sections of the core fuel column as a parameter for evaluating fuel thermal performance. In addition, the fuel pin deformations were analyzed from X-ray CT images obtained along the axial direction of a fuel assembly at the same separation interval. A dependence of void size on the linear heat rating was seen in the fuel assembly irradiated at high linear heat rating. In addition, significant undulations of the fuel pin were observed along the axial direction, coinciding with the wrapping wire pitch in the core fuel column. Application of the developed technique should provide enhanced resolution of measurements and simplify fuel PIEs.
CORDIC-based digital signal processing (DSP) element for adaptive signal processing

NASA Astrophysics Data System (ADS)

Bolstad, Gregory D.; Neeld, Kenneth B.

1995-04-01

The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC based application specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
Application of Adjoint Method and Spectral-Element Method to Tomographic Inversion of Regional Seismological Structure Beneath Japanese Islands

NASA Astrophysics Data System (ADS)

Tsuboi, S.; Miyoshi, T.; Obayashi, M.; Tono, Y.; Ando, K.

2014-12-01

Recent progress in large scale computing by using waveform modeling technique and high performance computing facility has demonstrated possibilities to perform full-waveform inversion of three dimensional (3D) seismological structure inside the Earth. We apply the adjoint method (Liu and Tromp, 2006) to obtain 3D structure beneath Japanese Islands. First we implemented Spectral-Element Method to K-computer in Kobe, Japan. We have optimized SPECFEM3D_GLOBE (Komatitsch and Tromp, 2002) by using OpenMP so that the code fits hybrid architecture of K-computer. Now we could use 82,134 nodes of K-computer (657,072 cores) to compute synthetic waveform with about 1 sec accuracy for realistic 3D Earth model and its performance was 1.2 PFLOPS. We use this optimized SPECFEM3D_GLOBE code and take one chunk around Japanese Islands from global mesh and compute synthetic seismograms with accuracy of about 10 second. We use GAP-P2 mantle tomography model (Obayashi et al., 2009) as an initial 3D model and use as many broadband seismic stations available in this region as possible to perform inversion. We then use the time windows for body waves and surface waves to compute adjoint sources and calculate adjoint kernels for seismic structure. We have performed several iteration and obtained improved 3D structure beneath Japanese Islands. The result demonstrates that waveform misfits between observed and theoretical seismograms improves as the iteration proceeds. We now prepare to use much shorter period in our synthetic waveform computation and try to obtain seismic structure for basin scale model, such as Kanto basin, where there are dense seismic network and high seismic activity. Acknowledgements: This research was partly supported by MEXT Strategic Program for Innovative Research. We used F-net seismograms of the National Research Institute for Earth Science and Disaster Prevention.
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

NASA Astrophysics Data System (ADS)

Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

2012-09-01

Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors, and produce huge amount of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid based AMR, the Octree AMR has the advantage to fit very precisely the adaptive resolution of the grid to the local problem complexity. However, this specific octree data type need some specific software to be visualized, as generic visualization tools works on Cartesian grid data type. This is why the PYMSES software has been also developed by our team. It relies on the python scripting language to ensure a modular and easy access to explore those specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We would like to present with more details our PYMSES software with some performance benchmarks. PYMSES has currently two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques with the python multiprocessing library versus the use of MPI run. The load balancing strategy has to be smartly defined in order to achieve a good speed up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

NASA Technical Reports Server (NTRS)

Katz, Randy H.; Anderson, Thomas E.; Ousterhout, John K.; Patterson, David A.

1991-01-01

Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are on the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a protototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a Cornerstone of any coherent program in high performance computing and communications.
Code Modernization of VPIC

NASA Astrophysics Data System (ADS)

Bird, Robert; Nystrom, David; Albright, Brian

2017-10-01

The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive, and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable-performance, and cache oblivious algorithms the problem still remains largely unsolved. In this work we demonstrate how a focus on platform agnostic modern code-development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsic based vectorisation with compile generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU capable OpenMP variant of VPIC. Finally we include a lessons learnt. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
UNIX-based operating systems robustness evaluation

NASA Technical Reports Server (NTRS)

Chang, Yu-Ming

1996-01-01

Robust operating systems are required for reliable computing. Techniques for robustness evaluation of operating systems not only enhance the understanding of the reliability of computer systems, but also provide valuable feed- back to system designers. This thesis presents results from robustness evaluation experiments on five UNIX-based operating systems, which include Digital Equipment's OSF/l, Hewlett Packard's HP-UX, Sun Microsystems' Solaris and SunOS, and Silicon Graphics' IRIX. Three sets of experiments were performed. The methodology for evaluation tested (1) the exception handling mechanism, (2) system resource management, and (3) system capacity under high workload stress. An exception generator was used to evaluate the exception handling mechanism of the operating systems. Results included exit status of the exception generator and the system state. Resource management techniques used by individual operating systems were tested using programs designed to usurp system resources such as physical memory and process slots. Finally, the workload stress testing evaluated the effect of the workload on system performance by running a synthetic workload and recording the response time of local and remote user requests. Moderate to severe performance degradations were observed on the systems under stress.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Comparative analysis of techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Hitt, E. F.; Bridgman, M. S.; Robinson, A. C.

1981-01-01

Performability analysis is a technique developed for evaluating the effectiveness of fault-tolerant computing systems in multiphase missions. Performability was evaluated for its accuracy, practical usefulness, and relative cost. The evaluation was performed by applying performability and the fault tree method to a set of sample problems ranging from simple to moderately complex. The problems involved as many as five outcomes, two to five mission phases, permanent faults, and some functional dependencies. Transient faults and software errors were not considered. A different analyst was responsible for each technique. Significantly more time and effort were required to learn performability analysis than the fault tree method. Performability is inherently as accurate as fault tree analysis. For the sample problems, fault trees were more practical and less time consuming to apply, while performability required less ingenuity and was more checkable. Performability offers some advantages for evaluating very complex problems.
High-Performance Computing User Facility | Computational Science | NREL

Science.gov Websites

User Facility High-Performance Computing User Facility The High-Performance Computing User Facility technologies. Photo of the Peregrine supercomputer The High Performance Computing (HPC) User Facility provides Gyrfalcon Mass Storage System. Access Our HPC User Facility Learn more about these systems and how to access
A Stochastic Spiking Neural Network for Virtual Screening.

PubMed

Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L

2018-04-01

Virtual screening (VS) has become a key computational tool in early drug design and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, the hardware implementations of spiking neural networks (SNNs) arise as an emerging computing technique that can be applied to parallelize processes that normally present a high cost in terms of computing time and power. Consequently, SNN represents an attractive alternative to perform time-consuming processing tasks, such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm achieving two order of magnitude of speed improvement with respect to USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture arises as a feasible methodology to efficiently enhance time-consuming data-mining processes such as 3-D molecular similarity search.
Non-Invasive Transcranial Brain Therapy Guided by CT Scans: an In Vivo Monkey Study

NASA Astrophysics Data System (ADS)

Marquet, F.; Pernot, M.; Aubry, J.-F.; Montaldo, G.; Tanter, M.; Boch, A.-L.; Kujas, M.; Seilhean, D.; Fink, M.

2007-05-01

Brain therapy using focused ultrasound remains very limited due to the strong aberrations induced by the skull. A minimally invasive technique using time-reversal was validated recently in-vivo on 20 sheeps. But such a technique requires a hydrophone at the focal point for the first step of the time-reversal procedure. A completely noninvasive therapy requires a reliable model of the acoustic properties of the skull in order to simulate this first step. 3-D simulations based on high-resolution CT images of a skull have been successfully performed with a finite differences code developed in our Laboratory. Thanks to the skull porosity, directly extracted from the CT images, we reconstructed acoustic speed, density and absorption maps and performed the computation. Computed wavefronts are in good agreement with experimental wavefronts acquired through the same part of the skull and this technique was validated in-vitro in the laboratory. A stereotactic frame has been designed and built in order to perform non invasive transcranial focusing in vivo. Here we describe all the steps of our new protocol, from the CT-scans to the therapy treatment and the first in vivo results on a monkey will be presented. This protocol is based on protocols already existing in radiotherapy.
Large-scale inverse model analyses employing fast randomized data reduction

NASA Astrophysics Data System (ADS)

Lin, Youzuo; Le, Ellen B.; O'Malley, Daniel; Vesselinov, Velimir V.; Bui-Thanh, Tan

2017-08-01

When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 107 or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.
High Performance Computing Meets Energy Efficiency - Continuum Magazine |

Science.gov Websites

NREL High Performance Computing Meets Energy Efficiency High Performance Computing Meets Energy turbines. Simulation by Patrick J. Moriarty and Matthew J. Churchfield, NREL The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data

Merlin - Massively parallel heterogeneous computing

NASA Technical Reports Server (NTRS)

Wittie, Larry; Maples, Creve

1989-01-01

Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing

NASA Astrophysics Data System (ADS)

Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen

2018-04-01

The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very unlikely to achieve required computing processing power by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Arrays and Graphic Processor cores constrained by power and performance. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). However, these DNN models are typically trained using GPUs with gigabytes of external memories and massively used 32-bit floating point operations. As a result, DNNs do not run efficiently on hardware appropriate for low power or mobile applications. To address this limitation, we proposed for compressing DNN models for ATR suited to deployment on resource constrained hardware. This proposed compression framework utilizes promising DNN compression techniques including pruning and weight quantization while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
Asynchronous Replica Exchange Software for Grid and Heterogeneous Computing.

PubMed

Gallicchio, Emilio; Xia, Junchao; Flynn, William F; Zhang, Baofeng; Samlalsingh, Sade; Mentes, Ahmet; Levy, Ronald M

2015-11-01

Parallel replica exchange sampling is an extended ensemble technique often used to accelerate the exploration of the conformational ensemble of atomistic molecular simulations of chemical systems. Inter-process communication and coordination requirements have historically discouraged the deployment of replica exchange on distributed and heterogeneous resources. Here we describe the architecture of a software (named ASyncRE) for performing asynchronous replica exchange molecular simulations on volunteered computing grids and heterogeneous high performance clusters. The asynchronous replica exchange algorithm on which the software is based avoids centralized synchronization steps and the need for direct communication between remote processes. It allows molecular dynamics threads to progress at different rates and enables parameter exchanges among arbitrary sets of replicas independently from other replicas. ASyncRE is written in Python following a modular design conducive to extensions to various replica exchange schemes and molecular dynamics engines. Applications of the software for the modeling of association equilibria of supramolecular and macromolecular complexes on BOINC campus computational grids and on the CPU/MIC heterogeneous hardware of the XSEDE Stampede supercomputer are illustrated. They show the ability of ASyncRE to utilize large grids of desktop computers running the Windows, MacOS, and/or Linux operating systems as well as collections of high performance heterogeneous hardware devices.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Brun, E., E-mail: emmanuel.brun@esrf.fr; Grandl, S.; Sztrókay-Gaul, A.

Purpose: Phase contrast computed tomography has emerged as an imaging method, which is able to outperform present day clinical mammography in breast tumor visualization while maintaining an equivalent average dose. To this day, no segmentation technique takes into account the specificity of the phase contrast signal. In this study, the authors propose a new mathematical framework for human-guided breast tumor segmentation. This method has been applied to high-resolution images of excised human organs, each of several gigabytes. Methods: The authors present a segmentation procedure based on the viscous watershed transform and demonstrate the efficacy of this method on analyzer basedmore » phase contrast images. The segmentation of tumors inside two full human breasts is then shown as an example of this procedure’s possible applications. Results: A correct and precise identification of the tumor boundaries was obtained and confirmed by manual contouring performed independently by four experienced radiologists. Conclusions: The authors demonstrate that applying the watershed viscous transform allows them to perform the segmentation of tumors in high-resolution x-ray analyzer based phase contrast breast computed tomography images. Combining the additional information provided by the segmentation procedure with the already high definition of morphological details and tissue boundaries offered by phase contrast imaging techniques, will represent a valuable multistep procedure to be used in future medical diagnostic applications.« less
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application

NASA Technical Reports Server (NTRS)

Liu, Zhuo; Wang, Bin; Wang, Teng; Tian, Yuan; Xu, Cong; Wang, Yandong; Yu, Weikuan; Cruz, Carlos A.; Zhou, Shujia; Clune, Tom;

2013-01-01

Exascale computing systems are soon to emerge, which will pose great challenges on the huge gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life. The huge amounts of data generated by such applications require highly parallel and efficient I/O management policies. In this paper, we adopt a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems. Through in-detail architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overheads and are unable to fully parallelize the data access, thus degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA discover cluster show that our optimization of GEOS-5 with ADIOS has led to significant performance improvements compared to the original GEOS-5 implementation.

Computer-generated holograms by multiple wavefront recording plane method with occlusion culling.

PubMed

Symeonidou, Athanasia; Blinder, David; Munteanu, Adrian; Schelkens, Peter

2015-08-24

We propose a novel fast method for full parallax computer-generated holograms with occlusion processing, suitable for volumetric data such as point clouds. A novel light wave propagation strategy relying on the sequential use of the wavefront recording plane method is proposed, which employs look-up tables in order to reduce the computational complexity in the calculation of the fields. Also, a novel technique for occlusion culling with little additional computation cost is introduced. Additionally, the method adheres a Gaussian distribution to the individual points in order to improve visual quality. Performance tests show that for a full-parallax high-definition CGH a speedup factor of more than 2,500 compared to the ray-tracing method can be achieved without hardware acceleration.
Brain tumor image segmentation using kernel dictionary learning.

PubMed

Jeon Lee; Seung-Jun Kim; Rong Chen; Herskovits, Edward H

2015-08-01

Automated brain tumor image segmentation with high accuracy and reproducibility holds a big potential to enhance the current clinical practice. Dictionary learning (DL) techniques have been applied successfully to various image processing tasks recently. In this work, kernel extensions of the DL approach are adopted. Both reconstructive and discriminative versions of the kernel DL technique are considered, which can efficiently incorporate multi-modal nonlinear feature mappings based on the kernel trick. Our novel discriminative kernel DL formulation allows joint learning of a task-driven kernel-based dictionary and a linear classifier using a K-SVD-type algorithm. The proposed approaches were tested using real brain magnetic resonance (MR) images of patients with high-grade glioma. The obtained preliminary performances are competitive with the state of the art. The discriminative kernel DL approach is seen to reduce computational burden without much sacrifice in performance.
Computer Simulation For Design Of TWT's

NASA Technical Reports Server (NTRS)

Bartos, Karen F.; Fite, E. Brian; Shalkhauser, Kurt A.; Sharp, G. Richard

1992-01-01

A three-dimensional finite-element analytical technique facilitates design and fabrication of traveling-wave-tube (TWT) slow-wave structures. Used to perform thermal and mechanical analyses of TWT designed with variety of configurations, geometries, and materials. Using three-dimensional computer analysis, designer able to simulate building and testing of TWT, with consequent substantial saving of time and money. Technique enables detailed look into operation of traveling-wave tubes to help improve performance for future communications systems.
A supportive architecture for CFD-based design optimisation

NASA Astrophysics Data System (ADS)

Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong

2014-03-01

Multi-disciplinary design optimisation (MDO) is one of critical methodologies to the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamic (CFD) technique has caused a rise of its applications in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating with CFD analysis in an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which is usually with high dimensions, and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review on existing works has found that very few researchers have studied on the assistive tools to facilitate CFD-based design optimisation. In the paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing technique and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, the case studies on aerodynamic shape design of a hypersonic cruising vehicle are provided, and the result has shown that the proposed architecture and developed algorithms have performed successfully and efficiently in dealing with the design optimisation with over 200 design variables.
Using Performance Tools to Support Experiments in HPC Resilience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naughton, III, Thomas J; Boehm, Swen; Engelmann, Christian

2014-01-01

The high performance computing (HPC) community is working to address fault tolerance and resilience concerns for current and future large scale computing platforms. This is driving enhancements in the programming environ- ments, specifically research on enhancing message passing libraries to support fault tolerant computing capabilities. The community has also recognized that tools for resilience experimentation are greatly lacking. However, we argue that there are several parallels between performance tools and resilience tools . As such, we believe the rich set of HPC performance-focused tools can be extended (repurposed) to benefit the resilience community. In this paper, we describe the initialmore » motivation to leverage standard HPC per- formance analysis techniques to aid in developing diagnostic tools to assist fault tolerance experiments for HPC applications. These diagnosis procedures help to provide context for the system when the errors (failures) occurred. We describe our initial work in leveraging an MPI performance trace tool to assist in provid- ing global context during fault injection experiments. Such tools will assist the HPC resilience community as they extend existing and new application codes to support fault tolerances.« less
Fast non-Abelian geometric gates via transitionless quantum driving.

PubMed

Zhang, J; Kyaw, Thi Ha; Tong, D M; Sjöqvist, Erik; Kwek, Leong-Chuan

2015-12-21

A practical quantum computer must be capable of performing high fidelity quantum gates on a set of quantum bits (qubits). In the presence of noise, the realization of such gates poses daunting challenges. Geometric phases, which possess intrinsic noise-tolerant features, hold the promise for performing robust quantum computation. In particular, quantum holonomies, i.e., non-Abelian geometric phases, naturally lead to universal quantum computation due to their non-commutativity. Although quantum gates based on adiabatic holonomies have already been proposed, the slow evolution eventually compromises qubit coherence and computational power. Here, we propose a general approach to speed up an implementation of adiabatic holonomic gates by using transitionless driving techniques and show how such a universal set of fast geometric quantum gates in a superconducting circuit architecture can be obtained in an all-geometric approach. Compared with standard non-adiabatic holonomic quantum computation, the holonomies obtained in our approach tends asymptotically to those of the adiabatic approach in the long run-time limit and thus might open up a new horizon for realizing a practical quantum computer.
Fast non-Abelian geometric gates via transitionless quantum driving

PubMed Central

Zhang, J.; Kyaw, Thi Ha; Tong, D. M.; Sjöqvist, Erik; Kwek, Leong-Chuan

2015-01-01

A practical quantum computer must be capable of performing high fidelity quantum gates on a set of quantum bits (qubits). In the presence of noise, the realization of such gates poses daunting challenges. Geometric phases, which possess intrinsic noise-tolerant features, hold the promise for performing robust quantum computation. In particular, quantum holonomies, i.e., non-Abelian geometric phases, naturally lead to universal quantum computation due to their non-commutativity. Although quantum gates based on adiabatic holonomies have already been proposed, the slow evolution eventually compromises qubit coherence and computational power. Here, we propose a general approach to speed up an implementation of adiabatic holonomic gates by using transitionless driving techniques and show how such a universal set of fast geometric quantum gates in a superconducting circuit architecture can be obtained in an all-geometric approach. Compared with standard non-adiabatic holonomic quantum computation, the holonomies obtained in our approach tends asymptotically to those of the adiabatic approach in the long run-time limit and thus might open up a new horizon for realizing a practical quantum computer. PMID:26687580
Computer Recreations.

ERIC Educational Resources Information Center

Dewdney, A. K.

1989-01-01

Reviews the performance of computer programs for writing poetry and prose, including MARK V. SHANEY, MELL, POETRY GENERATOR, THUNDER THOUGHT, and ORPHEUS. Discusses the writing principles of the programs. Provides additional information on computer magnification techniques. (YP)
Computerized systems analysis and optimization of aircraft engine performance, weight, and life cycle costs

NASA Technical Reports Server (NTRS)

Fishbach, L. H.

1979-01-01

The computational techniques utilized to determine the optimum propulsion systems for future aircraft applications and to identify system tradeoffs and technology requirements are described. The characteristics and use of the following computer codes are discussed: (1) NNEP - a very general cycle analysis code that can assemble an arbitrary matrix fans, turbines, ducts, shafts, etc., into a complete gas turbine engine and compute on- and off-design thermodynamic performance; (2) WATE - a preliminary design procedure for calculating engine weight using the component characteristics determined by NNEP; (3) POD DRG - a table look-up program to calculate wave and friction drag of nacelles; (4) LIFCYC - a computer code developed to calculate life cycle costs of engines based on the output from WATE; and (5) INSTAL - a computer code developed to calculate installation effects, inlet performance and inlet weight. Examples are given to illustrate how these computer techniques can be applied to analyze and optimize propulsion system fuel consumption, weight, and cost for representative types of aircraft and missions.
A tool for modeling concurrent real-time computation

NASA Technical Reports Server (NTRS)

Sharma, D. D.; Huang, Shie-Rei; Bhatt, Rahul; Sridharan, N. S.

1990-01-01

Real-time computation is a significant area of research in general, and in AI in particular. The complexity of practical real-time problems demands use of knowledge-based problem solving techniques while satisfying real-time performance constraints. Since the demands of a complex real-time problem cannot be predicted (owing to the dynamic nature of the environment) powerful dynamic resource control techniques are needed to monitor and control the performance. A real-time computation model for a real-time tool, an implementation of the QP-Net simulator on a Symbolics machine, and an implementation on a Butterfly multiprocessor machine are briefly described.
Models and techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1978-01-01

The development of system models that can provide a basis for the formulation and evaluation of aircraft computer system effectiveness, the formulation of quantitative measures of system effectiveness, and the development of analytic and simulation techniques for evaluating the effectiveness of a proposed or existing aircraft computer are described. Specific topics covered include: system models; performability evaluation; capability and functional dependence; computation of trajectory set probabilities; and hierarchical modeling of an air transport mission.
Generalized algebraic scene-based nonuniformity correction algorithm.

PubMed

Ratliff, Bradley M; Hayat, Majeed M; Tyo, J Scott

2005-02-01

A generalization of a recently developed algebraic scene-based nonuniformity correction algorithm for focal plane array (FPA) sensors is presented. The new technique uses pairs of image frames exhibiting arbitrary one- or two-dimensional translational motion to compute compensator quantities that are then used to remove nonuniformity in the bias of the FPA response. Unlike its predecessor, the generalization does not require the use of either a blackbody calibration target or a shutter. The algorithm has a low computational overhead, lending itself to real-time hardware implementation. The high-quality correction ability of this technique is demonstrated through application to real IR data from both cooled and uncooled infrared FPAs. A theoretical and experimental error analysis is performed to study the accuracy of the bias compensator estimates in the presence of two main sources of error.
An algorithmic framework for multiobjective optimization.

PubMed

Ganesan, T; Elamvazuthi, I; Shaari, Ku Zilati Ku; Vasant, P

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization.
An Algorithmic Framework for Multiobjective Optimization

PubMed Central

Ganesan, T.; Elamvazuthi, I.; Shaari, Ku Zilati Ku; Vasant, P.

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization. PMID:24470795
Coil Compression for Accelerated Imaging with Cartesian Sampling

PubMed Central

Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael

2012-01-01

MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully-sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging fieldof-view. High quality compression of in-vivo 3D data from a 32 channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589

Sensitivity Analysis for Coupled Aero-structural Systems

NASA Technical Reports Server (NTRS)

Giunta, Anthony A.

1999-01-01

A novel method has been developed for calculating gradients of aerodynamic force and moment coefficients for an aeroelastic aircraft model. This method uses the Global Sensitivity Equations (GSE) to account for the aero-structural coupling, and a reduced-order modal analysis approach to condense the coupling bandwidth between the aerodynamic and structural models. Parallel computing is applied to reduce the computational expense of the numerous high fidelity aerodynamic analyses needed for the coupled aero-structural system. Good agreement is obtained between aerodynamic force and moment gradients computed with the GSE/modal analysis approach and the same quantities computed using brute-force, computationally expensive, finite difference approximations. A comparison between the computational expense of the GSE/modal analysis method and a pure finite difference approach is presented. These results show that the GSE/modal analysis approach is the more computationally efficient technique if sensitivity analysis is to be performed for two or more aircraft design parameters.
Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

NASA Technical Reports Server (NTRS)

Chen, Paul Peichuan

1993-01-01

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
Flow visualization of CFD using graphics workstations

NASA Technical Reports Server (NTRS)

Lasinski, Thomas; Buning, Pieter; Choi, Diana; Rogers, Stuart; Bancroft, Gordon

1987-01-01

High performance graphics workstations are used to visualize the fluid flow dynamics obtained from supercomputer solutions of computational fluid dynamic programs. The visualizations can be done independently on the workstation or while the workstation is connected to the supercomputer in a distributed computing mode. In the distributed mode, the supercomputer interactively performs the computationally intensive graphics rendering tasks while the workstation performs the viewing tasks. A major advantage of the workstations is that the viewers can interactively change their viewing position while watching the dynamics of the flow fields. An overview of the computer hardware and software required to create these displays is presented. For complex scenes the workstation cannot create the displays fast enough for good motion analysis. For these cases, the animation sequences are recorded on video tape or 16 mm film a frame at a time and played back at the desired speed. The additional software and hardware required to create these video tapes or 16 mm movies are also described. Photographs illustrating current visualization techniques are discussed. Examples of the use of the workstations for flow visualization through animation are available on video tape.
GPU Accelerated Vector Median Filter

NASA Technical Reports Server (NTRS)

Aras, Rifat; Shen, Yuzhong

2011-01-01

Noise reduction is an important step for most image processing tasks. For three channel color images, a widely used technique is vector median filter in which color values of pixels are treated as 3-component vectors. Vector median filters are computationally expensive; for a window size of n x n, each of the n(sup 2) vectors has to be compared with other n(sup 2) - 1 vectors in distances. General purpose computation on graphics processing units (GPUs) is the paradigm of utilizing high-performance many-core GPU architectures for computation tasks that are normally handled by CPUs. In this work. NVIDIA's Compute Unified Device Architecture (CUDA) paradigm is used to accelerate vector median filtering. which has to the best of our knowledge never been done before. The performance of GPU accelerated vector median filter is compared to that of the CPU and MPI-based versions for different image and window sizes, Initial findings of the study showed 100x improvement of performance of vector median filter implementation on GPUs over CPU implementations and further speed-up is expected after more extensive optimizations of the GPU algorithm .
High performance computing and communications program

NASA Technical Reports Server (NTRS)

Holcomb, Lee

1992-01-01

A review of the High Performance Computing and Communications (HPCC) program is provided in vugraph format. The goals and objectives of this federal program are as follows: extend U.S. leadership in high performance computing and computer communications; disseminate the technologies to speed innovation and to serve national goals; and spur gains in industrial competitiveness by making high performance computing integral to design and production.
Flight control systems development of highly maneuverable aircraft technology /HiMAT/ vehicle

NASA Technical Reports Server (NTRS)

Petersen, K. L.

1979-01-01

The highly maneuverable aircraft technology (HiMAT) program was conceived to demonstrate advanced technology concepts through scaled-aircraft flight tests using a remotely piloted technique. Closed-loop primary flight control is performed from a ground-based cockpit, utilizing a digital computer and up/down telemetry links. A backup flight control system for emergency operation resides in an onboard computer. The onboard systems are designed to provide fail-operational capabilities and utilize two microcomputers, dual uplink receiver/decoders, and redundant hydraulic actuation and power systems. This paper discusses the design and validation of the primary and backup digital flight control systems as well as the unique pilot and specialized systems interfaces.
Open-Source, Distributed Computational Environment for Virtual Materials Exploration

DTIC Science & Technology

2015-01-01

compromising structural integrity. For example, advanced designs could specify advanced materials processing techniques such as heat treatments in specific...orchestration of execution of multiple standalone codes at varying length scales will need advanced high ‐performance computing (HPC) integration in...possible hooks that could be used to coordinate larger workflows spanning tools developed by different groups. The high level approach explored
Stability-Constrained Aerodynamic Shape Optimization with Applications to Flying Wings

NASA Astrophysics Data System (ADS)

Mader, Charles Alexander

A set of techniques is developed that allows the incorporation of flight dynamics metrics as an additional discipline in a high-fidelity aerodynamic optimization. Specifically, techniques for including static stability constraints and handling qualities constraints in a high-fidelity aerodynamic optimization are demonstrated. These constraints are developed from stability derivative information calculated using high-fidelity computational fluid dynamics (CFD). Two techniques are explored for computing the stability derivatives from CFD. One technique uses an automatic differentiation adjoint technique (ADjoint) to efficiently and accurately compute a full set of static and dynamic stability derivatives from a single steady solution. The other technique uses a linear regression method to compute the stability derivatives from a quasi-unsteady time-spectral CFD solution, allowing for the computation of static, dynamic and transient stability derivatives. Based on the characteristics of the two methods, the time-spectral technique is selected for further development, incorporated into an optimization framework, and used to conduct stability-constrained aerodynamic optimization. This stability-constrained optimization framework is then used to conduct an optimization study of a flying wing configuration. This study shows that stability constraints have a significant impact on the optimal design of flying wings and that, while static stability constraints can often be satisfied by modifying the airfoil profiles of the wing, dynamic stability constraints can require a significant change in the planform of the aircraft in order for the constraints to be satisfied.
Shock and vibration technology with applications to electrical systems

NASA Technical Reports Server (NTRS)

Eshleman, R. L.

1972-01-01

A survey is presented of shock and vibration technology for electrical systems developed by the aerospace programs. The shock environment is surveyed along with new techniques for modeling, computer simulation, damping, and response analysis. Design techniques based on the use of analog computers, shock spectra, optimization, and nonlinear isolation are discussed. Shock mounting of rotors for performance and survival, and vibration isolation techniques are reviewed.
Effects of pupil filter patterns in line-scan focal modulation microscopy

NASA Astrophysics Data System (ADS)

Shen, Shuhao; Pant, Shilpa; Chen, Rui; Chen, Nanguang

2018-03-01

Line-scan focal modulation microscopy (LSFMM) is an emerging imaging technique that affords high imaging speed and good optical sectioning at the same time. We present a systematic investigation into optimal design of the pupil filter for LSFMM in an attempt to achieve the best performance in terms of spatial resolutions, optical sectioning, and modulation depth. Scalar diffraction theory was used to compute light propagation and distribution in the system and theoretical predictions on system performance, which were then compared with experimental results.
Non-local means denoising of dynamic PET images.

PubMed

Dutta, Joyita; Leahy, Richard M; Li, Quanzheng

2013-01-01

Dynamic positron emission tomography (PET), which reveals information about both the spatial distribution and temporal kinetics of a radiotracer, enables quantitative interpretation of PET data. Model-based interpretation of dynamic PET images by means of parametric fitting, however, is often a challenging task due to high levels of noise, thus necessitating a denoising step. The objective of this paper is to develop and characterize a denoising framework for dynamic PET based on non-local means (NLM). NLM denoising computes weighted averages of voxel intensities assigning larger weights to voxels that are similar to a given voxel in terms of their local neighborhoods or patches. We introduce three key modifications to tailor the original NLM framework to dynamic PET. Firstly, we derive similarities from less noisy later time points in a typical PET acquisition to denoise the entire time series. Secondly, we use spatiotemporal patches for robust similarity computation. Finally, we use a spatially varying smoothing parameter based on a local variance approximation over each spatiotemporal patch. To assess the performance of our denoising technique, we performed a realistic simulation on a dynamic digital phantom based on the Digimouse atlas. For experimental validation, we denoised [Formula: see text] PET images from a mouse study and a hepatocellular carcinoma patient study. We compared the performance of NLM denoising with four other denoising approaches - Gaussian filtering, PCA, HYPR, and conventional NLM based on spatial patches. The simulation study revealed significant improvement in bias-variance performance achieved using our NLM technique relative to all the other methods. The experimental data analysis revealed that our technique leads to clear improvement in contrast-to-noise ratio in Patlak parametric images generated from denoised preclinical and clinical dynamic images, indicating its ability to preserve image contrast and high intensity details while lowering the background noise variance.
An improved multilevel Monte Carlo method for estimating probability distribution functions in stochastic oil reservoir simulations

DOE PAGES

Lu, Dan; Zhang, Guannan; Webster, Clayton G.; ...

2016-12-30

In this paper, we develop an improved multilevel Monte Carlo (MLMC) method for estimating cumulative distribution functions (CDFs) of a quantity of interest, coming from numerical approximation of large-scale stochastic subsurface simulations. Compared with Monte Carlo (MC) methods, that require a significantly large number of high-fidelity model executions to achieve a prescribed accuracy when computing statistical expectations, MLMC methods were originally proposed to significantly reduce the computational cost with the use of multifidelity approximations. The improved performance of the MLMC methods depends strongly on the decay of the variance of the integrand as the level increases. However, the main challengemore » in estimating CDFs is that the integrand is a discontinuous indicator function whose variance decays slowly. To address this difficult task, we approximate the integrand using a smoothing function that accelerates the decay of the variance. In addition, we design a novel a posteriori optimization strategy to calibrate the smoothing function, so as to balance the computational gain and the approximation error. The combined proposed techniques are integrated into a very general and practical algorithm that can be applied to a wide range of subsurface problems for high-dimensional uncertainty quantification, such as a fine-grid oil reservoir model considered in this effort. The numerical results reveal that with the use of the calibrated smoothing function, the improved MLMC technique significantly reduces the computational complexity compared to the standard MC approach. Finally, we discuss several factors that affect the performance of the MLMC method and provide guidance for effective and efficient usage in practice.« less
Variable length adjacent partitioning for PTS based PAPR reduction of OFDM signal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ibraheem, Zeyid T.; Rahman, Md. Mijanur; Yaakob, S. N.

2015-05-15

Peak-to-Average power ratio (PAPR) is a major drawback in OFDM communication. It leads the power amplifier into nonlinear region operation resulting into loss of data integrity. As such, there is a strong motivation to find techniques to reduce PAPR. Partial Transmit Sequence (PTS) is an attractive scheme for this purpose. Judicious partitioning the OFDM data frame into disjoint subsets is a pivotal component of any PTS scheme. Out of the existing partitioning techniques, adjacent partitioning is characterized by an attractive trade-off between cost and performance. With an aim of determining effects of length variability of adjacent partitions, we performed anmore » investigation into the performances of a variable length adjacent partitioning (VL-AP) and fixed length adjacent partitioning in comparison with other partitioning schemes such as pseudorandom partitioning. Simulation results with different modulation and partitioning scenarios showed that fixed length adjacent partition had better performance compared to variable length adjacent partitioning. As expected, simulation results showed a slightly better performance of pseudorandom partitioning technique compared to fixed and variable adjacent partitioning schemes. However, as the pseudorandom technique incurs high computational complexities, adjacent partitioning schemes were still seen as favorable candidates for PAPR reduction.« less
Iliac screw fixation using computer-assisted computer tomographic image guidance: technical note.

PubMed

Shin, John H; Hoh, Daniel J; Kalfas, Iain H

2012-03-01

Iliac screw fixation is a powerful tool used by spine surgeons to achieve fusion across the lumbosacral junction for a number of indications, including deformity, tumor, and pseudarthrosis. Complications associated with screw placement are related to blind trajectory selection and excessive soft tissue dissection. To describe the technique of iliac screw fixation using computed tomographic (CT)-based image guidance. Intraoperative registration and verification of anatomic landmarks are performed with the use of a preoperatively acquired CT of the lumbosacral spine. With the navigation probe, the ideal starting point for screw placement is selected while visualizing the intended trajectory and target on a computer screen. Once the starting point is selected and marked with a burr, a drill guide is docked within this point and the navigation probe re-inserted, confirming the trajectory. The probe is then removed and the high-speed drill reinserted within the drill guide. Drilling is performed to a depth measured on the computer screen and a screw is placed. Confirmation of accurate placement of iliac screws can be performed with standard radiographs. CT-guided navigation allows for 3-dimensional visualization of the pelvis and minimizes complications associated with soft-tissue dissection and breach of the ilium during screw placement.
Computational Science News | Computational Science | NREL

Science.gov Websites

-Cooled High-Performance Computing Technology at the ESIF February 28, 2018 NREL Launches New Website for High-Performance Computing System Users The National Renewable Energy Laboratory (NREL) Computational Science Center has launched a revamped website for users of the lab's high-performance computing (HPC
Synergia: an accelerator modeling tool with 3-D space charge

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amundson, James F.; Spentzouris, P.; /Fermilab

2004-07-01

High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing to semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab boostermore » accelerator.« less
ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification.

PubMed

Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

2015-01-01

Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
Simulation of the Effects of Cooling Techniques on Turbine Blade Heat Transfer

NASA Astrophysics Data System (ADS)

Shaw, Vince; Fatuzzo, Marco

Increases in the performance demands of turbo machinery has stimulated the development many new technologies over the last half century. With applications that spread beyond marine, aviation, and power generation, improvements in gas turbine technologies provide a vast impact. High temperatures within the combustion chamber of the gas turbine engine are known to cause an increase in thermal efficiency and power produced by the engine. However, since operating temperatures of these engines reach above 1000 K within the turbine section, the need for advances in material science and cooling techniques to produce functioning engines under these high thermal and dynamic stresses is crucial. As with all research and development, costs related to the production of prototypes can be reduced through the use of computational simulations. By making use of Ansys Simulation Software, the effects of turbine cooling techniques were analyzed. Simulation of the Effects of Cooling Techniques on Turbine Blade Heat Transfer.
On mechanics and material length scales of failure in heterogeneous interfaces using a finite strain high performance solver

NASA Astrophysics Data System (ADS)

Mosby, Matthew; Matouš, Karel

2015-12-01

Three-dimensional simulations capable of resolving the large range of spatial scales, from the failure-zone thickness up to the size of the representative unit cell, in damage mechanics problems of particle reinforced adhesives are presented. We show that resolving this wide range of scales in complex three-dimensional heterogeneous morphologies is essential in order to apprehend fracture characteristics, such as strength, fracture toughness and shape of the softening profile. Moreover, we show that computations that resolve essential physical length scales capture the particle size-effect in fracture toughness, for example. In the vein of image-based computational materials science, we construct statistically optimal unit cells containing hundreds to thousands of particles. We show that these statistically representative unit cells are capable of capturing the first- and second-order probability functions of a given data-source with better accuracy than traditional inclusion packing techniques. In order to accomplish these large computations, we use a parallel multiscale cohesive formulation and extend it to finite strains including damage mechanics. The high-performance parallel computational framework is executed on up to 1024 processing cores. A mesh convergence and a representative unit cell study are performed. Quantifying the complex damage patterns in simulations consisting of tens of millions of computational cells and millions of highly nonlinear equations requires data-mining the parallel simulations, and we propose two damage metrics to quantify the damage patterns. A detailed study of volume fraction and filler size on the macroscopic traction-separation response of heterogeneous adhesives is presented.
CPE--A New Perspective: The Impact of the Technology Revolution. Proceedings of the Computer Performance Evaluation Users Group Meeting (19th, San Francisco, California, October 25-28, 1983). Final Report. Reports on Computer Science and Technology.

ERIC Educational Resources Information Center

Mobray, Deborah, Ed.

Papers on local area networks (LANs), modelling techniques, software improvement, capacity planning, software engineering, microcomputers and end user computing, cost accounting and chargeback, configuration and performance management, and benchmarking presented at this conference include: (1) "Theoretical Performance Analysis of Virtual…

A Computer-Aided Instruction Program for Teaching the TOPS20-MM Facility on the DDN (Defense Data Network)

DTIC Science & Technology

1988-06-01

Continue on reverse if necessary and identify by block number) FIELD GROUP SUB-GROUP Computer Assisted Instruction; Artificial Intelligence 194...while he/she tries to perform given tasks. Means-ends analysis, a classic technique for solving search problems in Artificial Intelligence, has been...he/she tries to perform given tasks. Means-ends analysis, a classic technique for solving search problems in Artificial Intelligence, has been used
Many-core graph analytics using accelerated sparse linear algebra routines

NASA Astrophysics Data System (ADS)

Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric

2016-05-01

Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Largescale graph analytics frameworks provide a convenient and highly-scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is be backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
Unstructured mesh adaptivity for urban flooding modelling

NASA Astrophysics Data System (ADS)

Hu, R.; Fang, F.; Salinas, P.; Pain, C. C.

2018-05-01

Over the past few decades, urban floods have been gaining more attention due to their increase in frequency. To provide reliable flooding predictions in urban areas, various numerical models have been developed to perform high-resolution flood simulations. However, the use of high-resolution meshes across the whole computational domain causes a high computational burden. In this paper, a 2D control-volume and finite-element flood model using adaptive unstructured mesh technology has been developed. This adaptive unstructured mesh technique enables meshes to be adapted optimally in time and space in response to the evolving flow features, thus providing sufficient mesh resolution where and when it is required. It has the advantage of capturing the details of local flows and wetting and drying front while reducing the computational cost. Complex topographic features are represented accurately during the flooding process. For example, the high-resolution meshes around the buildings and steep regions are placed when the flooding water reaches these regions. In this work a flooding event that happened in 2002 in Glasgow, Scotland, United Kingdom has been simulated to demonstrate the capability of the adaptive unstructured mesh flooding model. The simulations have been performed using both fixed and adaptive unstructured meshes, and then results have been compared with those published 2D and 3D results. The presented method shows that the 2D adaptive mesh model provides accurate results while having a low computational cost.
Secure Enclaves: An Isolation-centric Approach for Creating Secure High Performance Computing Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aderholdt, Ferrol; Caldwell, Blake A.; Hicks, Susan Elaine

High performance computing environments are often used for a wide variety of workloads ranging from simulation, data transformation and analysis, and complex workflows to name just a few. These systems may process data at various security levels but in so doing are often enclaved at the highest security posture. This approach places significant restrictions on the users of the system even when processing data at a lower security level and exposes data at higher levels of confidentiality to a much broader population than otherwise necessary. The traditional approach of isolation, while effective in establishing security enclaves poses significant challenges formore » the use of shared infrastructure in HPC environments. This report details current state-of-the-art in virtualization, reconfigurable network enclaving via Software Defined Networking (SDN), and storage architectures and bridging techniques for creating secure enclaves in HPC environments.« less
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.

1994-01-01

This research program deals with the application of high-performance computing methods for the analysis of complete jet engines. We have entitled this program by applying the two dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition, and solution capabilities were successfully tested. We then focused attention on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion that results from these structural displacements. This is treated by a new arbitrary Lagrangian-Eulerian (ALE) technique that models the fluid mesh motion as that of a fictitious mass-spring network. New partitioned analysis procedures to treat this coupled three-component problem are developed. These procedures involved delayed corrections and subcycling. Preliminary results on the stability, accuracy, and MPP computational efficiency are reported.
Evolutionary fuzzy modeling human diagnostic decisions.

PubMed

Peña-Reyes, Carlos Andrés

2004-05-01

Fuzzy CoCo is a methodology, combining fuzzy logic and evolutionary computation, for constructing systems able to accurately predict the outcome of a human decision-making process, while providing an understandable explanation of the underlying reasoning. Fuzzy logic provides a formal framework for constructing systems exhibiting both good numeric performance (accuracy) and linguistic representation (interpretability). However, fuzzy modeling--meaning the construction of fuzzy systems--is an arduous task, demanding the identification of many parameters. To solve it, we use evolutionary computation techniques (specifically cooperative coevolution), which are widely used to search for adequate solutions in complex spaces. We have successfully applied the algorithm to model the decision processes involved in two breast cancer diagnostic problems, the WBCD problem and the Catalonia mammography interpretation problem, obtaining systems both of high performance and high interpretability. For the Catalonia problem, an evolved system was embedded within a Web-based tool-called COBRA-for aiding radiologists in mammography interpretation.
High-resolution subject-specific mitral valve imaging and modeling: experimental and computational methods.

PubMed

Toma, Milan; Bloodworth, Charles H; Einstein, Daniel R; Pierce, Eric L; Cochran, Richard P; Yoganathan, Ajit P; Kunzelman, Karyn S

2016-12-01

The diversity of mitral valve (MV) geometries and multitude of surgical options for correction of MV diseases necessitates the use of computational modeling. Numerical simulations of the MV would allow surgeons and engineers to evaluate repairs, devices, procedures, and concepts before performing them and before moving on to more costly testing modalities. Constructing, tuning, and validating these models rely upon extensive in vitro characterization of valve structure, function, and response to change due to diseases. Micro-computed tomography ([Formula: see text]CT) allows for unmatched spatial resolution for soft tissue imaging. However, it is still technically challenging to obtain an accurate geometry of the diastolic MV. We discuss here the development of a novel technique for treating MV specimens with glutaraldehyde fixative in order to minimize geometric distortions in preparation for [Formula: see text]CT scanning. The technique provides a resulting MV geometry which is significantly more detailed in chordal structure, accurate in leaflet shape, and closer to its physiological diastolic geometry. In this paper, computational fluid-structure interaction (FSI) simulations are used to show the importance of more detailed subject-specific MV geometry with 3D chordal structure to simulate a proper closure validated against [Formula: see text]CT images of the closed valve. Two computational models, before and after use of the aforementioned technique, are used to simulate closure of the MV.
Automated validation of a computer operating system

NASA Technical Reports Server (NTRS)

Dervage, M. M.; Milberg, B. A.

1970-01-01

Programs apply selected input/output loads to complex computer operating system and measure performance of that system under such loads. Technique lends itself to checkout of computer software designed to monitor automated complex industrial systems.
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Biswas, Rupak

2003-01-01

The overset grid methodology has significantly reduced time-to-solution of highfidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machinc. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
The GeantV project: Preparing the future of simulation

DOE PAGES

Amadio, G.; J. Apostolakis; Bandieramonte, M.; ...

2015-12-23

Detector simulation is consuming at least half of the HEP computing cycles, and even so, experiments have to take hard decisions on what to simulate, as their needs greatly surpass the availability of computing resources. New experiments still in the design phase such as FCC, CLIC and ILC as well as upgraded versions of the existing LHC detectors will push further the simulation requirements. Since the increase in computing resources is not likely to keep pace with our needs, it is therefore necessary to explore innovative ways of speeding up simulation in order to sustain the progress of High Energymore » Physics. The GeantV project aims at developing a high performance detector simulation system integrating fast and full simulation that can be ported on different computing architectures, including CPU accelerators. After more than two years of R&D the project has produced a prototype capable of transporting particles in complex geometries exploiting micro-parallelism, SIMD and multithreading. Portability is obtained via C++ template techniques that allow the development of machine- independent computational kernels. Furthermore, a set of tables derived from Geant4 for cross sections and final states provides a realistic shower development and, having been ported into a Geant4 physics list, can be used as a basis for a direct performance comparison.« less
A new method for enhancer prediction based on deep belief network.

PubMed

Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong

2017-10-16

Studies have shown that enhancers are significant regulatory elements to play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging mission for scholars and researchers to accurately predicting distal enhancers. In the past years, with the high-throughout ChiP-seq technologies development, several computational techniques emerge to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell-lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
Modeling and Analysis of Power Processing Systems (MAPPS). Volume 1: Technical report

NASA Technical Reports Server (NTRS)

Lee, F. C.; Rahman, S.; Carter, R. A.; Wu, C. H.; Yu, Y.; Chang, R.

1980-01-01

Computer aided design and analysis techniques were applied to power processing equipment. Topics covered include: (1) discrete time domain analysis of switching regulators for performance analysis; (2) design optimization of power converters using augmented Lagrangian penalty function technique; (3) investigation of current-injected multiloop controlled switching regulators; and (4) application of optimization for Navy VSTOL energy power system. The generation of the mathematical models and the development and application of computer aided design techniques to solve the different mathematical models are discussed. Recommendations are made for future work that would enhance the application of the computer aided design techniques for power processing systems.
Student Achievement in Computer Programming: Lecture vs Computer-Aided Instruction

ERIC Educational Resources Information Center

Tsai, San-Yun W.; Pohl, Norval F.

1978-01-01

This paper discusses a study of the differences in student learning achievement, as measured by four different types of common performance evaluation techniques, in a college-level computer programming course under three teaching/learning environments: lecture, computer-aided instruction, and lecture supplemented with computer-aided instruction.…
Design and Evaluation of Fusion Approach for Combining Brain and Gaze Inputs for Target Selection

PubMed Central

Évain, Andéol; Argelaguet, Ferran; Casiez, Géry; Roussel, Nicolas; Lécuyer, Anatole

2016-01-01

Gaze-based interfaces and Brain-Computer Interfaces (BCIs) allow for hands-free human–computer interaction. In this paper, we investigate the combination of gaze and BCIs. We propose a novel selection technique for 2D target acquisition based on input fusion. This new approach combines the probabilistic models for each input, in order to better estimate the intent of the user. We evaluated its performance against the existing gaze and brain–computer interaction techniques. Twelve participants took part in our study, in which they had to search and select 2D targets with each of the evaluated techniques. Our fusion-based hybrid interaction technique was found to be more reliable than the previous gaze and BCI hybrid interaction techniques for 10 participants over 12, while being 29% faster on average. However, similarly to what has been observed in hybrid gaze-and-speech interaction, gaze-only interaction technique still provides the best performance. Our results should encourage the use of input fusion, as opposed to sequential interaction, in order to design better hybrid interfaces. PMID:27774048
High data volume and transfer rate techniques used at NASA's image processing facility

NASA Technical Reports Server (NTRS)

Heffner, P.; Connell, E.; Mccaleb, F.

1978-01-01

Data storage and transfer operations at a new image processing facility are described. The equipment includes high density digital magnetic tape drives and specially designed controllers to provide an interface between the tape drives and computerized image processing systems. The controller performs the functions necessary to convert the continuous serial data stream from the tape drive to a word-parallel blocked data stream which then goes to the computer-based system. With regard to the tape packing density, 1.8 times 10 to the tenth data bits are stored on a reel of one-inch tape. System components and their operation are surveyed, and studies on advanced storage techniques are summarized.
A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

NASA Astrophysics Data System (ADS)

Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

2014-12-01

Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Predicting the breakdown strength and lifetime of nanocomposites using a multi-scale modeling approach

NASA Astrophysics Data System (ADS)

Huang, Yanhui; Zhao, He; Wang, Yixing; Ratcliff, Tyree; Breneman, Curt; Brinson, L. Catherine; Chen, Wei; Schadler, Linda S.

2017-08-01

It has been found that doping dielectric polymers with a small amount of nanofiller or molecular additive can stabilize the material under a high field and lead to increased breakdown strength and lifetime. Choosing appropriate fillers is critical to optimizing the material performance, but current research largely relies on experimental trial and error. The employment of computer simulations for nanodielectric design is rarely reported. In this work, we propose a multi-scale modeling approach that employs ab initio, Monte Carlo, and continuum scales to predict the breakdown strength and lifetime of polymer nanocomposites based on the charge trapping effect of the nanofillers. The charge transfer, charge energy relaxation, and space charge effects are modeled in respective hierarchical scales by distinctive simulation techniques, and these models are connected together for high fidelity and robustness. The preliminary results show good agreement with the experimental data, suggesting its promise for use in the computer aided material design of high performance dielectrics.
Efficient scatter model for simulation of ultrasound images from computed tomography data

NASA Astrophysics Data System (ADS)

D'Amato, J. P.; Lo Vercio, L.; Rubi, P.; Fernandez Vera, E.; Barbuzza, R.; Del Fresno, M.; Larrabide, I.

2015-12-01

Background and motivation: Real-time ultrasound simulation refers to the process of computationally creating fully synthetic ultrasound images instantly. Due to the high value of specialized low cost training for healthcare professionals, there is a growing interest in the use of this technology and the development of high fidelity systems that simulate the acquisitions of echographic images. The objective is to create an efficient and reproducible simulator that can run either on notebooks or desktops using low cost devices. Materials and methods: We present an interactive ultrasound simulator based on CT data. This simulator is based on ray-casting and provides real-time interaction capabilities. The simulation of scattering that is coherent with the transducer position in real time is also introduced. Such noise is produced using a simplified model of multiplicative noise and convolution with point spread functions (PSF) tailored for this purpose. Results: The computational efficiency of scattering maps generation was revised with an improved performance. This allowed a more efficient simulation of coherent scattering in the synthetic echographic images while providing highly realistic result. We describe some quality and performance metrics to validate these results, where a performance of up to 55fps was achieved. Conclusion: The proposed technique for real-time scattering modeling provides realistic yet computationally efficient scatter distributions. The error between the original image and the simulated scattering image was compared for the proposed method and the state-of-the-art, showing negligible differences in its distribution.
HPC enabled real-time remote processing of laparoscopic surgery

NASA Astrophysics Data System (ADS)

Ronaghi, Zahra; Sapra, Karan; Izard, Ryan; Duffy, Edward; Smith, Melissa C.; Wang, Kuang-Ching; Kwartowitz, David M.

2016-03-01

Laparoscopic surgery is a minimally invasive surgical technique. The benefit of small incisions has a disadvantage of limited visualization of subsurface tissues. Image-guided surgery (IGS) uses pre-operative and intra-operative images to map subsurface structures. One particular laparoscopic system is the daVinci-si robotic surgical system. The video streams generate approximately 360 megabytes of data per second. Real-time processing this large stream of data on a bedside PC, single or dual node setup, has become challenging and a high-performance computing (HPC) environment may not always be available at the point of care. To process this data on remote HPC clusters at the typical 30 frames per second rate, it is required that each 11.9 MB video frame be processed by a server and returned within 1/30th of a second. We have implement and compared performance of compression, segmentation and registration algorithms on Clemson's Palmetto supercomputer using dual NVIDIA K40 GPUs per node. Our computing framework will also enable reliability using replication of computation. We will securely transfer the files to remote HPC clusters utilizing an OpenFlow-based network service, Steroid OpenFlow Service (SOS) that can increase performance of large data transfers over long-distance and high bandwidth networks. As a result, utilizing high-speed OpenFlow- based network to access computing clusters with GPUs will improve surgical procedures by providing real-time medical image processing and laparoscopic data.
Computational modes and the Machenauer N.L.N.M.I. of the GLAS 4th order model. [NonLinear Normal Mode Initialization in numerical weather forecasting

NASA Technical Reports Server (NTRS)

Navon, I. M.; Bloom, S.; Takacs, L. L.

1985-01-01

An attempt was made to use the GLAS global 4th order shallow water equations to perform a Machenhauer nonlinear normal mode initialization (NLNMI) for the external vertical mode. A new algorithm was defined for identifying and filtering out computational modes which affect the convergence of the Machenhauer iterative procedure. The computational modes and zonal waves were linearly initialized and gravitational modes were nonlinearly initialized. The Machenhauer NLNMI was insensitive to the absence of high zonal wave numbers. The effects of the Machenhauer scheme were evaluated by performing 24 hr integrations with nondissipative and dissipative explicit time integration models. The NLNMI was found to be inferior to the Rasch (1984) pseudo-secant technique for obtaining convergence when the time scales of nonlinear forcing were much smaller than the time scales expected from the natural frequency of the mode.

Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

1987-01-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Computer architecture for efficient algorithmic executions in real-time systems: new technology for avionics systems and advanced space vehicles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carroll, C.C.; Youngblood, J.N.; Saha, A.

1987-12-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions

DTIC Science & Technology

2016-08-30

High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions A dedicated high-performance computer cluster was...SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS (ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 Computer cluster ...peer-reviewed journals: Final Report: High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions Report Title A dedicated
Models and techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1982-01-01

Models, measures, and techniques for evaluating the effectiveness of aircraft computing systems were developed. By "effectiveness" in this context we mean the extent to which the user, i.e., a commercial air carrier, may expect to benefit from the computational tasks accomplished by a computing system in the environment of an advanced commercial aircraft. Thus, the concept of effectiveness involves aspects of system performance, reliability, and worth (value, benefit) which are appropriately integrated in the process of evaluating system effectiveness. Specifically, the primary objectives are: the development of system models that provide a basis for the formulation and evaluation of aircraft computer system effectiveness, the formulation of quantitative measures of system effectiveness, and the development of analytic and simulation techniques for evaluating the effectiveness of a proposed or existing aircraft computer.
Adaptive polynomial chaos techniques for uncertainty quantification of a gas cooled fast reactor transient

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perko, Z.; Gilli, L.; Lathouwers, D.

2013-07-01

Uncertainty quantification plays an increasingly important role in the nuclear community, especially with the rise of Best Estimate Plus Uncertainty methodologies. Sensitivity analysis, surrogate models, Monte Carlo sampling and several other techniques can be used to propagate input uncertainties. In recent years however polynomial chaos expansion has become a popular alternative providing high accuracy at affordable computational cost. This paper presents such polynomial chaos (PC) methods using adaptive sparse grids and adaptive basis set construction, together with an application to a Gas Cooled Fast Reactor transient. Comparison is made between a new sparse grid algorithm and the traditionally used techniquemore » proposed by Gerstner. An adaptive basis construction method is also introduced and is proved to be advantageous both from an accuracy and a computational point of view. As a demonstration the uncertainty quantification of a 50% loss of flow transient in the GFR2400 Gas Cooled Fast Reactor design was performed using the CATHARE code system. The results are compared to direct Monte Carlo sampling and show the superior convergence and high accuracy of the polynomial chaos expansion. Since PC techniques are easy to implement, they can offer an attractive alternative to traditional techniques for the uncertainty quantification of large scale problems. (authors)« less
Evaluation of a Multicore-Optimized Implementation for Tomographic Reconstruction

PubMed Central

Agulleiro, Jose-Ignacio; Fernández, José Jesús

2012-01-01

Tomography allows elucidation of the three-dimensional structure of an object from a set of projection images. In life sciences, electron microscope tomography is providing invaluable information about the cell structure at a resolution of a few nanometres. Here, large images are required to combine wide fields of view with high resolution requirements. The computational complexity of the algorithms along with the large image size then turns tomographic reconstruction into a computationally demanding problem. Traditionally, high-performance computing techniques have been applied to cope with such demands on supercomputers, distributed systems and computer clusters. In the last few years, the trend has turned towards graphics processing units (GPUs). Here we present a detailed description and a thorough evaluation of an alternative approach that relies on exploitation of the power available in modern multicore computers. The combination of single-core code optimization, vector processing, multithreading and efficient disk I/O operations succeeds in providing fast tomographic reconstructions on standard computers. The approach turns out to be competitive with the fastest GPU-based solutions thus far. PMID:23139768
A perspective on future directions in aerospace propulsion system simulation

NASA Technical Reports Server (NTRS)

Miller, Brent A.; Szuch, John R.; Gaugler, Raymond E.; Wood, Jerry R.

1989-01-01

The design and development of aircraft engines is a lengthy and costly process using today's methodology. This is due, in large measure, to the fact that present methods rely heavily on experimental testing to verify the operability, performance, and structural integrity of components and systems. The potential exists for achieving significant speedups in the propulsion development process through increased use of computational techniques for simulation, analysis, and optimization. This paper outlines the concept and technology requirements for a Numerical Propulsion Simulation System (NPSS) that would provide capabilities to do interactive, multidisciplinary simulations of complete propulsion systems. By combining high performance computing hardware and software with state-of-the-art propulsion system models, the NPSS will permit the rapid calculation, assessment, and optimization of subcomponent, component, and system performance, durability, reliability and weight-before committing to building hardware.
X-ray computed tomography using curvelet sparse regularization.

PubMed

Wieczorek, Matthias; Frikel, Jürgen; Vogel, Jakob; Eggl, Elena; Kopp, Felix; Noël, Peter B; Pfeiffer, Franz; Demaret, Laurent; Lasser, Tobias

2015-04-01

Reconstruction of x-ray computed tomography (CT) data remains a mathematically challenging problem in medical imaging. Complementing the standard analytical reconstruction methods, sparse regularization is growing in importance, as it allows inclusion of prior knowledge. The paper presents a method for sparse regularization based on the curvelet frame for the application to iterative reconstruction in x-ray computed tomography. In this work, the authors present an iterative reconstruction approach based on the alternating direction method of multipliers using curvelet sparse regularization. Evaluation of the method is performed on a specifically crafted numerical phantom dataset to highlight the method's strengths. Additional evaluation is performed on two real datasets from commercial scanners with different noise characteristics, a clinical bone sample acquired in a micro-CT and a human abdomen scanned in a diagnostic CT. The results clearly illustrate that curvelet sparse regularization has characteristic strengths. In particular, it improves the restoration and resolution of highly directional, high contrast features with smooth contrast variations. The authors also compare this approach to the popular technique of total variation and to traditional filtered backprojection. The authors conclude that curvelet sparse regularization is able to improve reconstruction quality by reducing noise while preserving highly directional features.
Computer assisted alignment of opening wedge high tibial osteotomy provides limited improvement of radiographic outcomes compared to flouroscopic alignment.

PubMed

Stanley, Jeremy C; Robinson, Kerian G; Devitt, Brian M; Richmond, Anneka K; Webster, Kate E; Whitehead, Timothy S; Feller, Julian A

2016-03-01

There are numerous methods available to assist surgeons in the accurate correction of varus alignment during medial opening wedge high tibial osteotomy (MOWHTO). Preoperative planning performed with radiographs or more recently intraoperative computer navigation software has been used. The aim of the study was to compare the accuracy of computer navigated versus non-navigated techniques to correct varus alignment of the knee. The preoperative and postoperative radiographs of 117 knees that underwent MOWHTO were investigated to assess radiographic limb alignment 12-months postoperatively. The desired correction was defined as a weight bearing line (Mikulicz point {MP}) 58% of the width of the tibial plateau from the medial tibial margin. Sixty-five knees were corrected using a conventional technique and 52 knees were corrected using computer navigation. The mean MP percentage was 59% in the navigated group, compared with 56% in the fluoroscopic group (p=0.183). 51.9% of the navigation knees were corrected to within five percent of the desired correction, in contrast to 38.5% of the fluoroscopically corrected knees (p=0.15). 71.2% of the navigated knees were corrected to within 10% of the desired correction, compared with 63.1% of the fluoroscopically corrected knees (p=0.36). Large preoperative deformities were more accurately corrected with navigation assistance (57% vs 49%, p=0.049). No statistically significant difference was found in the radiographic correction of varus alignment twelve months postoperatively between navigated and fluoroscopic techniques of MOWHTO. However, a subgroup analysis demonstrated that larger preoperative varus deformities may be more accurately corrected using computer navigation. Copyright © 2016 Elsevier B.V. All rights reserved.
High-performance wavelet engine

NASA Astrophysics Data System (ADS)

Taylor, Fred J.; Mellot, Jonathon D.; Strom, Erik; Koren, Iztok; Lewis, Michael P.

1993-11-01

Wavelet processing has shown great promise for a variety of image and signal processing applications. Wavelets are also among the most computationally expensive techniques in signal processing. It is demonstrated that a wavelet engine constructed with residue number system arithmetic elements offers significant advantages over commercially available wavelet accelerators based upon conventional arithmetic elements. Analysis is presented predicting the dynamic range requirements of the reported residue number system based wavelet accelerator.
AHPCRC - Army High Performance Computing Research Center

DTIC Science & Technology

2010-01-01

shielding fabrics. Contact with a projectile induces electromagnetic forces on the fabric that can cause the projectile to rotate , making it less...other AHPCRC projects in need of optimization techniques. A major focus of this research addresses solving partial differential equation ( PDE ...plat- forms. One such problem is the determination of optimal wing shapes and motions. Work in progress involves coupling the PDE -solver AERO-F and
Computerized systems analysis and optimization of aircraft engine performance, weight, and life cycle costs

NASA Technical Reports Server (NTRS)

Fishbach, L. H.

1980-01-01

The computational techniques are described which are utilized at Lewis Research Center to determine the optimum propulsion systems for future aircraft applications and to identify system tradeoffs and technology requirements. Cycle performance, and engine weight can be calculated along with costs and installation effects as opposed to fuel consumption alone. Almost any conceivable turbine engine cycle can be studied. These computer codes are: NNEP, WATE, LIFCYC, INSTAL, and POD DRG. Examples are given to illustrate how these computer techniques can be applied to analyze and optimize propulsion system fuel consumption, weight and cost for representative types of aircraft and missions.
Automated Quantification of Pneumothorax in CT

PubMed Central

Do, Synho; Salvaggio, Kristen; Gupta, Supriya; Kalra, Mannudeep; Ali, Nabeel U.; Pien, Homer

2012-01-01

An automated, computer-aided diagnosis (CAD) algorithm for the quantification of pneumothoraces from Multidetector Computed Tomography (MDCT) images has been developed. Algorithm performance was evaluated through comparison to manual segmentation by expert radiologists. A combination of two-dimensional and three-dimensional processing techniques was incorporated to reduce required processing time by two-thirds (as compared to similar techniques). Volumetric measurements on relative pneumothorax size were obtained and the overall performance of the automated method shows an average error of just below 1%. PMID:23082091
Introducing Enabling Computational Tools to the Climate Sciences: Multi-Resolution Climate Modeling with Adaptive Cubed-Sphere Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jablonowski, Christiane

The research investigates and advances strategies how to bridge the scale discrepancies between local, regional and global phenomena in climate models without the prohibitive computational costs of global cloud-resolving simulations. In particular, the research explores new frontiers in computational geoscience by introducing high-order Adaptive Mesh Refinement (AMR) techniques into climate research. AMR and statically-adapted variable-resolution approaches represent an emerging trend for atmospheric models and are likely to become the new norm in future-generation weather and climate models. The research advances the understanding of multi-scale interactions in the climate system and showcases a pathway how to model these interactions effectively withmore » advanced computational tools, like the Chombo AMR library developed at the Lawrence Berkeley National Laboratory. The research is interdisciplinary and combines applied mathematics, scientific computing and the atmospheric sciences. In this research project, a hierarchy of high-order atmospheric models on cubed-sphere computational grids have been developed that serve as an algorithmic prototype for the finite-volume solution-adaptive Chombo-AMR approach. The foci of the investigations have lied on the characteristics of both static mesh adaptations and dynamically-adaptive grids that can capture flow fields of interest like tropical cyclones. Six research themes have been chosen. These are (1) the introduction of adaptive mesh refinement techniques into the climate sciences, (2) advanced algorithms for nonhydrostatic atmospheric dynamical cores, (3) an assessment of the interplay between resolved-scale dynamical motions and subgrid-scale physical parameterizations, (4) evaluation techniques for atmospheric model hierarchies, (5) the comparison of AMR refinement strategies and (6) tropical cyclone studies with a focus on multi-scale interactions and variable-resolution modeling. The results of this research project demonstrate significant advances in all six research areas. The major conclusions are that statically-adaptive variable-resolution modeling is currently becoming mature in the climate sciences, and that AMR holds outstanding promise for future-generation weather and climate models on high-performance computing architectures.« less
Sensing and Active Flow Control for Advanced BWB Propulsion-Airframe Integration Concepts

NASA Technical Reports Server (NTRS)

Fleming, John; Anderson, Jason; Ng, Wing; Harrison, Neal

2005-01-01

In order to realize the substantial performance benefits of serpentine boundary layer ingesting diffusers, this study investigated the use of enabling flow control methods to reduce engine-face flow distortion. Computational methods and novel flow control modeling techniques were utilized that allowed for rapid, accurate analysis of flow control geometries. Results were validated experimentally using the Techsburg Ejector-based wind tunnel facility; this facility is capable of simulating the high-altitude, high subsonic Mach number conditions representative of BWB cruise conditions.
USE OF COMPUTED TOMOGRAPHY FOR INVESTIGATION OF HEPATIC LIPIDOSIS IN CAPTIVE CHELONOIDIS CARBONARIA (SPIX, 1824).

PubMed

Marchiori, Adriano; da Silva, Ieverton Cleiton Correia; de Albuquerque Bonelli, Marília; de Albuquerque Zanotti, Luciana Carla Rameh; Siqueira, Daniel B; Zanotti, Alexandre Pinheiro; Costa, Fabiano Séllos

2015-06-01

Computed tomography is a sensitive and highly applicable technique for determining the degree of radiographic attenuation of the hepatic parenchyma. Radiodensity measurements of the liver can help in the diagnosis of hepatic lipidosis in humans and animals. The objective was to investigate the presence of hepatic lipidosis in captive red-footed tortoises (Chelonoidis carbonaria) using computed tomography. Computed tomography was performed in 10 male red-footed tortoises. Mean radiographic attenuation values for the hepatic parenchyma were 11.2±3.0 Hounsfield units (HU). Seven red-footed tortoises had values lower than 20 HU, which is compatible with C. carbonaria hepatic lipidosis. These results allowed an early diagnosis of the hepatic changes and suggested corrective measures regarding feeding and management protocols.
The tracking performance of distributed recoverable flight control systems subject to high intensity radiated fields

NASA Astrophysics Data System (ADS)

Wang, Rui

It is known that high intensity radiated fields (HIRF) can produce upsets in digital electronics, and thereby degrade the performance of digital flight control systems. Such upsets, either from natural or man-made sources, can change data values on digital buses and memory and affect CPU instruction execution. HIRF environments are also known to trigger common-mode faults, affecting nearly-simultaneously multiple fault containment regions, and hence reducing the benefits of n-modular redundancy and other fault-tolerant computing techniques. Thus, it is important to develop models which describe the integration of the embedded digital system, where the control law is implemented, as well as the dynamics of the closed-loop system. In this dissertation, theoretical tools are presented to analyze the relationship between the design choices for a class of distributed recoverable computing platforms and the tracking performance degradation of a digital flight control system implemented on such a platform while operating in a HIRF environment. Specifically, a tractable hybrid performance model is developed for a digital flight control system implemented on a computing platform inspired largely by the NASA family of fault-tolerant, reconfigurable computer architectures known as SPIDER (scalable processor-independent design for enhanced reliability). The focus will be on the SPIDER implementation, which uses the computer communication system known as ROBUS-2 (reliable optical bus). A physical HIRF experiment was conducted at the NASA Langley Research Center in order to validate the theoretical tracking performance degradation predictions for a distributed Boeing 747 flight control system subject to a HIRF environment. An extrapolation of these results for scenarios that could not be physically tested is also presented.
Reconfigurable Computing As an Enabling Technology for Single-Photon-Counting Laser Altimetry

NASA Technical Reports Server (NTRS)

Powell, Wesley; Hicks, Edward; Pinchinat, Maxime; Dabney, Philip; McGarry, Jan; Murray, Paul

2003-01-01

Single-photon-counting laser altimetry is a new measurement technique offering significant advantages in vertical resolution, reducing instrument size, mass, and power, and reducing laser complexity as compared to analog or threshold detection laser altimetry techniques. However, these improvements come at the cost of a dramatically increased requirement for onboard real-time data processing. Reconfigurable computing has been shown to offer considerable performance advantages in performing this processing. These advantages have been demonstrated on the Multi-KiloHertz Micro-Laser Altimeter (MMLA), an aircraft based single-photon-counting laser altimeter developed by NASA Goddard Space Flight Center with several potential spaceflight applications. This paper describes how reconfigurable computing technology was employed to perform MMLA data processing in real-time under realistic operating constraints, along with the results observed. This paper also expands on these prior results to identify concepts for using reconfigurable computing to enable spaceflight single-photon-counting laser altimeter instruments.
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amadio, G.; et al.

An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physicsmore » models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.« less
Analysis of high aspect ratio jet flap wings of arbitrary geometry.

NASA Technical Reports Server (NTRS)

Lissaman, P. B. S.

1973-01-01

Paper presents a design technique for rapidly computing lift, induced drag, and spanwise loading of unswept jet flap wings of arbitrary thickness, chord, twist, blowing, and jet angle, including discontinuities. Linear theory is used, extending Spence's method for elliptically loaded jet flap wings. Curves for uniformly blown rectangular wings are presented for direct performance estimation. Arbitrary planforms require a simple computer program. Method of reducing wing to equivalent stretched, twisted, unblown planform for hand calculation is also given. Results correlate with limited existing data, and show lifting line theory is reasonable down to aspect ratios of 5.

Computer-aided design studies of the homopolar linear synchronous motor

NASA Astrophysics Data System (ADS)

Dawson, G. E.; Eastham, A. R.; Ong, R.

1984-09-01

The linear induction motor (LIM), as an urban transit drive, can provide good grade-climbing capabilities and propulsion/braking performance that is independent of steel wheel-rail adhesion. In view of its 10-12 mm airgap, the LIM is characterized by a low power factor-efficiency product of order 0.4. A synchronous machine offers high efficiency and controllable power factor. An assessment of the linear homopolar configuration of this machine is presented as an alternative to the LIM. Computer-aided design studies using the finite element technique have been conducted to identify a suitable machine design for urban transit propulsion.
Rare event computation in deterministic chaotic systems using genealogical particle analysis

NASA Astrophysics Data System (ADS)

Wouters, J.; Bouchet, F.

2016-09-01

In this paper we address the use of rare event computation techniques to estimate small over-threshold probabilities of observables in deterministic dynamical systems. We demonstrate that genealogical particle analysis algorithms can be successfully applied to a toy model of atmospheric dynamics, the Lorenz ’96 model. We furthermore use the Ornstein-Uhlenbeck system to illustrate a number of implementation issues. We also show how a time-dependent objective function based on the fluctuation path to a high threshold can greatly improve the performance of the estimator compared to a fixed-in-time objective function.
Estimation of elastic moduli of graphene monolayer in lattice statics approach at nonzero temperature

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zubko, I. Yu., E-mail: zoubko@list.ru; Kochurov, V. I.

2015-10-27

For the aim of the crystal temperature control the computational-statistical approach to studying thermo-mechanical properties for finite sized crystals is presented. The approach is based on the combination of the high-performance computational techniques and statistical analysis of the crystal response on external thermo-mechanical actions for specimens with the statistically small amount of atoms (for instance, nanoparticles). The heat motion of atoms is imitated in the statics approach by including the independent degrees of freedom for atoms connected with their oscillations. We obtained that under heating, graphene material response is nonsymmetric.
Modular thermal analyzer routine, volume 1

NASA Technical Reports Server (NTRS)

Oren, J. A.; Phillips, M. A.; Williams, D. R.

1972-01-01

The Modular Thermal Analyzer Routine (MOTAR) is a general thermal analysis routine with strong capabilities for performing thermal analysis of systems containing flowing fluids, fluid system controls (valves, heat exchangers, etc.), life support systems, and thermal radiation situations. Its modular organization permits the analysis of a very wide range of thermal problems for simple problems containing a few conduction nodes to those containing complicated flow and radiation analysis with each problem type being analyzed with peak computational efficiency and maximum ease of use. The organization and programming methods applied to MOTAR achieved a high degree of computer utilization efficiency in terms of computer execution time and storage space required for a given problem. The computer time required to perform a given problem on MOTAR is approximately 40 to 50 percent that required for the currently existing widely used routines. The computer storage requirement for MOTAR is approximately 25 percent more than the most commonly used routines for the most simple problems but the data storage techniques for the more complicated options should save a considerable amount of space.
Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE

PubMed Central

Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha

2010-01-01

Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008)]. Methods: These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier’s AUC performance. Results: In the large U.S. data set, sample high performance results include, AUC0.632+=0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+=0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+=0.90 with interval [0.847;0.919], all using the MCMC-BANN. Conclusions: Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space. PMID:20175497
Computer program determines performance efficiency of remote measuring systems

NASA Technical Reports Server (NTRS)

Merewether, E. K.

1966-01-01

Computer programs control and evaluate instrumentation system performance for numerous rocket engine test facilities and prescribe calibration and maintenance techniques to maintain the systems within process specifications. Similar programs can be written for other test equipment in an industry such as the petrochemical industry.
Barrier-breaking performance for industrial problems on the CRAY C916

DOE Office of Scientific and Technical Information (OSTI.GOV)

Graffunder, S.K.

1993-12-31

Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for themore » mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.« less
Region of interest processing for iterative reconstruction in x-ray computed tomography

NASA Astrophysics Data System (ADS)

Kopp, Felix K.; Nasirudin, Radin A.; Mei, Kai; Fehringer, Andreas; Pfeiffer, Franz; Rummeny, Ernst J.; Noël, Peter B.

2015-03-01

The recent advancements in the graphics card technology raised the performance of parallel computing and contributed to the introduction of iterative reconstruction methods for x-ray computed tomography in clinical CT scanners. Iterative maximum likelihood (ML) based reconstruction methods are known to reduce image noise and to improve the diagnostic quality of low-dose CT. However, iterative reconstruction of a region of interest (ROI), especially ML based, is challenging. But for some clinical procedures, like cardiac CT, only a ROI is needed for diagnostics. A high-resolution reconstruction of the full field of view (FOV) consumes unnecessary computation effort that results in a slower reconstruction than clinically acceptable. In this work, we present an extension and evaluation of an existing ROI processing algorithm. Especially improvements for the equalization between regions inside and outside of a ROI are proposed. The evaluation was done on data collected from a clinical CT scanner. The performance of the different algorithms is qualitatively and quantitatively assessed. Our solution to the ROI problem provides an increase in signal-to-noise ratio and leads to visually less noise in the final reconstruction. The reconstruction speed of our technique was observed to be comparable with other previous proposed techniques. The development of ROI processing algorithms in combination with iterative reconstruction will provide higher diagnostic quality in the near future.
Analysis of high-aspect-ratio jet-flap wings of arbitrary geometry

NASA Technical Reports Server (NTRS)

Lissaman, P. B. S.

1973-01-01

An analytical technique to compute the performance of an arbitrary jet-flapped wing is developed. The solution technique is based on the method of Maskell and Spence in which the well-known lifting-line approach is coupled with an auxiliary equation providing the extra function needed in jet-flap theory. The present method is generalized to handle straight, uncambered wings of arbitrary planform, twist, and blowing (including unsymmetrical cases). An analytical procedure is developed for continuous variations in the above geometric data with special functions to exactly treat discontinuities in any of the geometric and blowing data. A rational theory for the effect of finite wing thickness is introduced as well as simplified concepts of effective aspect ratio for rapid estimation of performance.
Classified one-step high-radix signed-digit arithmetic units

NASA Astrophysics Data System (ADS)

Cherri, Abdallah K.

1998-08-01

High-radix number systems enable higher information storage density, less complexity, fewer system components, and fewer cascaded gates and operations. A simple one-step fully parallel high-radix signed-digit arithmetic is proposed for parallel optical computing based on new joint spatial encodings. This reduces hardware requirements and improves throughput by reducing the space-bandwidth produce needed. The high-radix signed-digit arithmetic operations are based on classifying the neighboring input digit pairs into various groups to reduce the computation rules. A new joint spatial encoding technique is developed to present both the operands and the computation rules. This technique increases the spatial bandwidth product of the spatial light modulators of the system. An optical implementation of the proposed high-radix signed-digit arithmetic operations is also presented. It is shown that our one-step trinary signed-digit and quaternary signed-digit arithmetic units are much simpler and better than all previously reported high-radix signed-digit techniques.
Grand Challenges: High Performance Computing and Communications. The FY 1992 U.S. Research and Development Program.

ERIC Educational Resources Information Center

Federal Coordinating Council for Science, Engineering and Technology, Washington, DC.

This report presents a review of the High Performance Computing and Communications (HPCC) Program, which has as its goal the acceleration of the commercial availability and utilization of the next generation of high performance computers and networks in order to: (1) extend U.S. technological leadership in high performance computing and computer…
Friction Stir Additive Manufacturing: Route to High Structural Performance

NASA Astrophysics Data System (ADS)

Palanivel, S.; Sidhar, H.; Mishra, R. S.

2015-03-01

Aerospace and automotive industries provide the next big opportunities for additive manufacturing. Currently, the additive industry is confronted with four major challenges that have been identified in this article. These challenges need to be addressed for the additive technologies to march into new frontiers and create additional markets. Specific potential success in the transportation sectors is dependent on the ability to manufacture complicated structures with high performance. Most of the techniques used for metal-based additive manufacturing are fusion based because of their ability to fulfill the computer-aided design to component vision. Although these techniques aid in fabrication of complex shapes, achieving high structural performance is a key problem due to the liquid-solid phase transformation. In this article, friction stir additive manufacturing (FSAM) is shown as a potential solid-state process for attaining high-performance lightweight alloys for simpler geometrical applications. To illustrate FSAM as a high-performance route, manufactured builds of Mg-4Y-3Nd and AA5083 are shown as examples. In the Mg-based alloy, an average hardness of 120 HV was achieved in the built structure and was significantly higher than that of the base material (97 HV). Similarly for the Al-based alloy, compared with the base hardness of 88 HV, the average built hardness was 104 HV. A potential application of FSAM is illustrated by taking an example of a simple stiffener assembly.
Real-time multiple human perception with color-depth cameras on a mobile robot.

PubMed

Zhang, Hao; Reardon, Christopher; Parker, Lynne E

2013-10-01

The ability to perceive humans is an essential requirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability of commercial color-depth cameras allow for the creation of a system that makes use of the depth dimension, thus enabling a robot to observe its environment and perceive in the 3-D space. Here we present a system for 3-D multiple human perception in real time from a moving robot equipped with a color-depth camera and a consumer-grade computer. Our approach reduces computation time to achieve real-time performance through a unique combination of new ideas and established techniques. We remove the ground and ceiling planes from the 3-D point cloud input to separate candidate point clusters. We introduce the novel information concept, depth of interest, which we use to identify candidates for detection, and that avoids the computationally expensive scanning-window methods of other approaches. We utilize a cascade of detectors to distinguish humans from objects, in which we make intelligent reuse of intermediary features in successive detectors to improve computation. Because of the high computational cost of some methods, we represent our candidate tracking algorithm with a decision directed acyclic graph, which allows us to use the most computationally intense techniques only where necessary. We detail the successful implementation of our novel approach on a mobile robot and examine its performance in scenarios with real-world challenges, including occlusion, robot motion, nonupright humans, humans leaving and reentering the field of view (i.e., the reidentification challenge), human-object and human-human interaction. We conclude with the observation that the incorporation of the depth information, together with the use of modern techniques in new ways, we are able to create an accurate system for real-time 3-D perception of humans by a mobile robot.
Numerical method for predicting flow characteristics and performance of nonaxisymmetric nozzles, theory

NASA Technical Reports Server (NTRS)

Thomas, P. D.

1979-01-01

The theoretical foundation and formulation of a numerical method for predicting the viscous flowfield in and about isolated three dimensional nozzles of geometrically complex configuration are presented. High Reynolds number turbulent flows are of primary interest for any combination of subsonic, transonic, and supersonic flow conditions inside or outside the nozzle. An alternating-direction implicit (ADI) numerical technique is employed to integrate the unsteady Navier-Stokes equations until an asymptotic steady-state solution is reached. Boundary conditions are computed with an implicit technique compatible with the ADI technique employed at interior points of the flow region. The equations are formulated and solved in a boundary-conforming curvilinear coordinate system. The curvilinear coordinate system and computational grid is generated numerically as the solution to an elliptic boundary value problem. A method is developed that automatically adjusts the elliptic system so that the interior grid spacing is controlled directly by the a priori selection of the grid spacing on the boundaries of the flow region.
Automatic Beam Path Analysis of Laser Wakefield Particle Acceleration Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rubel, Oliver; Geddes, Cameron G.R.; Cormier-Michel, Estelle

2009-10-19

Numerical simulations of laser wakefield particle accelerators play a key role in the understanding of the complex acceleration process and in the design of expensive experimental facilities. As the size and complexity of simulation output grows, an increasingly acute challenge is the practical need for computational techniques that aid in scientific knowledge discovery. To that end, we present a set of data-understanding algorithms that work in concert in a pipeline fashion to automatically locate and analyze high energy particle bunches undergoing acceleration in very large simulation datasets. These techniques work cooperatively by first identifying features of interest in individual timesteps,more » then integrating features across timesteps, and based on the information derived perform analysis of temporally dynamic features. This combination of techniques supports accurate detection of particle beams enabling a deeper level of scientific understanding of physical phenomena than hasbeen possible before. By combining efficient data analysis algorithms and state-of-the-art data management we enable high-performance analysis of extremely large particle datasets in 3D. We demonstrate the usefulness of our methods for a variety of 2D and 3D datasets and discuss the performance of our analysis pipeline.« less
Looking inside the Ocean: Toward an Autonomous Imaging System for Monitoring Gelatinous Zooplankton

PubMed Central

Corgnati, Lorenzo; Marini, Simone; Mazzei, Luca; Ottaviani, Ennio; Aliani, Stefano; Conversi, Alessandra; Griffa, Annalisa

2016-01-01

Marine plankton abundance and dynamics in the open and interior ocean is still an unknown field. The knowledge of gelatinous zooplankton distribution is especially challenging, because this type of plankton has a very fragile structure and cannot be directly sampled using traditional net based techniques. To overcome this shortcoming, Computer Vision techniques can be successfully used for the automatic monitoring of this group.This paper presents the GUARD1 imaging system, a low-cost stand-alone instrument for underwater image acquisition and recognition of gelatinous zooplankton, and discusses the performance of three different methodologies, Tikhonov Regularization, Support Vector Machines and Genetic Programming, that have been compared in order to select the one to be run onboard the system for the automatic recognition of gelatinous zooplankton. The performance comparison results highlight the high accuracy of the three methods in gelatinous zooplankton identification, showing their good capability in robustly selecting relevant features. In particular, Genetic Programming technique achieves the same performances of the other two methods by using a smaller set of features, thus being the most efficient in avoiding computationally consuming preprocessing stages, that is a crucial requirement for running on an autonomous imaging system designed for long lasting deployments, like the GUARD1. The Genetic Programming algorithm has been installed onboard the system, that has been operationally tested in a two-months survey in the Ligurian Sea, providing satisfactory results in terms of monitoring and recognition performances. PMID:27983638
Development of seismic tomography software for hybrid supercomputers

NASA Astrophysics Data System (ADS)

Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

2015-04-01

Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
Nektar++: An open-source spectral/ hp element framework

NASA Astrophysics Data System (ADS)

Cantwell, C. D.; Moxey, D.; Comerford, A.; Bolis, A.; Rocco, G.; Mengaldo, G.; De Grazia, D.; Yakovlev, S.; Lombard, J.-E.; Ekelschot, D.; Jordi, B.; Xu, H.; Mohamied, Y.; Eskilsson, C.; Nelson, B.; Vos, P.; Biotto, C.; Kirby, R. M.; Sherwin, S. J.

2015-07-01

Nektar++ is an open-source software framework designed to support the development of high-performance scalable solvers for partial differential equations using the spectral/ hp element method. High-order methods are gaining prominence in several engineering and biomedical applications due to their improved accuracy over low-order techniques at reduced computational cost for a given number of degrees of freedom. However, their proliferation is often limited by their complexity, which makes these methods challenging to implement and use. Nektar++ is an initiative to overcome this limitation by encapsulating the mathematical complexities of the underlying method within an efficient C++ framework, making the techniques more accessible to the broader scientific and industrial communities. The software supports a variety of discretisation techniques and implementation strategies, supporting methods research as well as application-focused computation, and the multi-layered structure of the framework allows the user to embrace as much or as little of the complexity as they need. The libraries capture the mathematical constructs of spectral/ hp element methods, while the associated collection of pre-written PDE solvers provides out-of-the-box application-level functionality and a template for users who wish to develop solutions for addressing questions in their own scientific domains.
Ray tracing on the MPP

NASA Technical Reports Server (NTRS)

Dorband, John E.

1987-01-01

Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Thermal Inspection of Composite Honeycomb Structures

NASA Technical Reports Server (NTRS)

Zalameda, Joseph N.; Parker, F. Raymond

2014-01-01

Composite honeycomb structures continue to be widely used in aerospace applications due to their low weight and high strength advantages. Developing nondestructive evaluation (NDE) inspection methods are essential for their safe performance. Pulsed thermography is a commonly used technique for composite honeycomb structure inspections due to its large area and rapid inspection capability. Pulsed thermography is shown to be sensitive for detection of face sheet impact damage and face sheet to core disbond. Data processing techniques, using principal component analysis to improve the defect contrast, are presented. In addition, limitations to the thermal detection of the core are investigated. Other NDE techniques, such as computed tomography X-ray and ultrasound, are used for comparison to the thermography results.

Scientific Visualization in High Speed Network Environments

NASA Technical Reports Server (NTRS)

Vaziri, Arsi; Kutler, Paul (Technical Monitor)

1997-01-01

In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment have become more critical and continue to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks in providing an important link to support efficient supercomputing is identified. The two technologies are driven by the increasing complexities and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a Computational Fluid Dynamics simulation/visualization environment are given. Current capabilities supported by high speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation Facility (NAS) at NASA Ames Research Center are described. Applied research in providing a supercomputer visualization environment to support future computational requirements are summarized.
The application of rapid prototyping technique in chin augmentation.

PubMed

Li, Min; Lin, Xin; Xu, Yongchen

2010-04-01

This article discusses the application of computer-aided design and rapid prototyping techniques in prosthetic chin augmentation for mild microgenia. Nine cases of mild microgenia underwent an electrobeam computer tomography scan. Then we performed three-dimensional reconstruction and operative design using computer software. According to the design, we determined the shape and size of the prostheses and made an individualized prosthesis for each chin augmentation with the rapid prototyping technique. With the application of computer-aided design and a rapid prototyping technique, we could determine the shape, size, and embedding location accurately. Prefabricating the individual prosthesis model is useful in improving the accuracy of treatment. In the nine cases of mild microgenia, three received a silicone implant, four received an ePTFE implant, and two received a Medpor implant. All patients were satisfied with the results. During follow-up at 6-12 months, all patients remained satisfied. The application of computer-aided design and rapid prototyping techniques can offer surgeons the ability to design an individualized ideal prosthesis for each patient.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

PubMed

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-12-01

Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

PubMed Central

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-01-01

Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Condor-COPASI: high-throughput computing for biochemical networks

PubMed Central

2012-01-01

Background Mathematical modelling has become a standard technique to improve our understanding of complex biological systems. As models become larger and more complex, simulations and analyses require increasing amounts of computational power. Clusters of computers in a high-throughput computing environment can help to provide the resources required for computationally expensive model analysis. However, exploiting such a system can be difficult for users without the necessary expertise. Results We present Condor-COPASI, a server-based software tool that integrates COPASI, a biological pathway simulation tool, with Condor, a high-throughput computing environment. Condor-COPASI provides a web-based interface, which makes it extremely easy for a user to run a number of model simulation and analysis tasks in parallel. Tasks are transparently split into smaller parts, and submitted for execution on a Condor pool. Result output is presented to the user in a number of formats, including tables and interactive graphical displays. Conclusions Condor-COPASI can effectively use a Condor high-throughput computing environment to provide significant gains in performance for a number of model simulation and analysis tasks. Condor-COPASI is free, open source software, released under the Artistic License 2.0, and is suitable for use by any institution with access to a Condor pool. Source code is freely available for download at http://code.google.com/p/condor-copasi/, along with full instructions on deployment and usage. PMID:22834945
Surpassing Humans and Computers with JellyBean: Crowd-Vision-Hybrid Counting Algorithms.

PubMed

Sarma, Akash Das; Jain, Ayush; Nandi, Arnab; Parameswaran, Aditya; Widom, Jennifer

2015-11-01

Counting objects is a fundamental image processisng primitive, and has many scientific, health, surveillance, security, and military applications. Existing supervised computer vision techniques typically require large quantities of labeled training data, and even with that, fail to return accurate results in all but the most stylized settings. Using vanilla crowd-sourcing, on the other hand, can lead to significant errors, especially on images with many objects. In this paper, we present our JellyBean suite of algorithms, that combines the best of crowds and computer vision to count objects in images, and uses judicious decomposition of images to greatly improve accuracy at low cost. Our algorithms have several desirable properties: (i) they are theoretically optimal or near-optimal , in that they ask as few questions as possible to humans (under certain intuitively reasonable assumptions that we justify in our paper experimentally); (ii) they operate under stand-alone or hybrid modes, in that they can either work independent of computer vision algorithms, or work in concert with them, depending on whether the computer vision techniques are available or useful for the given setting; (iii) they perform very well in practice, returning accurate counts on images that no individual worker or computer vision algorithm can count correctly, while not incurring a high cost.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

NASA Technical Reports Server (NTRS)

Bailey, David H.; Ferguson, Helaman R. P.

1988-01-01

Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
Facilities | Integrated Energy Solutions | NREL

Science.gov Websites

strategies needed to optimize our entire energy system. A photo of the high-performance computer at NREL . High-Performance Computing Data Center High-performance computing facilities at NREL provide high-speed
Spatial Visualization--A Gateway to Computer-Based Technology.

ERIC Educational Resources Information Center

Norman, Kent L.

1994-01-01

A model is proposed for the influence of individual differences on performance when computer-based technology is introduced. The primary cognitive factor driving differences in performance is spatial visualization ability. Four techniques for mitigating the negative impact of low spatial visualization are discussed: spatial metaphors, graphical…
Dynamic model tracking design for low inertia, high speed permanent magnet ac motors.

PubMed

Stewart, P; Kadirkamanathan, V

2004-01-01

Permanent magnet ac (PMAC) motors have existed in various configurations for many years. The advent of rare-earth magnets and their associated highly elevated levels of magnetic flux makes the permanent magnet motor attractive for many high performance applications from computer disk drives to all electric racing cars. The use of batteries as a prime storage element carries a cost penalty in terms of the unladen weight of the vehicle. Minimizing this cost function requires the minimum electric motor size and weight to be specified, while still retaining acceptable levels of output torque. This tradeoff can be achieved by applying a technique known as flux weakening which will be investigated in this paper. The technique allows the speed range of a PMAC motor to be greatly increased, giving a constant power range of more than 4:1. A dynamic model reference controller is presented which has advantages in ease of implementation, and is particularly suited to dynamic low inertia applications such as clutchless gear changing in high performance electric vehicles. The benefits of this approach are to maximize the torque speed envelope of the motor, particularly advantageous when considering low inertia operation. The controller is examined experimentally, confirming the predicted performance.
Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

PubMed

Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

2012-09-25

Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

PubMed Central

2012-01-01

Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
The Propagation and Scattering of EM Waves in Electrically Large Ducts

NASA Astrophysics Data System (ADS)

Khan, Saeed Mahmood

The electromagnetic scattering from large arbitrarily shaped ducts with complex termination is studied here by a hybrid technique. The propagation of electromagnetic waves in the duct is analyzed in terms of an approximate modal solution. A finite difference technique is employed for computing the reflection characteristics of the complex terminations. Both solutions are combined using the unimoment method. The analysis here is carried out for monostatic RCS and considers only fields backscattered from inside the cavity. Rim-diffraction has been left out. The procedure offers such advantages as in that it is not necessary to find complicated Green's functions, which may not be readily available, when compared with the integral equation method. Hybridization performed by combining an approximate modal technique with a finite difference one makes the scheme numerically efficient. From a computational EM point of view, it brings together a whole spectrum of techniques associated with high frequency modal analysis, Fourier Methods, Radar Cross Section and Scattering, finite difference solution and the Unimoment Method. The practical application of this technique may range from the study of RCS scattered from jet inlets of radar evasive aircraft to submarine communication waveguides.
Variable-Complexity Multidisciplinary Optimization on Parallel Computers

NASA Technical Reports Server (NTRS)

Grossman, Bernard; Mason, William H.; Watson, Layne T.; Haftka, Raphael T.

1998-01-01

This report covers work conducted under grant NAG1-1562 for the NASA High Performance Computing and Communications Program (HPCCP) from December 7, 1993, to December 31, 1997. The objective of the research was to develop new multidisciplinary design optimization (MDO) techniques which exploit parallel computing to reduce the computational burden of aircraft MDO. The design of the High-Speed Civil Transport (HSCT) air-craft was selected as a test case to demonstrate the utility of our MDO methods. The three major tasks of this research grant included: development of parallel multipoint approximation methods for the aerodynamic design of the HSCT, use of parallel multipoint approximation methods for structural optimization of the HSCT, mathematical and algorithmic development including support in the integration of parallel computation for items (1) and (2). These tasks have been accomplished with the development of a response surface methodology that incorporates multi-fidelity models. For the aerodynamic design we were able to optimize with up to 20 design variables using hundreds of expensive Euler analyses together with thousands of inexpensive linear theory simulations. We have thereby demonstrated the application of CFD to a large aerodynamic design problem. For the predicting structural weight we were able to combine hundreds of structural optimizations of refined finite element models with thousands of optimizations based on coarse models. Computations have been carried out on the Intel Paragon with up to 128 nodes. The parallel computation allowed us to perform combined aerodynamic-structural optimization using state of the art models of a complex aircraft configurations.
Computational aeroelasticity using a pressure-based solver

NASA Astrophysics Data System (ADS)

Kamakoti, Ramji

A computational methodology for performing fluid-structure interaction computations for three-dimensional elastic wing geometries is presented. The flow solver used is based on an unsteady Reynolds-Averaged Navier-Stokes (RANS) model. A well validated k-ε turbulence model with wall function treatment for near wall region was used to perform turbulent flow calculations. Relative merits of alternative flow solvers were investigated. The predictor-corrector-based Pressure Implicit Splitting of Operators (PISO) algorithm was found to be computationally economic for unsteady flow computations. Wing structure was modeled using Bernoulli-Euler beam theory. A fully implicit time-marching scheme (using the Newmark integration method) was used to integrate the equations of motion for structure. Bilinear interpolation and linear extrapolation techniques were used to transfer necessary information between fluid and structure solvers. Geometry deformation was accounted for by using a moving boundary module. The moving grid capability was based on a master/slave concept and transfinite interpolation techniques. Since computations were performed on a moving mesh system, the geometric conservation law must be preserved. This is achieved by appropriately evaluating the Jacobian values associated with each cell. Accurate computation of contravariant velocities for unsteady flows using the momentum interpolation method on collocated, curvilinear grids was also addressed. Flutter computations were performed for the AGARD 445.6 wing at subsonic, transonic and supersonic Mach numbers. Unsteady computations were performed at various dynamic pressures to predict the flutter boundary. Results showed favorable agreement of experiment and previous numerical results. The computational methodology exhibited capabilities to predict both qualitative and quantitative features of aeroelasticity.
Measuring Weld Profiles By Computer Tomography

NASA Technical Reports Server (NTRS)

Pascua, Antonio G.; Roy, Jagatjit

1990-01-01

Noncontacting, nondestructive computer tomography system determines internal and external contours of welded objects. System makes it unnecessary to take metallurgical sections (destructive technique) or to take silicone impressions of hidden surfaces (technique that contaminates) to inspect them. Measurements of contours via tomography performed 10 times as fast as measurements via impression molds, and tomography does not contaminate inspected parts.
Research in the design of high-performance reconfigurable systems

NASA Technical Reports Server (NTRS)

Mcewan, S. D.; Spry, A. J.

1985-01-01

Computer aided design and computer aided manufacturing have the potential for greatly reducing the cost and lead time in the development of VLSI components. This potential paves the way for the design and fabrication of a wide variety of economically feasible high level functional units. It was observed that current computer systems have only a limited capacity to absorb new VLSI component types other than memory, microprocessors, and a relatively small number of other parts. The first purpose is to explore a system design which is capable of effectively incorporating a considerable number of VLSI part types and will both increase the speed of computation and reduce the attendant programming effort. A second purpose is to explore design techniques for VLSI parts which when incorporated by such a system will result in speeds and costs which are optimal. The proposed work may lay the groundwork for future efforts in the extensive simulation and measurements of the system's cost effectiveness and lead to prototype development.
Knowledge-based vision for space station object motion detection, recognition, and tracking

NASA Technical Reports Server (NTRS)

Symosek, P.; Panda, D.; Yalamanchili, S.; Wehner, W., III

1987-01-01

Computer vision, especially color image analysis and understanding, has much to offer in the area of the automation of Space Station tasks such as construction, satellite servicing, rendezvous and proximity operations, inspection, experiment monitoring, data management and training. Knowledge-based techniques improve the performance of vision algorithms for unstructured environments because of their ability to deal with imprecise a priori information or inaccurately estimated feature data and still produce useful results. Conventional techniques using statistical and purely model-based approaches lack flexibility in dealing with the variabilities anticipated in the unstructured viewing environment of space. Algorithms developed under NASA sponsorship for Space Station applications to demonstrate the value of a hypothesized architecture for a Video Image Processor (VIP) are presented. Approaches to the enhancement of the performance of these algorithms with knowledge-based techniques and the potential for deployment of highly-parallel multi-processor systems for these algorithms are discussed.
Towards a large-scale scalable adaptive heart model using shallow tree meshes

NASA Astrophysics Data System (ADS)

Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf

2015-10-01

Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel adaptive scheme for resolving the deporalization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.
Flexible body stability analysis of Space Shuttle ascent flight control system by using lambda matrix solution techniques

NASA Technical Reports Server (NTRS)

Bown, R. L.; Christofferson, A.; Lardas, M.; Flanders, H.

1980-01-01

A lambda matrix solution technique is being developed to perform an open loop frequency analysis of a high order dynamic system. The procedure evaluates the right and left latent vectors corresponding to the respective latent roots. The latent vectors are used to evaluate the partial fraction expansion formulation required to compute the flexible body open loop feedback gains for the Space Shuttle Digital Ascent Flight Control System. The algorithm is in the final stages of development and will be used to insure that the feedback gains meet the design specification.

Applications of Computational Methods for Dynamic Stability and Control Derivatives

NASA Technical Reports Server (NTRS)

Green, Lawrence L.; Spence, Angela M.

2004-01-01

Initial steps in the application o f a low-order panel method computational fluid dynamic (CFD) code to the calculation of aircraft dynamic stability and control (S&C) derivatives are documented. Several capabilities, unique to CFD but not unique to this particular demonstration, are identified and demonstrated in this paper. These unique capabilities complement conventional S&C techniques and they include the ability to: 1) perform maneuvers without the flow-kinematic restrictions and support interference commonly associated with experimental S&C facilities, 2) easily simulate advanced S&C testing techniques, 3) compute exact S&C derivatives with uncertainty propagation bounds, and 4) alter the flow physics associated with a particular testing technique from those observed in a wind or water tunnel test in order to isolate effects. Also presented are discussions about some computational issues associated with the simulation of S&C tests and selected results from numerous surface grid resolution studies performed during the course of the study.
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal

NASA Astrophysics Data System (ADS)

Satheeskumaran, S.; Sabrigiriraj, M.

2016-06-01

Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.
MrGrid: A Portable Grid Based Molecular Replacement Pipeline

PubMed Central

Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.

2010-01-01

Background The crystallographic determination of protein structures can be computationally demanding and for difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings MrGrid is a portable web based application written in Java/JSP and Ruby, and taking advantage of Apple Xgrid technology. Designed to interface with a user defined Xgrid resource the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed up factor of 5.69. Conclusions MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612
A lightweight network anomaly detection technique

DOE PAGES

Kim, Jinoh; Yoo, Wucherl; Sim, Alex; ...

2017-03-13

While the network anomaly detection is essential in network operations and management, it becomes further challenging to perform the first line of detection against the exponentially increasing volume of network traffic. In this paper, we develop a technique for the first line of online anomaly detection with two important considerations: (i) availability of traffic attributes during the monitoring time, and (ii) computational scalability for streaming data. The presented learning technique is lightweight and highly scalable with the beauty of approximation based on the grid partitioning of the given dimensional space. With the public traffic traces of KDD Cup 1999 andmore » NSL-KDD, we show that our technique yields 98.5% and 83% of detection accuracy, respectively, only with a couple of readily available traffic attributes that can be obtained without the help of post-processing. Finally, the results are at least comparable with the classical learning methods including decision tree and random forest, with approximately two orders of magnitude faster learning performance.« less
A lightweight network anomaly detection technique

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jinoh; Yoo, Wucherl; Sim, Alex

While the network anomaly detection is essential in network operations and management, it becomes further challenging to perform the first line of detection against the exponentially increasing volume of network traffic. In this paper, we develop a technique for the first line of online anomaly detection with two important considerations: (i) availability of traffic attributes during the monitoring time, and (ii) computational scalability for streaming data. The presented learning technique is lightweight and highly scalable with the beauty of approximation based on the grid partitioning of the given dimensional space. With the public traffic traces of KDD Cup 1999 andmore » NSL-KDD, we show that our technique yields 98.5% and 83% of detection accuracy, respectively, only with a couple of readily available traffic attributes that can be obtained without the help of post-processing. Finally, the results are at least comparable with the classical learning methods including decision tree and random forest, with approximately two orders of magnitude faster learning performance.« less
Hybrid approach combining multiple characterization techniques and simulations for microstructural analysis of proton exchange membrane fuel cell electrodes

NASA Astrophysics Data System (ADS)

Cetinbas, Firat C.; Ahluwalia, Rajesh K.; Kariuki, Nancy; De Andrade, Vincent; Fongalland, Dash; Smith, Linda; Sharman, Jonathan; Ferreira, Paulo; Rasouli, Somaye; Myers, Deborah J.

2017-03-01

The cost and performance of proton exchange membrane fuel cells strongly depend on the cathode electrode due to usage of expensive platinum (Pt) group metal catalyst and sluggish reaction kinetics. Development of low Pt content high performance cathodes requires comprehensive understanding of the electrode microstructure. In this study, a new approach is presented to characterize the detailed cathode electrode microstructure from nm to μm length scales by combining information from different experimental techniques. In this context, nano-scale X-ray computed tomography (nano-CT) is performed to extract the secondary pore space of the electrode. Transmission electron microscopy (TEM) is employed to determine primary C particle and Pt particle size distributions. X-ray scattering, with its ability to provide size distributions of orders of magnitude more particles than TEM, is used to confirm the TEM-determined size distributions. The number of primary pores that cannot be resolved by nano-CT is approximated using mercury intrusion porosimetry. An algorithm is developed to incorporate all these experimental data in one geometric representation. Upon validation of pore size distribution against gas adsorption and mercury intrusion porosimetry data, reconstructed ionomer size distribution is reported. In addition, transport related characteristics and effective properties are computed by performing simulations on the hybrid microstructure.
Region Templates: Data Representation and Management for High-Throughput Image Analysis

PubMed Central

Pan, Tony; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Klasky, Scott; Saltz, Joel

2015-01-01

We introduce a region template abstraction and framework for the efficient storage, management and processing of common data types in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling for maximizing the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high speed disk based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13×. Finally, a processing rate of 11,730 4K×4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1,200 CPU cores). This computation rate enables studies with very large datasets. PMID:26139953
Automated analysis and classification of melanocytic tumor on skin whole slide images.

PubMed

Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal

2018-06-01

This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Next-generation acceleration and code optimization for light transport in turbid media using GPUs

PubMed Central

Alerstam, Erik; Lo, William Chun Yip; Han, Tianyi David; Rose, Jonathan; Andersson-Engels, Stefan; Lilge, Lothar

2010-01-01

A highly optimized Monte Carlo (MC) code package for simulating light transport is developed on the latest graphics processing unit (GPU) built for general-purpose computing from NVIDIA - the Fermi GPU. In biomedical optics, the MC method is the gold standard approach for simulating light transport in biological tissue, both due to its accuracy and its flexibility in modelling realistic, heterogeneous tissue geometry in 3-D. However, the widespread use of MC simulations in inverse problems, such as treatment planning for PDT, is limited by their long computation time. Despite its parallel nature, optimizing MC code on the GPU has been shown to be a challenge, particularly when the sharing of simulation result matrices among many parallel threads demands the frequent use of atomic instructions to access the slow GPU global memory. This paper proposes an optimization scheme that utilizes the fast shared memory to resolve the performance bottleneck caused by atomic access, and discusses numerous other optimization techniques needed to harness the full potential of the GPU. Using these techniques, a widely accepted MC code package in biophotonics, called MCML, was successfully accelerated on a Fermi GPU by approximately 600x compared to a state-of-the-art Intel Core i7 CPU. A skin model consisting of 7 layers was used as the standard simulation geometry. To demonstrate the possibility of GPU cluster computing, the same GPU code was executed on four GPUs, showing a linear improvement in performance with an increasing number of GPUs. The GPU-based MCML code package, named GPU-MCML, is compatible with a wide range of graphics cards and is released as an open-source software in two versions: an optimized version tuned for high performance and a simplified version for beginners (http://code.google.com/p/gpumcml). PMID:21258498
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations

PubMed Central

Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.

2016-01-01

Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.

PubMed

Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W

2016-01-01

Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
Scheduling algorithms for automatic control systems for technological processes

NASA Astrophysics Data System (ADS)

Chernigovskiy, A. S.; Tsarev, R. Yu; Kapulin, D. V.

2017-01-01

Wide use of automatic process control systems and the usage of high-performance systems containing a number of computers (processors) give opportunities for creation of high-quality and fast production that increases competitiveness of an enterprise. Exact and fast calculations, control computation, and processing of the big data arrays - all of this requires the high level of productivity and, at the same time, minimum time of data handling and result receiving. In order to reach the best time, it is necessary not only to use computing resources optimally, but also to design and develop the software so that time gain will be maximal. For this purpose task (jobs or operations), scheduling techniques for the multi-machine/multiprocessor systems are applied. Some of basic task scheduling methods for the multi-machine process control systems are considered in this paper, their advantages and disadvantages come to light, and also some usage considerations, in case of the software for automatic process control systems developing, are made.
Computer assisted analysis of auroral images obtained from high altitude polar satellites

NASA Technical Reports Server (NTRS)

Samadani, Ramin; Flynn, Michael

1993-01-01

Automatic techniques that allow the extraction of physically significant parameters from auroral images were developed. This allows the processing of a much larger number of images than is currently possible with manual techniques. Our techniques were applied to diverse auroral image datasets. These results were made available to geophysicists at NASA and at universities in the form of a software system that performs the analysis. After some feedback from users, an upgraded system was transferred to NASA and to two universities. The feasibility of user-trained search and retrieval of large amounts of data using our automatically derived parameter indices was demonstrated. Techniques based on classification and regression trees (CART) were developed and applied to broaden the types of images to which the automated search and retrieval may be applied. Our techniques were tested with DE-1 auroral images.
Parallel, adaptive finite element methods for conservation laws

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.

1994-01-01

We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
The fractional Fourier transform and applications

NASA Technical Reports Server (NTRS)

Bailey, David H.; Swarztrauber, Paul N.

1991-01-01

This paper describes the 'fractional Fourier transform', which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e exp -2(pi)i/n, the fractional Fourier transform is based on fractional roots of unity e exp -2(pi)i(alpha), where alpha is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with noninteger periodicities, performing high-resolution trigonometric interpolation, detecting lines in noisy images, and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster by arbitrarily large factors than conventional techniques.
Design Strategy for a Formally Verified Reliable Computing Platform

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Caldwell, James L.; DiVito, Ben L.

1991-01-01

This paper presents a high-level design for a reliable computing platform for real-time control applications. The design tradeoffs and analyses related to the development of a formally verified reliable computing platform are discussed. The design strategy advocated in this paper requires the use of techniques that can be completely characterized mathematically as opposed to more powerful or more flexible algorithms whose performance properties can only be analyzed by simulation and testing. The need for accurate reliability models that can be related to the behavior models is also stressed. Tradeoffs between reliability and voting complexity are explored. In particular, the transient recovery properties of the system are found to be fundamental to both the reliability analysis as well as the "correctness" models.
Management and Analysis of Biological and Clinical Data: How Computer Science May Support Biomedical and Clinical Research

NASA Astrophysics Data System (ADS)

Veltri, Pierangelo

The use of computer based solutions for data management in biology and clinical science has contributed to improve life-quality and also to gather research results in shorter time. Indeed, new algorithms and high performance computation have been using in proteomics and genomics studies for curing chronic diseases (e.g., drug designing) as well as supporting clinicians both in diagnosis (e.g., images-based diagnosis) and patient curing (e.g., computer based information analysis on information gathered from patient). In this paper we survey on examples of computer based techniques applied in both biology and clinical contexts. The reported applications are also results of experiences in real case applications at University Medical School of Catanzaro and also part of experiences of the National project Staywell SH 2.0 involving many research centers and companies aiming to study and improve citizen wellness.
High performance GPU processing for inversion using uniform grid searches

NASA Astrophysics Data System (ADS)

Venetis, Ioannis E.; Saltogianni, Vasso; Stiros, Stathis; Gallopoulos, Efstratios

2017-04-01

Many geophysical problems are described by systems of redundant, highly non-linear systems of ordinary equations with constant terms deriving from measurements and hence representing stochastic variables. Solution (inversion) of such problems is based on numerical, optimization methods, based on Monte Carlo sampling or on exhaustive searches in cases of two or even three "free" unknown variables. Recently the TOPological INVersion (TOPINV) algorithm, a grid search-based technique in the Rn space, has been proposed. TOPINV is not based on the minimization of a certain cost function and involves only forward computations, hence avoiding computational errors. The basic concept is to transform observation equations into inequalities on the basis of an optimization parameter k and of their standard errors, and through repeated "scans" of n-dimensional search grids for decreasing values of k to identify the optimal clusters of gridpoints which satisfy observation inequalities and by definition contain the "true" solution. Stochastic optimal solutions and their variance-covariance matrices are then computed as first and second statistical moments. Such exhaustive uniform searches produce an excessive computational load and are extremely time consuming for common computers based on a CPU. An alternative is to use a computing platform based on a GPU, which nowadays is affordable to the research community, which provides a much higher computing performance. Using the CUDA programming language to implement TOPINV allows the investigation of the attained speedup in execution time on such a high performance platform. Based on synthetic data we compared the execution time required for two typical geophysical problems, modeling magma sources and seismic faults, described with up to 18 unknown variables, on both CPU/FORTRAN and GPU/CUDA platforms. The same problems for several different sizes of search grids (up to 1012 gridpoints) and numbers of unknown variables were solved on both platforms, and execution time as a function of the grid dimension for each problem was recorded. Results indicate an average speedup in calculations by a factor of 100 on the GPU platform; for example problems with 1012 grid-points require less than two hours instead of several days on conventional desktop computers. Such a speedup encourages the application of TOPINV on high performance platforms, as a GPU, in cases where nearly real time decisions are necessary, for example finite fault modeling to identify possible tsunami sources.
Monitoring of facial stress during space flight: Optical computer recognition combining discriminative and generative methods

NASA Astrophysics Data System (ADS)

Dinges, David F.; Venkataraman, Sundara; McGlinchey, Eleanor L.; Metaxas, Dimitris N.

2007-02-01

Astronauts are required to perform mission-critical tasks at a high level of functional capability throughout spaceflight. Stressors can compromise their ability to do so, making early objective detection of neurobehavioral problems in spaceflight a priority. Computer optical approaches offer a completely unobtrusive way to detect distress during critical operations in space flight. A methodology was developed and a study completed to determine whether optical computer recognition algorithms could be used to discriminate facial expressions during stress induced by performance demands. Stress recognition from a facial image sequence is a subject that has not received much attention although it is an important problem for many applications beyond space flight (security, human-computer interaction, etc.). This paper proposes a comprehensive method to detect stress from facial image sequences by using a model-based tracker. The image sequences were captured as subjects underwent a battery of psychological tests under high- and low-stress conditions. A cue integration-based tracking system accurately captured the rigid and non-rigid parameters of different parts of the face (eyebrows, lips). The labeled sequences were used to train the recognition system, which consisted of generative (hidden Markov model) and discriminative (support vector machine) parts that yield results superior to using either approach individually. The current optical algorithm methods performed at a 68% accuracy rate in an experimental study of 60 healthy adults undergoing periods of high-stress versus low-stress performance demands. Accuracy and practical feasibility of the technique is being improved further with automatic multi-resolution selection for the discretization of the mask, and automated face detection and mask initialization algorithms.
Intelligence in Scientific Computing.

DTIC Science & Technology

1993-12-31

simulation) a high-performance controller for a magnetic levitation system - the German Transrapid system. The new control system can stabilize maglev ...techniques. A paper by Feng Zhao and Richard Thornton about the maglev controller designed by his program was presented at the 31st IEEE conference on...Massachusetts Insti- tute of Technology, 1991. Also availible as MIT AITR 1385. Zhao, F. and Thornton, R. "Automatic Design of a Maglev Controller in

Integrated structure/control design - Present methodology and future opportunities

NASA Technical Reports Server (NTRS)

Weisshaar, T. A.; Newsom, J. R.; Zeiler, T. A.; Gilbert, M. G.

1986-01-01

Attention is given to current methodology applied to the integration of the optimal design process for structures and controls. Multilevel linear decomposition techniques proved to be most effective in organizing the computational efforts necessary for ISCD (integrated structures and control design) tasks. With the development of large orbiting space structures and actively controlled, high performance aircraft, there will be more situations in which this concept can be applied.
Compressive Sensing with Cross-Validation and Stop-Sampling for Sparse Polynomial Chaos Expansions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huan, Xun; Safta, Cosmin; Sargsyan, Khachik

Compressive sensing is a powerful technique for recovering sparse solutions of underdetermined linear systems, which is often encountered in uncertainty quanti cation analysis of expensive and high-dimensional physical models. We perform numerical investigations employing several com- pressive sensing solvers that target the unconstrained LASSO formulation, with a focus on linear systems that arise in the construction of polynomial chaos expansions. With core solvers of l1 ls, SpaRSA, CGIST, FPC AS, and ADMM, we develop techniques to mitigate over tting through an automated selection of regularization constant based on cross-validation, and a heuristic strategy to guide the stop-sampling decision. Practical recommendationsmore » on parameter settings for these tech- niques are provided and discussed. The overall method is applied to a series of numerical examples of increasing complexity, including large eddy simulations of supersonic turbulent jet-in-cross flow involving a 24-dimensional input. Through empirical phase-transition diagrams and convergence plots, we illustrate sparse recovery performance under structures induced by polynomial chaos, accuracy and computational tradeoffs between polynomial bases of different degrees, and practi- cability of conducting compressive sensing for a realistic, high-dimensional physical application. Across test cases studied in this paper, we find ADMM to have demonstrated empirical advantages through consistent lower errors and faster computational times.« less
System-Level Virtualization for High Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vallee, Geoffroy R; Naughton, III, Thomas J; Engelmann, Christian

2008-01-01

System-level virtualization has been a research topic since the 70's but regained popularity during the past few years because of the availability of efficient solution such as Xen and the implementation of hardware support in commodity processors (e.g. Intel-VT, AMD-V). However, a majority of system-level virtualization projects is guided by the server consolidation market. As a result, current virtualization solutions appear to not be suitable for high performance computing (HPC) which is typically based on large-scale systems. On another hand there is significant interest in exploiting virtual machines (VMs) within HPC for a number of other reasons. By virtualizing themore » machine, one is able to run a variety of operating systems and environments as needed by the applications. Virtualization allows users to isolate workloads, improving security and reliability. It is also possible to support non-native environments and/or legacy operating environments through virtualization. In addition, it is possible to balance work loads, use migration techniques to relocate applications from failing machines, and isolate fault systems for repair. This document presents the challenges for the implementation of a system-level virtualization solution for HPC. It also presents a brief survey of the different approaches and techniques to address these challenges.« less
Intricacies of modern supercomputing illustrated with recent advances in simulations of strongly correlated electron systems

NASA Astrophysics Data System (ADS)

Schulthess, Thomas C.

2013-03-01

The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.
ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification

PubMed Central

Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

2015-01-01

Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043
Scalable Prediction of Energy Consumption using Incremental Time Series Clustering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simmhan, Yogesh; Noor, Muhammad Usman

2013-10-09

Time series datasets are a canonical form of high velocity Big Data, and often generated by pervasive sensors, such as found in smart infrastructure. Performing predictive analytics on time series data can be computationally complex, and requires approximation techniques. In this paper, we motivate this problem using a real application from the smart grid domain. We propose an incremental clustering technique, along with a novel affinity score for determining cluster similarity, which help reduce the prediction error for cumulative time series within a cluster. We evaluate this technique, along with optimizations, using real datasets from smart meters, totaling ~700,000 datamore » points, and show the efficacy of our techniques in improving the prediction error of time series data within polynomial time.« less
Numerical simulation of coupled electrochemical and transport processes in battery systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liaw, B.Y.; Gu, W.B.; Wang, C.Y.

1997-12-31

Advanced numerical modeling to simulate dynamic battery performance characteristics for several types of advanced batteries is being conducted using computational fluid dynamics (CFD) techniques. The CFD techniques provide efficient algorithms to solve a large set of highly nonlinear partial differential equations that represent the complex battery behavior governed by coupled electrochemical reactions and transport processes. The authors have recently successfully applied such techniques to model advanced lead-acid, Ni-Cd and Ni-MH cells. In this paper, the authors briefly discuss how the governing equations were numerically implemented, show some preliminary modeling results, and compare them with other modeling or experimental data reportedmore » in the literature. The authors describe the advantages and implications of using the CFD techniques and their capabilities in future battery applications.« less
Analysis of optical route in a micro high-speed magneto-optic switch

NASA Astrophysics Data System (ADS)

Weng, Zihua; Yang, Guoguang; Huang, Yuanqing; Chen, Zhimin; Zhu, Yun; Wu, Jinming; Lin, Shufen; Mo, Weiping

2005-02-01

A novel micro high-speed 2x2 magneto-optic switch and its optical route, which is used in high-speed all-optical communication network, is designed and analyzed in this paper. The study of micro high-speed magneto-optic switch mainly involves the optical route and high-speed control technique design. The optical route design covers optical route design of polarization in optical switch, the performance analysis and material selection of magneto-optic crystal and magnetic path design in Faraday rotator. The research of high-speed control technique involves the study of nanosecond pulse generator, high-speed magnetic field and its control technique etc. High-speed current transients from nanosecond pulse generator are used to switch the magnetization of the magneto-optic crystal, which propagates a 1550nm optical beam. The optical route design schemes and electronic circuits of high-speed control technique are both simulated on computer and test by the experiments respectively. The experiment results state that the nanosecond pulse generator can output the pulse with rising edge time 3~35ns, voltage amplitude 10~90V and pulse width 10~100ns. Under the control of CPU singlechip, the optical beam can be stably switched and the switching time is less than 1μs currently.
Radiograph and passive data analysis using mixed variable optimization

DOEpatents

Temple, Brian A.; Armstrong, Jerawan C.; Buescher, Kevin L.; Favorite, Jeffrey A.

2015-06-02

Disclosed herein are representative embodiments of methods, apparatus, and systems for performing radiography analysis. For example, certain embodiments perform radiographic analysis using mixed variable computation techniques. One exemplary system comprises a radiation source, a two-dimensional detector for detecting radiation transmitted through a object between the radiation source and detector, and a computer. In this embodiment, the computer is configured to input the radiographic image data from the two-dimensional detector and to determine one or more materials that form the object by using an iterative analysis technique that selects the one or more materials from hierarchically arranged solution spaces of discrete material possibilities and selects the layer interfaces from the optimization of the continuous interface data.
Comparison of dual and single exposure techniques in dual-energy chest radiography.

PubMed

Ho, J T; Kruger, R A; Sorenson, J A

1989-01-01

Conventional chest radiography is the most effective tool for lung cancer detection and diagnosis; nevertheless, a high percentage of lung cancer tumors are missed because of the overlap of lung nodule image contrast with bone image contrast in a chest radiograph. Two different energy subtraction strategies, dual exposure and single exposure techniques, were studied for decomposing a radiograph into bone-free and soft tissue-free images to address this problem. For comparing the efficiency of these two techniques in lung nodule detection, the performances of the techniques were evaluated on the basis of residual tissue contrast, energy separation, and signal-to-noise ratio. The evaluation was based on both computer simulation and experimental verification. The dual exposure technique was found to be better than the single exposure technique because of its higher signal-to-noise ratio and greater residual tissue contrast. However, x-ray tube loading and patient motion are problems.
Compressive sensing scalp EEG signals: implementations and practical performance.

PubMed

Abdulghani, Amir M; Casson, Alexander J; Rodriguez-Villegas, Esther

2012-11-01

Highly miniaturised, wearable computing and communication systems allow unobtrusive, convenient and long term monitoring of a range of physiological parameters. For long term operation from the physically smallest batteries, the average power consumption of a wearable device must be very low. It is well known that the overall power consumption of these devices can be reduced by the inclusion of low power consumption, real-time compression of the raw physiological data in the wearable device itself. Compressive sensing is a new paradigm for providing data compression: it has shown significant promise in fields such as MRI; and is potentially suitable for use in wearable computing systems as the compression process required in the wearable device has a low computational complexity. However, the practical performance very much depends on the characteristics of the signal being sensed. As such the utility of the technique cannot be extrapolated from one application to another. Long term electroencephalography (EEG) is a fundamental tool for the investigation of neurological disorders and is increasingly used in many non-medical applications, such as brain-computer interfaces. This article investigates in detail the practical performance of different implementations of the compressive sensing theory when applied to scalp EEG signals.
High-Performance Computing Data Center Warm-Water Liquid Cooling |

Science.gov Websites

Computational Science | NREL Warm-Water Liquid Cooling High-Performance Computing Data Center Warm-Water Liquid Cooling NREL's High-Performance Computing Data Center (HPC Data Center) is liquid water Liquid cooling technologies offer a more energy-efficient solution that also allows for effective
The Effects of Computer Animated Dissection versus Preserved Animal Dissection on the Student Achievement in a High School Biology Class.

ERIC Educational Resources Information Center

Kariuki, Patrick; Paulson, Ronda

The purpose of this study was to examine the effectiveness of computer-animated dissection techniques versus the effectiveness of traditional dissection techniques as related to student achievement. The sample used was 104 general biology students from a small, rural high school in Northeast Tennessee. Random selection was used to separate the…
FPGA-based protein sequence alignment : A review

NASA Astrophysics Data System (ADS)

Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi

2017-11-01

Sequence alignment have been optimized using several techniques in order to accelerate the computation time to obtain the optimal score by implementing DP-based algorithm into hardware such as FPGA-based platform. During hardware implementation, there will be performance challenges such as the frequent memory access and highly data dependent in computation process. Therefore, investigation in processing element (PE) configuration where involves more on memory access in load or access the data (substitution matrix, query sequence character) and the PE configuration time will be the main focus in this paper. There are various approaches to enhance the PE configuration performance that have been done in previous works such as by using serial configuration chain and parallel configuration chain i.e. the configuration data will be loaded into each PEs sequentially and simultaneously respectively. Some researchers have proven that the performance using parallel configuration chain has optimized both the configuration time and area.
Degenerate operators and the 1/c expansion: Lorentzian resummations, high order computations, and super-Virasoro blocks

DOE PAGES

Chen, Hongbin; Fitzpatrick, A. Liam; Kaplan, Jared; ...

2017-03-30

Here, one can obtain exact information about Virasoro conformal blocks by analytically continuing the correlators of degenerate operators. We argued in recent work that this technique can be used to explicitly resolve information loss problems in AdS 3/CFT 2. In this paper we use the technique to perform calculations in the small 1/c ∝ GN expansion: (1) we prove the all-orders resummation of logarithmic factors ∝1/clog z in the Lorentzian regime, demonstrating that 1/c corrections directly shift Lyapunov exponents associated with chaos, as claimed in prior work, (2) we perform another all-orders resummation in the limit of large c withmore » fixed cz, interpolating between the early onset of chaos and late time behavior, (3) we explicitly compute the Virasoro vacuum block to order 1/c 2 and 1/c 3 with external dimensions fixed, corresponding to 2 and 3 loop calculations in AdS 3, and (4) we derive the heavy-light vacuum blocks in theories with N = 1, 2 superconformal symmetry.« less
Degenerate operators and the 1/c expansion: Lorentzian resummations, high order computations, and super-Virasoro blocks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hongbin; Fitzpatrick, A. Liam; Kaplan, Jared

Here, one can obtain exact information about Virasoro conformal blocks by analytically continuing the correlators of degenerate operators. We argued in recent work that this technique can be used to explicitly resolve information loss problems in AdS 3/CFT 2. In this paper we use the technique to perform calculations in the small 1/c ∝ GN expansion: (1) we prove the all-orders resummation of logarithmic factors ∝1/clog z in the Lorentzian regime, demonstrating that 1/c corrections directly shift Lyapunov exponents associated with chaos, as claimed in prior work, (2) we perform another all-orders resummation in the limit of large c withmore » fixed cz, interpolating between the early onset of chaos and late time behavior, (3) we explicitly compute the Virasoro vacuum block to order 1/c 2 and 1/c 3 with external dimensions fixed, corresponding to 2 and 3 loop calculations in AdS 3, and (4) we derive the heavy-light vacuum blocks in theories with N = 1, 2 superconformal symmetry.« less
Wave propagation in strain gradient poroelastic medium with microinertia: closed-form and finite element solutions

NASA Astrophysics Data System (ADS)

Rosi, Giuseppe; Scala, Ilaria; Nguyen, Vu-Hieu; Naili, Salah

2017-06-01

This article is about ultrasonic wave propagation in microstructured porous media. The classic Biot's model is enriched using a strain gradient approach to be able to capture high-order effects when the wavelength approaches the characteristic size of the microstructure. In order to reproduce actual transmission/reflection experiments performed on poroelastic samples, and to validate the choice of the model, the computation of the time domain response is necessary, as it allows for a direct comparison with experimental results. For obtaining the time response, we use two strategies: on the one hand we compute the closed form solution by using the Laplace and Fourier transforms techniques; on the other hand we used a finite element method. The results are presented for a transmission/reflection test performed on a poroelastic sample immersed in water. The effects introduced by the strain gradient terms are visible in the time response and in agreement with experimental observations. The results can be exploited in characterization of mechanical properties of poroelastic media by enhancing the reliability of quantitative ultrasound techniques.
Avionic Architecture for Model Predictive Control Application in Mars Sample & Return Rendezvous Scenario

NASA Astrophysics Data System (ADS)

Saponara, M.; Tramutola, A.; Creten, P.; Hardy, J.; Philippe, C.

2013-08-01

Optimization-based control techniques such as Model Predictive Control (MPC) are considered extremely attractive for space rendezvous, proximity operations and capture applications that require high level of autonomy, optimal path planning and dynamic safety margins. Such control techniques require high-performance computational needs for solving large optimization problems. The development and implementation in a flight representative avionic architecture of a MPC based Guidance, Navigation and Control system has been investigated in the ESA R&T study “On-line Reconfiguration Control System and Avionics Architecture” (ORCSAT) of the Aurora programme. The paper presents the baseline HW and SW avionic architectures, and verification test results obtained with a customised RASTA spacecraft avionics development platform from Aeroflex Gaisler.
Imaging in anatomy: a comparison of imaging techniques in embalmed human cadavers

PubMed Central

2013-01-01

Background A large variety of imaging techniques is an integral part of modern medicine. Introducing radiological imaging techniques into the dissection course serves as a basis for improved learning of anatomy and multidisciplinary learning in pre-clinical medical education. Methods Four different imaging techniques (ultrasound, radiography, computed tomography, and magnetic resonance imaging) were performed in embalmed human body donors to analyse possibilities and limitations of the respective techniques in this peculiar setting. Results The quality of ultrasound and radiography images was poor, images of computed tomography and magnetic resonance imaging were of good quality. Conclusion Computed tomography and magnetic resonance imaging have a superior image quality in comparison to ultrasound and radiography and offer suitable methods for imaging embalmed human cadavers as a valuable addition to the dissection course. PMID:24156510
QoS support for end users of I/O-intensive applications using shared storage systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davis, Marion Kei; Zhang, Xuechen; Jiang, Song

2011-01-19

I/O-intensive applications are becoming increasingly common on today's high-performance computing systems. While performance of compute-bound applications can be effectively guaranteed with techniques such as space sharing or QoS-aware process scheduling, it remains a challenge to meet QoS requirements for end users of I/O-intensive applications using shared storage systems because it is difficult to differentiate I/O services for different applications with individual quality requirements. Furthermore, it is difficult for end users to accurately specify performance goals to the storage system using I/O-related metrics such as request latency or throughput. As access patterns, request rates, and the system workload change in time,more » a fixed I/O performance goal, such as bounds on throughput or latency, can be expensive to achieve and may not lead to a meaningful performance guarantees such as bounded program execution time. We propose a scheme supporting end-users QoS goals, specified in terms of program execution time, in shared storage environments. We automatically translate the users performance goals into instantaneous I/O throughput bounds using a machine learning technique, and use dynamically determined service time windows to efficiently meet the throughput bounds. We have implemented this scheme in the PVFS2 parallel file system and have conducted an extensive evaluation. Our results show that this scheme can satisfy realistic end-user QoS requirements by making highly efficient use of the I/O resources. The scheme seeks to balance programs attainment of QoS requirements, and saves as much of the remaining I/O capacity as possible for best-effort programs.« less

Research Activity in Computational Physics utilizing High Performance Computing: Co-authorship Network Analysis

NASA Astrophysics Data System (ADS)

Ahn, Sul-Ah; Jung, Youngim

2016-10-01

The research activities of the computational physicists utilizing high performance computing are analyzed by bibliometirc approaches. This study aims at providing the computational physicists utilizing high-performance computing and policy planners with useful bibliometric results for an assessment of research activities. In order to achieve this purpose, we carried out a co-authorship network analysis of journal articles to assess the research activities of researchers for high-performance computational physics as a case study. For this study, we used journal articles of the Scopus database from Elsevier covering the time period of 2004-2013. We extracted the author rank in the physics field utilizing high-performance computing by the number of papers published during ten years from 2004. Finally, we drew the co-authorship network for 45 top-authors and their coauthors, and described some features of the co-authorship network in relation to the author rank. Suggestions for further studies are discussed.
Performance evaluation using SYSTID time domain simulation. [computer-aid design and analysis for communication systems

NASA Technical Reports Server (NTRS)

Tranter, W. H.; Ziemer, R. E.; Fashano, M. J.

1975-01-01

This paper reviews the SYSTID technique for performance evaluation of communication systems using time-domain computer simulation. An example program illustrates the language. The inclusion of both Gaussian and impulse noise models make accurate simulation possible in a wide variety of environments. A very flexible postprocessor makes possible accurate and efficient performance evaluation.
Spatial-spectral preprocessing for endmember extraction on GPU's

NASA Astrophysics Data System (ADS)

Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

2016-10-01

Spectral unmixing is focused in the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. Mainly focused on the spectral information contained in the hyperspectral images, endmember extraction techniques have recently included spatial information to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including the spectral-spatial endmember extraction (SSEE) where, within a preprocessing step in the technique, both sources of information are extracted from the hyperspectral image and equally used for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) local eigenvectors calculation in each sub-region in which the original hyperspectral image is divided; 2) computation of the maxima and minima projection of all eigenvectors over the entire hyperspectral image in order to obtain a candidates pixels set; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest- generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance for improving the computational performance in this context. In this paper, we develop two different implementations of the SSEE algorithm using GPUs. Both are based on the eigenvectors computation within each sub-region of the first step, one using the singular value decomposition (SVD) and another one using principal component analysis (PCA). Based on our experiments with hyperspectral data sets, high computational performance is observed in both cases.
A New Computational Technique for the Generation of Optimised Aircraft Trajectories

NASA Astrophysics Data System (ADS)

Chircop, Kenneth; Gardi, Alessandro; Zammit-Mangion, David; Sabatini, Roberto

2017-12-01

A new computational technique based on Pseudospectral Discretisation (PSD) and adaptive bisection ɛ-constraint methods is proposed to solve multi-objective aircraft trajectory optimisation problems formulated as nonlinear optimal control problems. This technique is applicable to a variety of next-generation avionics and Air Traffic Management (ATM) Decision Support Systems (DSS) for strategic and tactical replanning operations. These include the future Flight Management Systems (FMS) and the 4-Dimensional Trajectory (4DT) planning and intent negotiation/validation tools envisaged by SESAR and NextGen for a global implementation. In particular, after describing the PSD method, the adaptive bisection ɛ-constraint method is presented to allow an efficient solution of problems in which two or multiple performance indices are to be minimized simultaneously. Initial simulation case studies were performed adopting suitable aircraft dynamics models and addressing a classical vertical trajectory optimisation problem with two objectives simultaneously. Subsequently, a more advanced 4DT simulation case study is presented with a focus on representative ATM optimisation objectives in the Terminal Manoeuvring Area (TMA). The simulation results are analysed in-depth and corroborated by flight performance analysis, supporting the validity of the proposed computational techniques.
Effective doses to patients undergoing thoracic computed tomography examinations.

PubMed

Huda, W; Scalzetti, E M; Roskopf, M

2000-05-01

The purpose of this study was to investigate how x-ray technique factors and effective doses vary with patient size in chest CT examinations. Technique factors (kVp, mAs, section thickness, and number of sections) were recorded for 44 patients who underwent a routine chest CT examination. Patient weights were recorded together with dimensions and mean Hounsfield unit values obtained from representative axial CT images. The total mass of directly irradiated patient was modeled as a cylinder of water to permit the computation of the mean patient dose and total energy imparted for each chest CT examination. Computed values of energy imparted during the chest CT examination were converted into effective doses taking into account the patient weight. Patient weights ranged from 4.5 to 127 kg, and half the patients in this study were children under 18 years of age. All scans were performed at 120 kVp with a 1 s scan time. The selected tube current showed no correlation with patient weight (r2=0.06), indicating that chest CT examination protocols do not take into account for the size of the patient. Energy imparted increased with increasing patient weight, with values of energy imparted for 10 and 70 kg patients being 85 and 310 mJ, respectively. The effective dose showed an inverse correlation with increasing patient weight, however, with values of effective dose for 10 and 70 kg patients being 9.6 and 5.4 mSv, respectively. Current CT technique factors (kVp/mAs) used to perform chest CT examinations result in relatively high patient doses, which could be reduced by adjusting technique factors based on patient size.
Numerical Propulsion System Simulation: An Overview

NASA Technical Reports Server (NTRS)

Lytle, John K.

2000-01-01

The cost of implementing new technology in aerospace propulsion systems is becoming prohibitively expensive and time consuming. One of the main contributors to the high cost and lengthy time is the need to perform many large-scale hardware tests and the inability to integrate all appropriate subsystems early in the design process. The NASA Glenn Research Center is developing the technologies required to enable simulations of full aerospace propulsion systems in sufficient detail to resolve critical design issues early in the design process before hardware is built. This concept, called the Numerical Propulsion System Simulation (NPSS), is focused on the integration of multiple disciplines such as aerodynamics, structures and heat transfer with computing and communication technologies to capture complex physical processes in a timely and cost-effective manner. The vision for NPSS, as illustrated, is to be a "numerical test cell" that enables full engine simulation overnight on cost-effective computing platforms. There are several key elements within NPSS that are required to achieve this capability: 1) clear data interfaces through the development and/or use of data exchange standards, 2) modular and flexible program construction through the use of object-oriented programming, 3) integrated multiple fidelity analysis (zooming) techniques that capture the appropriate physics at the appropriate fidelity for the engine systems, 4) multidisciplinary coupling techniques and finally 5) high performance parallel and distributed computing. The current state of development in these five area focuses on air breathing gas turbine engines and is reported in this paper. However, many of the technologies are generic and can be readily applied to rocket based systems and combined cycles currently being considered for low-cost access-to-space applications. Recent accomplishments include: (1) the development of an industry-standard engine cycle analysis program and plug 'n play architecture, called NPSS Version 1, (2) A full engine simulation that combines a 3D low-pressure subsystem with a 0D high pressure core simulation. This demonstrates the ability to integrate analyses at different levels of detail and to aerodynamically couple components, the fan/booster and low-pressure turbine, through a 3D computational fluid dynamics simulation. (3) Simulation of all of the turbomachinery in a modern turbofan engine on parallel computing platform for rapid and cost-effective execution. This capability can also be used to generate full compressor map, requiring both design and off-design simulation. (4) Three levels of coupling characterize the multidisciplinary analysis under NPSS: loosely coupled, process coupled and tightly coupled. The loosely coupled and process coupled approaches require a common geometry definition to link CAD to analysis tools. The tightly coupled approach is currently validating the use of arbitrary Lagrangian/Eulerian formulation for rotating turbomachinery. The validation includes both centrifugal and axial compression systems. The results of the validation will be reported in the paper. (5) The demonstration of significant computing cost/performance reduction for turbine engine applications using PC clusters. The NPSS Project is supported under the NASA High Performance Computing and Communications Program.
SEMICONDUCTOR INTEGRATED CIRCUITS: A quasi-3-dimensional simulation method for a high-voltage level-shifting circuit structure

NASA Astrophysics Data System (ADS)

Jizhi, Liu; Xingbi, Chen

2009-12-01

A new quasi-three-dimensional (quasi-3D) numeric simulation method for a high-voltage level-shifting circuit structure is proposed. The performances of the 3D structure are analyzed by combining some 2D device structures; the 2D devices are in two planes perpendicular to each other and to the surface of the semiconductor. In comparison with Davinci, the full 3D device simulation tool, the quasi-3D simulation method can give results for the potential and current distribution of the 3D high-voltage level-shifting circuit structure with appropriate accuracy and the total CPU time for simulation is significantly reduced. The quasi-3D simulation technique can be used in many cases with advantages such as saving computing time, making no demands on the high-end computer terminals, and being easy to operate.
Importance of balanced architectures in the design of high-performance imaging systems

NASA Astrophysics Data System (ADS)

Sgro, Joseph A.; Stanton, Paul C.

1999-03-01

Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Use of Multiple GPUs to Speedup the Execution of a Three-Dimensional Computational Model of the Innate Immune System

NASA Astrophysics Data System (ADS)

Xavier, M. P.; do Nascimento, T. M.; dos Santos, R. W.; Lobosco, M.

2014-03-01

The development of computational systems that mimics the physiological response of organs or even the entire body is a complex task. One of the issues that makes this task extremely complex is the huge computational resources needed to execute the simulations. For this reason, the use of parallel computing is mandatory. In this work, we focus on the simulation of temporal and spatial behaviour of some human innate immune system cells and molecules in a small three-dimensional section of a tissue. To perform this simulation, we use multiple Graphics Processing Units (GPUs) in a shared-memory environment. Despite of high initialization and communication costs imposed by the use of GPUs, the techniques used to implement the HIS simulator have shown to be very effective to achieve this purpose.
Algorithms and architecture for multiprocessor based circuit simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deutsch, J.T.

Accurate electrical simulation is critical to the design of high performance integrated circuits. Logic simulators can verify function and give first-order timing information. Switch level simulators are more effective at dealing with charge sharing than standard logic simulators, but cannot provide accurate timing information or discover DC problems. Delay estimation techniques and cell level simulation can be used in constrained design methods, but must be tuned for each application, and circuit simulation must still be used to generate the cell models. None of these methods has the guaranteed accuracy that many circuit designers desire, and none can provide detailed waveformmore » information. Detailed electrical-level simulation can predict circuit performance if devices and parasitics are modeled accurately. However, the computational requirements of conventional circuit simulators make it impractical to simulate current large circuits. In this dissertation, the implementation of Iterated Timing Analysis (ITA), a relaxation-based technique for accurate circuit simulation, on a special-purpose multiprocessor is presented. The ITA method is an SOR-Newton, relaxation-based method which uses event-driven analysis and selective trace to exploit the temporal sparsity of the electrical network. Because event-driven selective trace techniques are employed, this algorithm lends itself to implementation on a data-driven computer.« less
Fast global image smoothing based on weighted least squares.

PubMed

Min, Dongbo; Choi, Sunghwan; Lu, Jiangbo; Ham, Bumsub; Sohn, Kwanghoon; Do, Minh N

2014-12-01

This paper presents an efficient technique for performing a spatially inhomogeneous edge-preserving image smoothing, called fast global smoother. Focusing on sparse Laplacian matrices consisting of a data term and a prior term (typically defined using four or eight neighbors for 2D image), our approach efficiently solves such global objective functions. In particular, we approximate the solution of the memory-and computation-intensive large linear system, defined over a d-dimensional spatial domain, by solving a sequence of 1D subsystems. Our separable implementation enables applying a linear-time tridiagonal matrix algorithm to solve d three-point Laplacian matrices iteratively. Our approach combines the best of two paradigms, i.e., efficient edge-preserving filters and optimization-based smoothing. Our method has a comparable runtime to the fast edge-preserving filters, but its global optimization formulation overcomes many limitations of the local filtering approaches. Our method also achieves high-quality results as the state-of-the-art optimization-based techniques, but runs ∼10-30 times faster. Besides, considering the flexibility in defining an objective function, we further propose generalized fast algorithms that perform Lγ norm smoothing (0 < γ < 2) and support an aggregated (robust) data term for handling imprecise data constraints. We demonstrate the effectiveness and efficiency of our techniques in a range of image processing and computer graphics applications.
Real-time haptic cutting of high-resolution soft tissues.

PubMed

Wu, Jun; Westermann, Rüdiger; Dick, Christian

2014-01-01

We present our systematic efforts in advancing the computational performance of physically accurate soft tissue cutting simulation, which is at the core of surgery simulators in general. We demonstrate a real-time performance of 15 simulation frames per second for haptic soft tissue cutting of a deformable body at an effective resolution of 170,000 finite elements. This is achieved by the following innovative components: (1) a linked octree discretization of the deformable body, which allows for fast and robust topological modifications of the simulation domain, (2) a composite finite element formulation, which thoroughly reduces the number of simulation degrees of freedom and thus enables to carefully balance simulation performance and accuracy, (3) a highly efficient geometric multigrid solver for solving the linear systems of equations arising from implicit time integration, (4) an efficient collision detection algorithm that effectively exploits the composition structure, and (5) a stable haptic rendering algorithm for computing the feedback forces. Considering that our method increases the finite element resolution for physically accurate real-time soft tissue cutting simulation by an order of magnitude, our technique has a high potential to significantly advance the realism of surgery simulators.
Towards fully analog hardware reservoir computing for speech recognition

NASA Astrophysics Data System (ADS)

Smerieri, Anteo; Duport, François; Paquot, Yvan; Haelterman, Marc; Schrauwen, Benjamin; Massar, Serge

2012-09-01

Reservoir computing is a very recent, neural network inspired unconventional computation technique, where a recurrent nonlinear system is used in conjunction with a linear readout to perform complex calculations, leveraging its inherent internal dynamics. In this paper we show the operation of an optoelectronic reservoir computer in which both the nonlinear recurrent part and the readout layer are implemented in hardware for a speech recognition application. The performance obtained is close to the one of to state-of-the-art digital reservoirs, while the analog architecture opens the way to ultrafast computation.
OAO battery data analysis

NASA Technical Reports Server (NTRS)

Gaston, S.; Wertheim, M.; Orourke, J. A.

1973-01-01

Summary, consolidation and analysis of specifications, manufacturing process and test controls, and performance results for OAO-2 and OAO-3 lot 20 Amp-Hr sealed nickel cadmium cells and batteries are reported. Correlation of improvements in control requirements with performance is a key feature. Updates for a cell/battery computer model to improve performance prediction capability are included. Applicability of regression analysis computer techniques to relate process controls to performance is checked.
Production and characterization of pure cryogenic inertial fusion targets

NASA Astrophysics Data System (ADS)

Boyd, B. A.; Kamerman, G. W.

An experimental cryogenic inertial fusion target generator and two optical techniques for automated target inspection are described. The generator produces 100 microns diameter solid hydrogen spheres at a rate compatible with fueling requirements of conceptual inertial fusion power plants. A jet of liquified hydrogen is disrupted into droplets by an ultrasonically excited nozzle. The droplets solidify into microspheres while falling through a chamber maintained below the hydrogen triple point pressure. Stable operation of the generator has been demonstrated for up to three hours. The optical inspection techniques are computer aided photomicrography and coarse diffraction pattern analysis (CDPA). The photomicrography system uses a conventional microscope coupled to a computer by a solid state camera and digital image memory. The computer enhances the stored image and performs feature extraction to determine pellet parameters. The CDPA technique uses Fourier transform optics and a special detector array to perform optical processing of a target image.
Optomechanical study and optimization of cantilever plate dynamics

NASA Astrophysics Data System (ADS)

Furlong, Cosme; Pryputniewicz, Ryszard J.

1995-06-01

Optimum dynamic characteristics of an aluminum cantilever plate containing holes of different sizes and located at arbitrary positions on the plate are studied computationally and experimentally. The objective function of this optimization is the minimization/maximization of the natural frequencies of the plate in terms of such design variable s as the sizes and locations of the holes. The optimization process is performed using the finite element method and mathematical programming techniques in order to obtain the natural frequencies and the optimum conditions of the plate, respectively. The modal behavior of the resultant optimal plate layout is studied experimentally through the use of holographic interferometry techniques. Comparisons of the computational and experimental results show that good agreement between theory and test is obtained. The comparisons also show that the combined, or hybrid use of experimental and computational techniques complement each other and prove to be a very efficient tool for performing optimization studies of mechanical components.
VLSI neuroprocessors

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.

1994-01-01

Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
Marc Henry de Frahan | NREL

Science.gov Websites

Computing Project, Marc develops high-fidelity turbulence models to enhance simulation accuracy and efficient numerical algorithms for future high performance computing hardware architectures. Research Interests High performance computing High order numerical methods for computational fluid dynamics Fluid
Estimation of the mechanical properties of the eye through the study of its vibrational modes

PubMed Central

2017-01-01

Measuring the eye’s mechanical properties in vivo and with minimally invasive techniques can be the key for individualized solutions to a number of eye pathologies. The development of such techniques largely relies on a computational modelling of the eyeball and, it optimally requires the synergic interplay between experimentation and numerical simulation. In Astrophysics and Geophysics the remote measurement of structural properties of the systems of their realm is performed on the basis of (helio-)seismic techniques. As a biomechanical system, the eyeball possesses normal vibrational modes encompassing rich information about its structure and mechanical properties. However, the integral analysis of the eyeball vibrational modes has not been performed yet. Here we develop a new finite difference method to compute both the spheroidal and, specially, the toroidal eigenfrequencies of the human eye. Using this numerical model, we show that the vibrational eigenfrequencies of the human eye fall in the interval 100 Hz–10 MHz. We find that compressible vibrational modes may release a trace on high frequency changes of the intraocular pressure, while incompressible normal modes could be registered analyzing the scattering pattern that the motions of the vitreous humour leave on the retina. Existing contact lenses with embebed devices operating at high sampling frequency could be used to register the microfluctuations of the eyeball shape we obtain. We advance that an inverse problem to obtain the mechanical properties of a given eye (e.g., Young’s modulus, Poisson ratio) measuring its normal frequencies is doable. These measurements can be done using non-invasive techniques, opening very interesting perspectives to estimate the mechanical properties of eyes in vivo. Future research might relate various ocular pathologies with anomalies in measured vibrational frequencies of the eye. PMID:28922351
Estimation of the mechanical properties of the eye through the study of its vibrational modes.

PubMed

Aloy, M Á; Adsuara, J E; Cerdá-Durán, P; Obergaulinger, M; Esteve-Taboada, J J; Ferrer-Blasco, T; Montés-Micó, R

2017-01-01

Measuring the eye's mechanical properties in vivo and with minimally invasive techniques can be the key for individualized solutions to a number of eye pathologies. The development of such techniques largely relies on a computational modelling of the eyeball and, it optimally requires the synergic interplay between experimentation and numerical simulation. In Astrophysics and Geophysics the remote measurement of structural properties of the systems of their realm is performed on the basis of (helio-)seismic techniques. As a biomechanical system, the eyeball possesses normal vibrational modes encompassing rich information about its structure and mechanical properties. However, the integral analysis of the eyeball vibrational modes has not been performed yet. Here we develop a new finite difference method to compute both the spheroidal and, specially, the toroidal eigenfrequencies of the human eye. Using this numerical model, we show that the vibrational eigenfrequencies of the human eye fall in the interval 100 Hz-10 MHz. We find that compressible vibrational modes may release a trace on high frequency changes of the intraocular pressure, while incompressible normal modes could be registered analyzing the scattering pattern that the motions of the vitreous humour leave on the retina. Existing contact lenses with embebed devices operating at high sampling frequency could be used to register the microfluctuations of the eyeball shape we obtain. We advance that an inverse problem to obtain the mechanical properties of a given eye (e.g., Young's modulus, Poisson ratio) measuring its normal frequencies is doable. These measurements can be done using non-invasive techniques, opening very interesting perspectives to estimate the mechanical properties of eyes in vivo. Future research might relate various ocular pathologies with anomalies in measured vibrational frequencies of the eye.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.