Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
Recent achievements in real-time computational seismology in Taiwan
NASA Astrophysics Data System (ADS)
Lee, S.; Liang, W.; Huang, B.
2012-12-01
Real-time computational seismology is currently possible to be achieved which needs highly connection between seismic database and high performance computing. We have developed a real-time moment tensor monitoring system (RMT) by using continuous BATS records and moment tensor inversion (CMT) technique. The real-time online earthquake simulation service is also ready to open for researchers and public earthquake science education (ROS). Combine RMT with ROS, the earthquake report based on computational seismology can provide within 5 minutes after an earthquake occurred (RMT obtains point source information < 120 sec; ROS completes a 3D simulation < 3 minutes). All of these computational results are posted on the internet in real-time now. For more information, welcome to visit real-time computational seismology earthquake report webpage (RCS).
NASA Astrophysics Data System (ADS)
Wang, Yuan; Chen, Zhidong; Sang, Xinzhu; Li, Hui; Zhao, Linmin
2018-03-01
Holographic displays can provide the complete optical wave field of a three-dimensional (3D) scene, including the depth perception. However, it often takes a long computation time to produce traditional computer-generated holograms (CGHs) without more complex and photorealistic rendering. The backward ray-tracing technique is able to render photorealistic high-quality images, which noticeably reduce the computation time achieved from the high-degree parallelism. Here, a high-efficiency photorealistic computer-generated hologram method is presented based on the ray-tracing technique. Rays are parallelly launched and traced under different illuminations and circumstances. Experimental results demonstrate the effectiveness of the proposed method. Compared with the traditional point cloud CGH, the computation time is decreased to 24 s to reconstruct a 3D object of 100 ×100 rays with continuous depth change.
HPC on Competitive Cloud Resources
NASA Astrophysics Data System (ADS)
Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff
Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on-demand in real-time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in realtime (wall-time). We also show that the smallest, cheapest (64-bit) instance on the studied environment is the best for price to performance ration. In light of the high-contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average, expected performance and execution time, expected cost to completion, and variance measures--traditionally ignored in the high-performance computing context--now should complement or even substitute the standard definitions of efficiency.
Fully integrated sub 100ps photon counting platform
NASA Astrophysics Data System (ADS)
Buckley, S. J.; Bellis, S. J.; Rosinger, P.; Jackson, J. C.
2007-02-01
Current state of the art high resolution counting modules, specifically designed for high timing resolution applications, are largely based on a computer card format. This has tended to result in a costly solution that is restricted to the computer it resides in. We describe a four channel timing module that interfaces to a computer via a USB port and operates with a resolution of less than 100 picoseconds. The core design of the system is an advanced field programmable gate array (FPGA) interfacing to a precision time interval measurement module, mass memory block and a high speed USB 2.0 serial data port. The FPGA design allows the module to operate in a number of modes allowing both continuous recording of photon events (time-tagging) and repetitive time binning. In time-tag mode the system reports, for each photon event, the high resolution time along with the chronological time (macro time) and the channel ID. The time-tags are uploaded in real time to a host computer via a high speed USB port allowing continuous storage to computer memory of up to 4 millions photons per second. In time-bin mode, binning is carried out with count rates up to 10 million photons per second. Each curve resides in a block of 128,000 time-bins each with a resolution programmable down to less than 100 picoseconds. Each bin has a limit of 65535 hits allowing autonomous curve recording until a bin reaches the maximum count or the system is commanded to halt. Due to the large memory storage, several curves/experiments can be stored in the system prior to uploading to the host computer for analysis. This makes this module ideal for integration into high timing resolution specific applications such as laser ranging and fluorescence lifetime imaging using techniques such as time correlated single photon counting (TCSPC).
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
NASA Astrophysics Data System (ADS)
Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku
2015-01-01
High performance computing of Meshless Time Domain Method (MTDM) on multi-GPU using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of the electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangle meshes, and it is difficult to adopt the problem in a complexed domain to the method. On the other hand, MTDM can be easily adept to the problem because MTDM does not requires meshes. In the present study, we implement MTDM on multi-GPU cluster to speedup the method, and numerically investigate the performance of the method on multi-GPU cluster. To reduce the computation time, the communication time between the decomposed domain is hided below the perfect matched layer (PML) calculation procedure. The results of computation show that speedup of MTDM on 128 GPUs is 173 times faster than that of single CPU calculation.
One-Time Password Tokens | High-Performance Computing | NREL
One-Time Password Tokens One-Time Password Tokens For connecting to NREL's high-performance computing (HPC) systems, learn how to set up a one-time password (OTP) token for remote and privileged a one-time pass code from the HPC Operations team. At the sign-in screen Enter your HPC Username in
ERIC Educational Resources Information Center
Kolata, Gina
1984-01-01
Examines social influences which discourage women from pursuing studies in computer science, including monopoly of computer time by boys at the high school level, sexual harassment in college, movies, and computer games. Describes some initial efforts to encourage females of all ages to study computer science. (JM)
Real-time computational photon-counting LiDAR
NASA Astrophysics Data System (ADS)
Edgar, Matthew; Johnson, Steven; Phillips, David; Padgett, Miles
2018-03-01
The availability of compact, low-cost, and high-speed MEMS-based spatial light modulators has generated widespread interest in alternative sampling strategies for imaging systems utilizing single-pixel detectors. The development of compressed sensing schemes for real-time computational imaging may have promising commercial applications for high-performance detectors, where the availability of focal plane arrays is expensive or otherwise limited. We discuss the research and development of a prototype light detection and ranging (LiDAR) system via direct time of flight, which utilizes a single high-sensitivity photon-counting detector and fast-timing electronics to recover millimeter accuracy three-dimensional images in real time. The development of low-cost real time computational LiDAR systems could have importance for applications in security, defense, and autonomous vehicles.
2013-12-01
authors present a Computing on Dissemination with predictable contacts ( pCoD ) algorithm, since it is impossible to reserve task execution time in advance...Computing While Charging DAG Directed Acyclic Graph 18 TTL Time-to-live pCoD Predictable contacts CoD Computing on Dissemination upCoD Unpredictable
Reverse time migration by Krylov subspace reduced order modeling
NASA Astrophysics Data System (ADS)
Basir, Hadi Mahdavi; Javaherian, Abdolrahim; Shomali, Zaher Hossein; Firouz-Abadi, Roohollah Dehghani; Gholamy, Shaban Ali
2018-04-01
Imaging is a key step in seismic data processing. To date, a myriad of advanced pre-stack depth migration approaches have been developed; however, reverse time migration (RTM) is still considered as the high-end imaging algorithm. The main limitations associated with the performance cost of reverse time migration are the intensive computation of the forward and backward simulations, time consumption, and memory allocation related to imaging condition. Based on the reduced order modeling, we proposed an algorithm, which can be adapted to all the aforementioned factors. Our proposed method benefit from Krylov subspaces method to compute certain mode shapes of the velocity model computed by as an orthogonal base of reduced order modeling. Reverse time migration by reduced order modeling is helpful concerning the highly parallel computation and strongly reduces the memory requirement of reverse time migration. The synthetic model results showed that suggested method can decrease the computational costs of reverse time migration by several orders of magnitudes, compared with reverse time migration by finite element method.
Optical Interconnection Via Computer-Generated Holograms
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang; Zhou, Shaomin
1995-01-01
Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
Numerical Prediction of Pitch Damping Stability Derivatives for Finned Projectiles
2013-11-01
in part by a grant of high-performance computing time from the U.S. DOD High Performance Computing Modernization Program (HPCMP) at the Army...to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data...12 3.3.2 Time -Accurate Simulations
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
An Approach to Integrate a Space-Time GIS Data Model with High Performance Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Dali; Zhao, Ziliang; Shaw, Shih-Lung
2011-01-01
In this paper, we describe an approach to integrate a Space-Time GIS data model on a high performance computing platform. The Space-Time GIS data model has been developed on a desktop computing environment. We use the Space-Time GIS data model to generate GIS module, which organizes a series of remote sensing data. We are in the process of porting the GIS module into an HPC environment, in which the GIS modules handle large dataset directly via parallel file system. Although it is an ongoing project, authors hope this effort can inspire further discussions on the integration of GIS on highmore » performance computing platforms.« less
PACE 2: Pricing and Cost Estimating Handbook
NASA Technical Reports Server (NTRS)
Stewart, R. D.; Shepherd, T.
1977-01-01
An automatic data processing system to be used for the preparation of industrial engineering type manhour and material cost estimates has been established. This computer system has evolved into a highly versatile and highly flexible tool which significantly reduces computation time, eliminates computational errors, and reduces typing and reproduction time for estimators and pricers since all mathematical and clerical functions are automatic once basic inputs are derived.
Colt: an experiment in wormhole run-time reconfiguration
NASA Astrophysics Data System (ADS)
Bittner, Ray; Athanas, Peter M.; Musgrove, Mark
1996-10-01
Wormhole run-time reconfiguration (RTR) is an attempt to create a refined computing paradigm for high performance computational tasks. By combining concepts from field programmable gate array (FPGA) technologies with data flow computing, the Colt/Stallion architecture achieves high utilization of hardware resources, and facilitates rapid run-time reconfiguration. Targeted mainly at DSP-type operations, the Colt integrated circuit -- a prototype wormhole RTR device -- compares favorably to contemporary DSP alternatives in terms of silicon area consumed per unit computation and in computing performance. Although emphasis has been placed on signal processing applications, general purpose computation has not been overlooked. Colt is a prototype that defines an architecture not only at the chip level but also in terms of an overall system design. As this system is realized, the concept of wormhole RTR will be applied to numerical computation and DSP applications including those common to image processing, communications systems, digital filters, acoustic processing, real-time control systems and simulation acceleration.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Screen Time at Home and School among Low-Income Children Attending Head Start
Fletcher, Erica N.; Whitaker, Robert C.; Marino, Alexis J.; Anderson, Sarah E.
2013-01-01
Objective To describe the patterns of screen viewing at home and school among low-income preschool-aged children attending Head Start and identify factors associated with high home screen time in this population. Few studies have examined both home and classroom screen time, or included computer use as a component of screen viewing. Methods Participants were 2221 low-income preschool-aged children in the United States studied in the Head Start Family and Child Experiences Survey (FACES) in spring 2007. For 5 categories of screen viewing (television, video/DVD, video games, computer games, other computer use), we assessed children’s typical weekday home (parent-reported) and classroom (teacher-reported) screen viewing in relation to having a television in the child’s bedroom and sociodemographic factors. Results Over half of children (55.7%) had a television in their bedroom, and 12.5% had high home screen time (>4 hours/weekday). Television was the most common category of home screen time, but 56.6% of children had access to a computer at home and 37.5% had used it on the last typical weekday. After adjusting for sociodemographic characteristics, children with a television in their bedroom were more likely to have high home screen time [odds ratio=2.57 (95% confidence interval: 1.80–3.68)]. Classroom screen time consisted almost entirely of computer use; 49.4% of children used a classroom computer for ≥1 hour/week, and 14.2% played computer games at school ≥5 hours/week. Conclusions In 2007, one in eight low-income children attending Head Start had >4 hours/weekday of home screen time, which was associated with having a television in the bedroom. In the Head Start classroom, television and video viewing were uncommon but computer use was common. PMID:24891924
A High Performance Computer Architecture for Embedded And/Or Multi-Computer Applications
1990-09-01
commercially available, real - time operating system . CHOICES and ARTS are real-time operating systems developed at the University of Illinois and CMU...respectively. Selection of a real - time operating system will be made in the next phase of the project. U BIBLIOGRAPHY U Wulf, Wm. A. The WM Computer
An evaluation of superminicomputers for thermal analysis
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Vidal, J. B.; Jones, G. K.
1962-01-01
The feasibility and cost effectiveness of solving thermal analysis problems on superminicomputers is demonstrated. Conventional thermal analysis and the changing computer environment, computer hardware and software used, six thermal analysis test problems, performance of superminicomputers (CPU time, accuracy, turnaround, and cost) and comparison with large computers are considered. Although the CPU times for superminicomputers were 15 to 30 times greater than the fastest mainframe computer, the minimum cost to obtain the solutions on superminicomputers was from 11 percent to 59 percent of the cost of mainframe solutions. The turnaround (elapsed) time is highly dependent on the computer load, but for large problems, superminicomputers produced results in less elapsed time than a typically loaded mainframe computer.
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special- purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allows, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
NASA Technical Reports Server (NTRS)
Bailey, David H.; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
With programs such as the US High Performance Computing and Communications Program (HPCCP), the attention of scientists and engineers worldwide has been focused on the potential of very high performance scientific computing, namely systems that are hundreds or thousands of times more powerful than those typically available in desktop systems at any given point in time. Extending the frontiers of computing in this manner has resulted in remarkable advances, both in computing technology itself and also in the various scientific and engineering disciplines that utilize these systems. Within the month or two, a sustained rate of 1 Tflop/s (also written 1 teraflops, or 10(exp 12) floating-point operations per second) is likely to be achieved by the 'ASCI Red' system at Sandia National Laboratory in New Mexico. With this objective in sight, it is reasonable to ask what lies ahead for high-end computing.
Optimisation of multiplet identifier processing on a PLAYSTATION® 3
NASA Astrophysics Data System (ADS)
Hattori, Masami; Mizuno, Takashi
2010-02-01
To enable high-performance computing (HPC) for applications with large datasets using a Sony® PLAYSTATION® 3 (PS3™) video game console, we configured a hybrid system consisting of a Windows® PC and a PS3™. To validate this system, we implemented the real-time multiplet identifier (RTMI) application, which identifies multiplets of microearthquakes in terms of the similarity of their waveforms. The cross-correlation computation, which is a core algorithm of the RTMI application, was optimised for the PS3™ platform, while the rest of the computation, including data input and output remained on the PC. With this configuration, the core part of the algorithm ran 69 times faster than the original program, accelerating total computation speed more than five times. As a result, the system processed up to 2100 total microseismic events, whereas the original implementation had a limit of 400 events. These results indicate that this system enables high-performance computing for large datasets using the PS3™, as long as data transfer time is negligible compared with computation time.
NASA Technical Reports Server (NTRS)
Sackett, L. L.; Edelbaum, T. N.; Malchow, H. L.
1974-01-01
This manual is a guide for using a computer program which calculates time optimal trajectories for high-and low-thrust geocentric transfers. Either SEP or NEP may be assumed and a one or two impulse, fixed total delta V, initial high thrust phase may be included. Also a single impulse of specified delta V may be included after the low thrust state. The low thrust phase utilizes equinoctial orbital elements to avoid the classical singularities and Kryloff-Boguliuboff averaging to help insure more rapid computation time. The program is written in FORTRAN 4 in double precision for use on an IBM 360 computer. The manual includes a description of the problem treated, input/output information, examples of runs, and source code listings.
NASA Technical Reports Server (NTRS)
Bechtel, R. D.; Mateos, M. A.; Lincoln, K. A.
1988-01-01
Briefly described are the essential features of a computer program designed to interface a personal computer with the fast, digital data acquisition system of a time-of-flight mass spectrometer. The instrumentation was developed to provide a time-resolved analysis of individual vapor pulses produced by the incidence of a pulsed laser beam on an ablative material. The high repetition rate spectrometer coupled to a fast transient recorder captures complete mass spectra every 20 to 35 microsecs, thereby providing the time resolution needed for the study of this sort of transient event. The program enables the computer to record the large amount of data generated by the system in short time intervals, and it provides the operator the immediate option of presenting the spectral data in several different formats. Furthermore, the system does this with a high degree of automation, including the tasks of mass labeling the spectra and logging pertinent instrumental parameters.
NASA Technical Reports Server (NTRS)
Chang, Chau-Lyan; Venkatachari, Balaji Shankar; Cheng, Gary
2013-01-01
With the wide availability of affordable multiple-core parallel supercomputers, next generation numerical simulations of flow physics are being focused on unsteady computations for problems involving multiple time scales and multiple physics. These simulations require higher solution accuracy than most algorithms and computational fluid dynamics codes currently available. This paper focuses on the developmental effort for high-fidelity multi-dimensional, unstructured-mesh flow solvers using the space-time conservation element, solution element (CESE) framework. Two approaches have been investigated in this research in order to provide high-accuracy, cross-cutting numerical simulations for a variety of flow regimes: 1) time-accurate local time stepping and 2) highorder CESE method. The first approach utilizes consistent numerical formulations in the space-time flux integration to preserve temporal conservation across the cells with different marching time steps. Such approach relieves the stringent time step constraint associated with the smallest time step in the computational domain while preserving temporal accuracy for all the cells. For flows involving multiple scales, both numerical accuracy and efficiency can be significantly enhanced. The second approach extends the current CESE solver to higher-order accuracy. Unlike other existing explicit high-order methods for unstructured meshes, the CESE framework maintains a CFL condition of one for arbitrarily high-order formulations while retaining the same compact stencil as its second-order counterpart. For large-scale unsteady computations, this feature substantially enhances numerical efficiency. Numerical formulations and validations using benchmark problems are discussed in this paper along with realistic examples.
ERIC Educational Resources Information Center
Ates, Alev; Altunay, Ugur; Altun, Eralp
2006-01-01
The aim of this research was to discern the effects of computer assisted English instruction on English language preparatory students' attitudes towards computers and English in a Turkish-medium high school with an intensive English program. A quasi-experimental time series research design, also called "before-after" or "repeated…
GPUs: An Emerging Platform for General-Purpose Computation
2007-08-01
programming; real-time cinematic quality graphics Peak stream (26) License required (limited time no- cost evaluation program) Commercially...folding.stanford.edu (accessed 30 March 2007). 2. Fan, Z.; Qiu, F.; Kaufman, A.; Yoakum-Stover, S. GPU Cluster for High Performance Computing. ACM/IEEE...accessed 30 March 2007). 8. Goodnight, N.; Wang, R.; Humphreys, G. Computation on Programmable Graphics Hardware. IEEE Computer Graphics and
A real-time spike sorting method based on the embedded GPU.
Zelan Yang; Kedi Xu; Xiang Tian; Shaomin Zhang; Xiaoxiang Zheng
2017-07-01
Microelectrode arrays with hundreds of channels have been widely used to acquire neuron population signals in neuroscience studies. Online spike sorting is becoming one of the most important challenges for high-throughput neural signal acquisition systems. Graphic processing unit (GPU) with high parallel computing capability might provide an alternative solution for increasing real-time computational demands on spike sorting. This study reported a method of real-time spike sorting through computing unified device architecture (CUDA) which was implemented on an embedded GPU (NVIDIA JETSON Tegra K1, TK1). The sorting approach is based on the principal component analysis (PCA) and K-means. By analyzing the parallelism of each process, the method was further optimized in the thread memory model of GPU. Our results showed that the GPU-based classifier on TK1 is 37.92 times faster than the MATLAB-based classifier on PC while their accuracies were the same with each other. The high-performance computing features of embedded GPU demonstrated in our studies suggested that the embedded GPU provide a promising platform for the real-time neural signal processing.
Symplectic molecular dynamics simulations on specially designed parallel computers.
Borstnik, Urban; Janezic, Dusanka
2005-01-01
We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
NASA Astrophysics Data System (ADS)
Stockert, Sven; Wehr, Matthias; Lohmar, Johannes; Abel, Dirk; Hirt, Gerhard
2017-10-01
In the electrical and medical industries the trend towards further miniaturization of devices is accompanied by the demand for smaller manufacturing tolerances. Such industries use a plentitude of small and narrow cold rolled metal strips with high thickness accuracy. Conventional rolling mills can hardly achieve further improvement of these tolerances. However, a model-based controller in combination with an additional piezoelectric actuator for high dynamic roll adjustment is expected to enable the production of the required metal strips with a thickness tolerance of +/-1 µm. The model-based controller has to be based on a rolling theory which can describe the rolling process very accurately. Additionally, the required computing time has to be low in order to predict the rolling process in real-time. In this work, four rolling theories from literature with different levels of complexity are tested for their suitability for the predictive controller. Rolling theories of von Kármán, Siebel, Bland & Ford and Alexander are implemented in Matlab and afterwards transferred to the real-time computer used for the controller. The prediction accuracy of these theories is validated using rolling trials with different thickness reduction and a comparison to the calculated results. Furthermore, the required computing time on the real-time computer is measured. Adequate results according the prediction accuracy can be achieved with the rolling theories developed by Bland & Ford and Alexander. A comparison of the computing time of those two theories reveals that Alexander's theory exceeds the sample rate of 1 kHz of the real-time computer.
Particle-In-Cell simulations of high pressure plasmas using graphics processing units
NASA Astrophysics Data System (ADS)
Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter
2009-10-01
Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming -- certain stability conditions has to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take up to very much time to run on standard computers and require the help of computer clusters or super computers. Recent advances in the field of graphics processing units (GPUs) provides every personal computer with a highly parallel multi processor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
Hardware architecture design of image restoration based on time-frequency domain computation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Jing; Jiao, Zipeng
2013-10-01
The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
The Overdominance of Computers
ERIC Educational Resources Information Center
Monke, Lowell W.
2006-01-01
Most schools are unwilling to consider decreasing computer use at school because they fear that without screen time, students will not be prepared for the demands of a high-tech 21st century. Monke argues that having young children spend a significant amount of time on computers in school is harmful, particularly when children spend so much…
Body image dissatisfaction, physical activity and screen-time in Spanish adolescents.
Añez, Elizabeth; Fornieles-Deu, Albert; Fauquet-Ars, Jordi; López-Guimerà, Gemma; Puntí-Vidal, Joaquim; Sánchez-Carracedo, David
2018-01-01
This cross-sectional study contributes to the literature on whether body dissatisfaction is a barrier/facilitator to engaging in physical activity and to investigate the impact of mass-media messages via computer-time on body dissatisfaction. High-school students ( N = 1501) reported their physical activity, computer-time (homework/leisure) and body dissatisfaction. Researchers measured students' weight and height. Analyses revealed that body dissatisfaction was negatively associated with physical activity on both genders, whereas computer-time was associated only with girls' body dissatisfaction. Specifically, as computer-homework increased, body dissatisfaction decreased; as computer-leisure increased, body dissatisfaction increased. Weight-related interventions should improve body image and physical activity simultaneously, while critical consumption of mass-media interventions should include a computer component.
Scaling predictive modeling in drug development with cloud computing.
Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola
2015-01-26
Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
A single-stage flux-corrected transport algorithm for high-order finite-volume methods
Chaplin, Christopher; Colella, Phillip
2017-05-08
We present a new limiter method for solving the advection equation using a high-order, finite-volume discretization. The limiter is based on the flux-corrected transport algorithm. Here, we modify the classical algorithm by introducing a new computation for solution bounds at smooth extrema, as well as improving the preconstraint on the high-order fluxes. We compute the high-order fluxes via a method-of-lines approach with fourth-order Runge-Kutta as the time integrator. For computing low-order fluxes, we select the corner-transport upwind method due to its improved stability over donor-cell upwind. Several spatial differencing schemes are investigated for the high-order flux computation, including centered- differencemore » and upwind schemes. We show that the upwind schemes perform well on account of the dissipation of high-wavenumber components. The new limiter method retains high-order accuracy for smooth solutions and accurately captures fronts in discontinuous solutions. Further, we need only apply the limiter once per complete time step.« less
ALMA Correlator Real-Time Data Processor
NASA Astrophysics Data System (ADS)
Pisano, J.; Amestica, R.; Perez, J.
2005-10-01
The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
NASA Technical Reports Server (NTRS)
Chaderjian, Neal M.
1991-01-01
Computations from two Navier-Stokes codes, NSS and F3D, are presented for a tangent-ogive-cylinder body at high angle of attack. Features of this steady flow include a pair of primary vortices on the leeward side of the body as well as secondary vortices. The topological and physical plausibility of this vortical structure is discussed. The accuracy of these codes are assessed by comparison of the numerical solutions with experimental data. The effects of turbulence model, numerical dissipation, and grid refinement are presented. The overall efficiency of these codes are also assessed by examining their convergence rates, computational time per time step, and maximum allowable time step for time-accurate computations. Overall, the numerical results from both codes compared equally well with experimental data, however, the NSS code was found to be significantly more efficient than the F3D code.
2015-09-30
Clark (2014), "Using High Performance Computing to Explore Large Complex Bioacoustic Soundscapes : Case Study for Right Whale Acoustics," Procedia...34Using High Performance Computing to Explore Large Complex Bioacoustic Soundscapes : Case Study for Right Whale Acoustics," Procedia Computer Science 20
NASA Astrophysics Data System (ADS)
Ahn, Sul-Ah; Jung, Youngim
2016-10-01
The research activities of the computational physicists utilizing high performance computing are analyzed by bibliometirc approaches. This study aims at providing the computational physicists utilizing high-performance computing and policy planners with useful bibliometric results for an assessment of research activities. In order to achieve this purpose, we carried out a co-authorship network analysis of journal articles to assess the research activities of researchers for high-performance computational physics as a case study. For this study, we used journal articles of the Scopus database from Elsevier covering the time period of 2004-2013. We extracted the author rank in the physics field utilizing high-performance computing by the number of papers published during ten years from 2004. Finally, we drew the co-authorship network for 45 top-authors and their coauthors, and described some features of the co-authorship network in relation to the author rank. Suggestions for further studies are discussed.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
A high-order multiscale finite-element method for time-domain acoustic-wave modeling
NASA Astrophysics Data System (ADS)
Gao, Kai; Fu, Shubin; Chung, Eric T.
2018-05-01
Accurate and efficient wave equation modeling is vital for many applications in such as acoustics, electromagnetics, and seismology. However, solving the wave equation in large-scale and highly heterogeneous models is usually computationally expensive because the computational cost is directly proportional to the number of grids in the model. We develop a novel high-order multiscale finite-element method to reduce the computational cost of time-domain acoustic-wave equation numerical modeling by solving the wave equation on a coarse mesh based on the multiscale finite-element theory. In contrast to existing multiscale finite-element methods that use only first-order multiscale basis functions, our new method constructs high-order multiscale basis functions from local elliptic problems which are closely related to the Gauss-Lobatto-Legendre quadrature points in a coarse element. Essentially, these basis functions are not only determined by the order of Legendre polynomials, but also by local medium properties, and therefore can effectively convey the fine-scale information to the coarse-scale solution with high-order accuracy. Numerical tests show that our method can significantly reduce the computation time while maintain high accuracy for wave equation modeling in highly heterogeneous media by solving the corresponding discrete system only on the coarse mesh with the new high-order multiscale basis functions.
A high-order multiscale finite-element method for time-domain acoustic-wave modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Kai; Fu, Shubin; Chung, Eric T.
Accurate and efficient wave equation modeling is vital for many applications in such as acoustics, electromagnetics, and seismology. However, solving the wave equation in large-scale and highly heterogeneous models is usually computationally expensive because the computational cost is directly proportional to the number of grids in the model. We develop a novel high-order multiscale finite-element method to reduce the computational cost of time-domain acoustic-wave equation numerical modeling by solving the wave equation on a coarse mesh based on the multiscale finite-element theory. In contrast to existing multiscale finite-element methods that use only first-order multiscale basis functions, our new method constructsmore » high-order multiscale basis functions from local elliptic problems which are closely related to the Gauss–Lobatto–Legendre quadrature points in a coarse element. Essentially, these basis functions are not only determined by the order of Legendre polynomials, but also by local medium properties, and therefore can effectively convey the fine-scale information to the coarse-scale solution with high-order accuracy. Numerical tests show that our method can significantly reduce the computation time while maintain high accuracy for wave equation modeling in highly heterogeneous media by solving the corresponding discrete system only on the coarse mesh with the new high-order multiscale basis functions.« less
A high-order multiscale finite-element method for time-domain acoustic-wave modeling
Gao, Kai; Fu, Shubin; Chung, Eric T.
2018-02-04
Accurate and efficient wave equation modeling is vital for many applications in such as acoustics, electromagnetics, and seismology. However, solving the wave equation in large-scale and highly heterogeneous models is usually computationally expensive because the computational cost is directly proportional to the number of grids in the model. We develop a novel high-order multiscale finite-element method to reduce the computational cost of time-domain acoustic-wave equation numerical modeling by solving the wave equation on a coarse mesh based on the multiscale finite-element theory. In contrast to existing multiscale finite-element methods that use only first-order multiscale basis functions, our new method constructsmore » high-order multiscale basis functions from local elliptic problems which are closely related to the Gauss–Lobatto–Legendre quadrature points in a coarse element. Essentially, these basis functions are not only determined by the order of Legendre polynomials, but also by local medium properties, and therefore can effectively convey the fine-scale information to the coarse-scale solution with high-order accuracy. Numerical tests show that our method can significantly reduce the computation time while maintain high accuracy for wave equation modeling in highly heterogeneous media by solving the corresponding discrete system only on the coarse mesh with the new high-order multiscale basis functions.« less
The Case for Modular Redundancy in Large-Scale High Performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Ong, Hong Hoe; Scott, Stephen L
2009-01-01
Recent investigations into resilience of large-scale high-performance computing (HPC) systems showed a continuous trend of decreasing reliability and availability. Newly installed systems have a lower mean-time to failure (MTTF) and a higher mean-time to recover (MTTR) than their predecessors. Modular redundancy is being used in many mission critical systems today to provide for resilience, such as for aerospace and command \\& control systems. The primary argument against modular redundancy for resilience in HPC has always been that the capability of a HPC system, and respective return on investment, would be significantly reduced. We argue that modular redundancy can significantly increasemore » compute node availability as it removes the impact of scale from single compute node MTTR. We further argue that single compute nodes can be much less reliable, and therefore less expensive, and still be highly available, if their MTTR/MTTF ratio is maintained.« less
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-01-01
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-02-12
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions.
P2P Technology for High-Performance Computing: An Overview
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Berry, Jason
2003-01-01
The transition from cluster computing to peer-to-peer (P2P) high-performance computing has recently attracted the attention of the computer science community. It has been recognized that existing local networks and dedicated clusters of headless workstations can serve as inexpensive yet powerful virtual supercomputers. It has also been recognized that the vast number of lower-end computers connected to the Internet stay idle for as long as 90% of the time. The growing speed of Internet connections and the high availability of free CPU time encourage exploration of the possibility to use the whole Internet rather than local clusters as massively parallel yet almost freely available P2P supercomputer. As a part of a larger project on P2P high-performance computing, it has been my goal to compile an overview of the 2P2 paradigm. I have studied various P2P platforms and I have compiled systematic brief descriptions of their most important characteristics. I have also experimented and obtained hands-on experience with selected P2P platforms focusing on those that seem promising with respect to P2P high-performance computing. I have also compiled relevant literature and web references. I have prepared a draft technical report and I have summarized my findings in a poster paper.
NASA Astrophysics Data System (ADS)
Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir
2016-12-01
The paper presents two different approaches to reduce the time of computer calculation of reachability sets. First of these two approaches use different data structures for storing the reachability sets in the computer memory for calculation in single-threaded mode. Second approach is based on using parallel algorithms with reference to the data structures from the first approach. Within the framework of this paper parallel algorithm of approximate reachability set calculation on computer with SMP-architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate high efficiency of parallel computing technology and also show how computing time depends on the used data structure.
An assessment of the real-time application capabilities of the SIFT computer system
NASA Technical Reports Server (NTRS)
Butler, R. W.
1982-01-01
The real-time capabilities of the SIFT computer system, a highly reliable multicomputer architecture developed to support the flight controls of a relaxed static stability aircraft, are discussed. The SIFT computer system was designed to meet extremely high reliability requirements and to facilitate a formal proof of its correctness. Although SIFT represents a significant achievement in fault-tolerant system research it presents an unusual and restrictive interface to its users. The characteristics of the user interface and its impact on application system design are assessed.
Real-time, high frequency QRS electrocardiograph
NASA Technical Reports Server (NTRS)
Schlegel, Todd T. (Inventor); DePalma, Jude L. (Inventor); Moradi, Saeed (Inventor)
2006-01-01
Real time cardiac electrical data are received from a patient, manipulated to determine various useful aspects of the ECG signal, and displayed in real time in a useful form on a computer screen or monitor. The monitor displays the high frequency data from the QRS complex in units of microvolts, juxtaposed with a display of conventional ECG data in units of millivolts or microvolts. The high frequency data are analyzed for their root mean square (RMS) voltage values and the discrete RMS values and related parameters are displayed in real time. The high frequency data from the QRS complex are analyzed with imbedded algorithms to determine the presence or absence of reduced amplitude zones, referred to herein as RAZs. RAZs are displayed as go, no-go signals on the computer monitor. The RMS and related values of the high frequency components are displayed as time varying signals, and the presence or absence of RAZs may be similarly displayed over time.
Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D
2008-08-01
The advent of readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation time, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited because the constraints of the available programming platforms. As well, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance will be compared with single threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface, and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation time for 100 iterations in the range of 1.8-13.5 s was observed for the GPU with image size ranging from 2.0 x 10(6) to 14.2 x 10(6) pixels. The GPU registration was 55-61 times faster than the CPU for the single threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. Computational efficiency is characterized in terms of time per megapixels per iteration (TPMI) with units of seconds per megapixels per iteration (or spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI =0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.
Computational Burden Resulting from Image Recognition of High Resolution Radar Sensors
López-Rodríguez, Patricia; Fernández-Recio, Raúl; Bravo, Ignacio; Gardel, Alfredo; Lázaro, José L.; Rufo, Elena
2013-01-01
This paper presents a methodology for high resolution radar image generation and automatic target recognition emphasizing the computational cost involved in the process. In order to obtain focused inverse synthetic aperture radar (ISAR) images certain signal processing algorithms must be applied to the information sensed by the radar. From actual data collected by radar the stages and algorithms needed to obtain ISAR images are revised, including high resolution range profile generation, motion compensation and ISAR formation. Target recognition is achieved by comparing the generated set of actual ISAR images with a database of ISAR images generated by electromagnetic software. High resolution radar image generation and target recognition processes are burdensome and time consuming, so to determine the most suitable implementation platform the analysis of the computational complexity is of great interest. To this end and since target identification must be completed in real time, computational burden of both processes the generation and comparison with a database is explained separately. Conclusions are drawn about implementation platforms and calculation efficiency in order to reduce time consumption in a possible future implementation. PMID:23609804
Computational burden resulting from image recognition of high resolution radar sensors.
López-Rodríguez, Patricia; Fernández-Recio, Raúl; Bravo, Ignacio; Gardel, Alfredo; Lázaro, José L; Rufo, Elena
2013-04-22
This paper presents a methodology for high resolution radar image generation and automatic target recognition emphasizing the computational cost involved in the process. In order to obtain focused inverse synthetic aperture radar (ISAR) images certain signal processing algorithms must be applied to the information sensed by the radar. From actual data collected by radar the stages and algorithms needed to obtain ISAR images are revised, including high resolution range profile generation, motion compensation and ISAR formation. Target recognition is achieved by comparing the generated set of actual ISAR images with a database of ISAR images generated by electromagnetic software. High resolution radar image generation and target recognition processes are burdensome and time consuming, so to determine the most suitable implementation platform the analysis of the computational complexity is of great interest. To this end and since target identification must be completed in real time, computational burden of both processes the generation and comparison with a database is explained separately. Conclusions are drawn about implementation platforms and calculation efficiency in order to reduce time consumption in a possible future implementation.
NASA Astrophysics Data System (ADS)
MacDonald, Christopher L.; Bhattacharya, Nirupama; Sprouse, Brian P.; Silva, Gabriel A.
2015-09-01
Computing numerical solutions to fractional differential equations can be computationally intensive due to the effect of non-local derivatives in which all previous time points contribute to the current iteration. In general, numerical approaches that depend on truncating part of the system history while efficient, can suffer from high degrees of error and inaccuracy. Here we present an adaptive time step memory method for smooth functions applied to the Grünwald-Letnikov fractional diffusion derivative. This method is computationally efficient and results in smaller errors during numerical simulations. Sampled points along the system's history at progressively longer intervals are assumed to reflect the values of neighboring time points. By including progressively fewer points backward in time, a temporally 'weighted' history is computed that includes contributions from the entire past of the system, maintaining accuracy, but with fewer points actually calculated, greatly improving computational efficiency.
An efficient two-stage approach for image-based FSI analysis of atherosclerotic arteries
Rayz, Vitaliy L.; Mofrad, Mohammad R. K.; Saloner, David
2010-01-01
Patient-specific biomechanical modeling of atherosclerotic arteries has the potential to aid clinicians in characterizing lesions and determining optimal treatment plans. To attain high levels of accuracy, recent models use medical imaging data to determine plaque component boundaries in three dimensions, and fluid–structure interaction is used to capture mechanical loading of the diseased vessel. As the plaque components and vessel wall are often highly complex in shape, constructing a suitable structured computational mesh is very challenging and can require a great deal of time. Models based on unstructured computational meshes require relatively less time to construct and are capable of accurately representing plaque components in three dimensions. These models unfortunately require additional computational resources and computing time for accurate and meaningful results. A two-stage modeling strategy based on unstructured computational meshes is proposed to achieve a reasonable balance between meshing difficulty and computational resource and time demand. In this method, a coarsegrained simulation of the full arterial domain is used to guide and constrain a fine-scale simulation of a smaller region of interest within the full domain. Results for a patient-specific carotid bifurcation model demonstrate that the two-stage approach can afford a large savings in both time for mesh generation and time and resources needed for computation. The effects of solid and fluid domain truncation were explored, and were shown to minimally affect accuracy of the stress fields predicted with the two-stage approach. PMID:19756798
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octtree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable of other space-time analytical tests.
Gore, Brooklin
2018-02-01
This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
A Study of Quality of Service Communication for High-Speed Packet-Switching Computer Sub-Networks
NASA Technical Reports Server (NTRS)
Cui, Zhenqian
1999-01-01
With the development of high-speed networking technology, computer networks, including local-area networks (LANs), wide-area networks (WANs) and the Internet, are extending their traditional roles of carrying computer data. They are being used for Internet telephony, multimedia applications such as conferencing and video on demand, distributed simulations, and other real-time applications. LANs are even used for distributed real-time process control and computing as a cost-effective approach. Differing from traditional data transfer, these new classes of high-speed network applications (video, audio, real-time process control, and others) are delay sensitive. The usefulness of data depends not only on the correctness of received data, but also the time that data are received. In other words, these new classes of applications require networks to provide guaranteed services or quality of service (QoS). Quality of service can be defined by a set of parameters and reflects a user's expectation about the underlying network's behavior. Traditionally, distinct services are provided by different kinds of networks. Voice services are provided by telephone networks, video services are provided by cable networks, and data transfer services are provided by computer networks. A single network providing different services is called an integrated-services network.
Time-dependent jet flow and noise computations
NASA Technical Reports Server (NTRS)
Berman, C. H.; Ramos, J. I.; Karniadakis, G. E.; Orszag, S. A.
1990-01-01
Methods for computing jet turbulence noise based on the time-dependent solution of Lighthill's (1952) differential equation are demonstrated. A key element in this approach is a flow code for solving the time-dependent Navier-Stokes equations at relatively high Reynolds numbers. Jet flow results at Re = 10,000 are presented here. This code combines a computationally efficient spectral element technique and a new self-consistent turbulence subgrid model to supply values for Lighthill's turbulence noise source tensor.
Parallel computing method for simulating hydrological processesof large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Linear static structural and vibration analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.
1993-01-01
Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
Reducing the Time and Cost of Testing Engines
NASA Technical Reports Server (NTRS)
2004-01-01
Producing a new aircraft engine currently costs approximately $1 billion, with 3 years of development time for a commercial engine and 10 years for a military engine. The high development time and cost make it extremely difficult to transition advanced technologies for cleaner, quieter, and more efficient new engines. To reduce this time and cost, NASA created a vision for the future where designers would use high-fidelity computer simulations early in the design process in order to resolve critical design issues before building the expensive engine hardware. To accomplish this vision, NASA's Glenn Research Center initiated a collaborative effort with the aerospace industry and academia to develop its Numerical Propulsion System Simulation (NPSS), an advanced engineering environment for the analysis and design of aerospace propulsion systems and components. Partners estimate that using NPSS has the potential to dramatically reduce the time, effort, and expense necessary to design and test jet engines by generating sophisticated computer simulations of an aerospace object or system. These simulations will permit an engineer to test various design options without having to conduct costly and time-consuming real-life tests. By accelerating and streamlining the engine system design analysis and test phases, NPSS facilitates bringing the final product to market faster. NASA's NPSS Version (V)1.X effort was a task within the Agency s Computational Aerospace Sciences project of the High Performance Computing and Communication program, which had a mission to accelerate the availability of high-performance computing hardware and software to the U.S. aerospace community for its use in design processes. The technology brings value back to NASA by improving methods of analyzing and testing space transportation components.
A GPU-based incompressible Navier-Stokes solver on moving overset grids
NASA Astrophysics Data System (ADS)
Chandar, Dominic D. J.; Sitaraman, Jayanarayanan; Mavriplis, Dimitri J.
2013-07-01
In pursuit of obtaining high fidelity solutions to the fluid flow equations in a short span of time, graphics processing units (GPUs) which were originally intended for gaming applications are currently being used to accelerate computational fluid dynamics (CFD) codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem to be favourable for many high-resolution computations. One such computation that involves a lot of number crunching is computing time accurate flow solutions past moving bodies. The aim of the present paper is thus to discuss the development of a flow solver on unstructured and overset grids and its implementation on GPUs. In its present form, the flow solver solves the incompressible fluid flow equations on unstructured/hybrid/overset grids using a fully implicit projection method. The resulting discretised equations are solved using a matrix-free Krylov solver using several GPU kernels such as gradient, Laplacian and reduction. Some of the simple arithmetic vector calculations are implemented using the CU++: An Object Oriented Framework for Computational Fluid Dynamics Applications using Graphics Processing Units, Journal of Supercomputing, 2013, doi:10.1007/s11227-013-0985-9 approach where GPU kernels are automatically generated at compile time. Results are presented for two- and three-dimensional computations on static and moving grids.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Radenski, Atanas
2003-01-01
The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer to Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing dual -service architecture for P2P high-performance divide-and conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide and conquer computations. The service engine is intended to provide free useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide and conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever changing pool of lower-end Internet nodes.
Specialized computer architectures for computational aerodynamics
NASA Technical Reports Server (NTRS)
Stevenson, D. K.
1978-01-01
In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relative high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.
An Introduction to Computing: Content for a High School Course.
ERIC Educational Resources Information Center
Rogers, Jean B.
A general outline of the topics that might be covered in a computers and computing course for high school students is provided. Topics are listed in the order in which they should be taught, and the relative amount of time to be spent on each topic is suggested. Seven units are included in the course outline: (1) general introduction, (2) using…
Hardware design and implementation of fast DOA estimation method based on multicore DSP
NASA Astrophysics Data System (ADS)
Guo, Rui; Zhao, Yingxiao; Zhang, Yue; Lin, Qianqiang; Chen, Zengping
2016-10-01
In this paper, we present a high-speed real-time signal processing hardware platform based on multicore digital signal processor (DSP). The real-time signal processing platform shows several excellent characteristics including high performance computing, low power consumption, large-capacity data storage and high speed data transmission, which make it able to meet the constraint of real-time direction of arrival (DOA) estimation. To reduce the high computational complexity of DOA estimation algorithm, a novel real-valued MUSIC estimator is used. The algorithm is decomposed into several independent steps and the time consumption of each step is counted. Based on the statistics of the time consumption, we present a new parallel processing strategy to distribute the task of DOA estimation to different cores of the real-time signal processing hardware platform. Experimental results demonstrate that the high processing capability of the signal processing platform meets the constraint of real-time direction of arrival (DOA) estimation.
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems
Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-01-01
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems. PMID:29439442
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems.
Ma, Xingpo; Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-02-10
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data are processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems.
NASA Technical Reports Server (NTRS)
Divito, Ben L.; Butler, Ricky W.; Caldwell, James L.
1990-01-01
A high-level design is presented for a reliable computing platform for real-time control applications. Design tradeoffs and analyses related to the development of the fault-tolerant computing platform are discussed. The architecture is formalized and shown to satisfy a key correctness property. The reliable computing platform uses replicated processors and majority voting to achieve fault tolerance. Under the assumption of a majority of processors working in each frame, it is shown that the replicated system computes the same results as a single processor system not subject to failures. Sufficient conditions are obtained to establish that the replicated system recovers from transient faults within a bounded amount of time. Three different voting schemes are examined and proved to satisfy the bounded recovery time conditions.
Computer use at work is associated with self-reported depressive and anxiety disorder.
Kim, Taeshik; Kang, Mo-Yeol; Yoo, Min-Sang; Lee, Dongwook; Hong, Yun-Chul
2016-01-01
With the development of technology, extensive use of computers in the workplace is prevalent and increases efficiency. However, computer users are facing new harmful working conditions with high workloads and longer hours. This study aimed to investigate the association between computer use at work and self-reported depressive and anxiety disorder (DAD) in a nationally representative sample of South Korean workers. This cross-sectional study was based on the third Korean Working Conditions Survey (2011), and 48,850 workers were analyzed. Information about computer use and DAD was obtained from a self-administered questionnaire. We investigated the relation between computer use at work and DAD using logistic regression. The 12-month prevalence of DAD in computer-using workers was 1.46 %. After adjustment for socio-demographic factors, the odds ratio for DAD was higher in workers using computers more than 75 % of their workday (OR 1.69, 95 % CI 1.30-2.20) than in workers using computers less than 50 % of their shift. After stratifying by working hours, computer use for over 75 % of the work time was significantly associated with increased odds of DAD in 20-39, 41-50, 51-60, and over 60 working hours per week. After stratifying by occupation, education, and job status, computer use for more than 75 % of the work time was related with higher odds of DAD in sales and service workers, those with high school and college education, and those who were self-employed and employers. A high proportion of computer use at work may be associated with depressive and anxiety disorder. This finding suggests the necessity of a work guideline to help the workers suffering from high computer use at work.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Renke; Jin, Shuangshuang; Chen, Yousu
This paper presents a faster-than-real-time dynamic simulation software package that is designed for large-size power system dynamic simulation. It was developed on the GridPACKTM high-performance computing (HPC) framework. The key features of the developed software package include (1) faster-than-real-time dynamic simulation for a WECC system (17,000 buses) with different types of detailed generator, controller, and relay dynamic models, (2) a decoupled parallel dynamic simulation algorithm with optimized computation architecture to better leverage HPC resources and technologies, (3) options for HPC-based linear and iterative solvers, (4) hidden HPC details, such as data communication and distribution, to enable development centered on mathematicalmore » models and algorithms rather than on computational details for power system researchers, and (5) easy integration of new dynamic models and related algorithms into the software package.« less
NASA's 3D Flight Computer for Space Applications
NASA Technical Reports Server (NTRS)
Alkalai, Leon
2000-01-01
The New Millennium Program (NMP) Integrated Product Development Team (IPDT) for Microelectronics Systems was planning to validate a newly developed 3D Flight Computer system on its first deep-space flight, DS1, launched in October 1998. This computer, developed in the 1995-97 time frame, contains many new computer technologies previously never used in deep-space systems. They include: advanced 3D packaging architecture for future low-mass and low-volume avionics systems; high-density 3D packaged chip-stacks for both volatile and non-volatile mass memory: 400 Mbytes of local DRAM memory, and 128 Mbytes of Flash memory; high-bandwidth Peripheral Component Interface (Per) local-bus with a bridge to VME; high-bandwidth (20 Mbps) fiber-optic serial bus; and other attributes, such as standard support for Design for Testability (DFT). Even though this computer system did not complete on time for delivery to the DS1 project, it was an important development along a technology roadmap towards highly integrated and highly miniaturized avionics systems for deep-space applications. This continued technology development is now being performed by NASA's Deep Space System Development Program (also known as X2000) and within JPL's Center for Integrated Space Microsystems (CISM).
2014-01-01
Background Existing instruments for measuring problematic computer and console gaming and internet use are often lengthy and often based on a pathological perspective. The objective was to develop and present a new and short non-clinical measurement tool for perceived problems related to computer use and gaming among adolescents and to study the association between screen time and perceived problems. Methods Cross-sectional school-survey of 11-, 13-, and 15-year old students in thirteen schools in the City of Aarhus, Denmark, participation rate 89%, n = 2100. The main exposure was time spend on weekdays on computer- and console-gaming and internet use for communication and surfing. The outcome measures were three indexes on perceived problems related to computer and console gaming and internet use. Results The three new indexes showed high face validity and acceptable internal consistency. Most schoolchildren with high screen time did not experience problems related to computer use. Still, there was a strong and graded association between time use and perceived problems related to computer gaming, console gaming (only boys) and internet use, odds ratios ranging from 6.90 to 10.23. Conclusion The three new measures of perceived problems related to computer and console gaming and internet use among adolescents are appropriate, reliable and valid for use in non-clinical surveys about young people’s everyday life and behaviour. These new measures do not assess Internet Gaming Disorder as it is listed in the DSM and therefore has no parity with DSM criteria. We found an increasing risk of perceived problems with increasing time spent with gaming and internet use. Nevertheless, most schoolchildren who spent much time with gaming and internet use did not experience problems. PMID:24731270
LittleQuickWarp: an ultrafast image warping tool.
Qu, Lei; Peng, Hanchuan
2015-02-01
Warping images into a standard coordinate space is critical for many image computing related tasks. However, for multi-dimensional and high-resolution images, an accurate warping operation itself is often very expensive in terms of computer memory and computational time. For high-throughput image analysis studies such as brain mapping projects, it is desirable to have high performance image warping tools that are compatible with common image analysis pipelines. In this article, we present LittleQuickWarp, a swift and memory efficient tool that boosts 3D image warping performance dramatically and at the same time has high warping quality similar to the widely used thin plate spline (TPS) warping. Compared to the TPS, LittleQuickWarp can improve the warping speed 2-5 times and reduce the memory consumption 6-20 times. We have implemented LittleQuickWarp as an Open Source plug-in program on top of the Vaa3D system (http://vaa3d.org). The source code and a brief tutorial can be found in the Vaa3D plugin source code repository. Copyright © 2014 Elsevier Inc. All rights reserved.
Unsteady three-dimensional thermal field prediction in turbine blades using nonlinear BEM
NASA Technical Reports Server (NTRS)
Martin, Thomas J.; Dulikravich, George S.
1993-01-01
A time-and-space accurate and computationally efficient fully three dimensional unsteady temperature field analysis computer code has been developed for truly arbitrary configurations. It uses boundary element method (BEM) formulation based on an unsteady Green's function approach, multi-point Gaussian quadrature spatial integration on each panel, and a highly clustered time-step integration. The code accepts either temperatures or heat fluxes as boundary conditions that can vary in time on a point-by-point basis. Comparisons of the BEM numerical results and known analytical unsteady results for simple shapes demonstrate very high accuracy and reliability of the algorithm. An example of computed three dimensional temperature and heat flux fields in a realistically shaped internally cooled turbine blade is also discussed.
A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations
NASA Technical Reports Server (NTRS)
Dydson, Roger W.; Goodrich, John W.
2000-01-01
Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many of the computational aeroacoustics methods are limited to 4th order accurate Runge-Kutta methods in time which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128 bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program. and is more computationally efficient when compared to previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. And finally, the sources of round-off error which effect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.
Scheduling algorithms for automatic control systems for technological processes
NASA Astrophysics Data System (ADS)
Chernigovskiy, A. S.; Tsarev, R. Yu; Kapulin, D. V.
2017-01-01
Wide use of automatic process control systems and the usage of high-performance systems containing a number of computers (processors) give opportunities for creation of high-quality and fast production that increases competitiveness of an enterprise. Exact and fast calculations, control computation, and processing of the big data arrays - all of this requires the high level of productivity and, at the same time, minimum time of data handling and result receiving. In order to reach the best time, it is necessary not only to use computing resources optimally, but also to design and develop the software so that time gain will be maximal. For this purpose task (jobs or operations), scheduling techniques for the multi-machine/multiprocessor systems are applied. Some of basic task scheduling methods for the multi-machine process control systems are considered in this paper, their advantages and disadvantages come to light, and also some usage considerations, in case of the software for automatic process control systems developing, are made.
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
Optimization of tomographic reconstruction workflows on geographically distributed resources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Urban and Rural Differences in Sedentary Behavior among American and Canadian Youth
Carson, Valerie; Iannotti, Ronald J.; Pickett, William; Janssen, Ian
2011-01-01
We examined relationships between urban-rural status and three screen time behaviors (television, computer, video games), and the potential mediating effect of parent and peer support on these relationships. Findings are based on American (n=8563) and Canadian (n=8990) youth in grades 6–10 from the 2005/06 Health Behaviour in School-Aged Children Survey. Weekly hours of individual screen time behaviors were calculated. Urban-rural status was defined using the Beale coding system. Parent and peer support variables were derived from principal component analysis. In comparison to the referent group (non-metro adjacent), American youth in the most rural areas were more likely to be high television users and less likely to be high computer users. Conversely, Canadian youth in medium and large metropolitan areas were less likely to be high television users and more likely to be high computer users. Parent and peer support did not strongly mediate the relationships between urban-rural status and screen time. These findings suggest that interventions aiming to reduce screen time may be most effective if they consider residential location and the specific screen time behavior. PMID:21565545
Urban and rural differences in sedentary behavior among American and Canadian youth.
Carson, Valerie; Iannotti, Ronald J; Pickett, William; Janssen, Ian
2011-07-01
We examined relationships between urban-rural status and three screen time behaviors (television, computer, video games), and the potential mediating effect of parent and peer support on these relationships. Findings are based on American (n = 8563) and Canadian (n = 8990) youth in grades 6-10 from the 2005/06 Health Behavior in School-Aged Children Survey. Weekly hours of individual screen time behaviors were calculated. Urban-rural status was defined using the Beale coding system. Parent and peer support variables were derived from principal component analysis. In comparison to the referent group (non-metro adjacent), American youth in the most rural areas were more likely to be high television users and less likely to be high computer users. Conversely, Canadian youth in medium and large metropolitan areas were less likely to be high television users and more likely to be high computer users. Parent and peer support did not strongly mediate the relationships between urban-rural status and screen time. These findings suggest that interventions aiming to reduce screen time may be most effective if they consider residential location and the specific screen time behavior. Copyright © 2011 Elsevier Ltd. All rights reserved.
A Stochastic Spiking Neural Network for Virtual Screening.
Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L
2018-04-01
Virtual screening (VS) has become a key computational tool in early drug design and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, the hardware implementations of spiking neural networks (SNNs) arise as an emerging computing technique that can be applied to parallelize processes that normally present a high cost in terms of computing time and power. Consequently, SNN represents an attractive alternative to perform time-consuming processing tasks, such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm achieving two order of magnitude of speed improvement with respect to USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture arises as a feasible methodology to efficiently enhance time-consuming data-mining processes such as 3-D molecular similarity search.
On the generalized VIP time integral methodology for transient thermal problems
NASA Technical Reports Server (NTRS)
Mei, Youping; Chen, Xiaoqin; Tamma, Kumar K.; Sha, Desong
1993-01-01
The paper describes the development and applicability of a generalized VIrtual-Pulse (VIP) time integral method of computation for thermal problems. Unlike past approaches for general heat transfer computations, and with the advent of high speed computing technology and the importance of parallel computations for efficient use of computing environments, a major motivation via the developments described in this paper is the need for developing explicit computational procedures with improved accuracy and stability characteristics. As a consequence, a new and effective VIP methodology is described which inherits these improved characteristics. Numerical illustrative examples are provided to demonstrate the developments and validate the results obtained for thermal problems.
A heterogeneous hierarchical architecture for real-time computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skroch, D.A.; Fornaro, R.J.
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
Collaborative Autonomous Unmanned Aerial - Ground Vehicle Systems for Field Operations
2007-08-31
very limited payload capabilities of small UVs, sacrificing minimal computational power and run time, adhering at the same time to the low cost...configuration has been chosen because of its high computational capabilities, low power consumption, multiple I/O ports, size, low heat emission and cost. This...due to their high power to weight ratio, small packaging, and wide operating temperatures. Power distribution is controlled by the 120 Watt ATX power
Real-time fuzzy inference based robot path planning
NASA Technical Reports Server (NTRS)
Pacini, Peter J.; Teichrow, Jon S.
1990-01-01
This project addresses the problem of adaptive trajectory generation for a robot arm. Conventional trajectory generation involves computing a path in real time to minimize a performance measure such as expended energy. This method can be computationally intensive, and it may yield poor results if the trajectory is weakly constrained. Typically some implicit constraints are known, but cannot be encoded analytically. The alternative approach used here is to formulate domain-specific knowledge, including implicit and ill-defined constraints, in terms of fuzzy rules. These rules utilize linguistic terms to relate input variables to output variables. Since the fuzzy rulebase is determined off-line, only high-level, computationally light processing is required in real time. Potential applications for adaptive trajectory generation include missile guidance and various sophisticated robot control tasks, such as automotive assembly, high speed electrical parts insertion, stepper alignment, and motion control for high speed parcel transfer systems.
The paradigm compiler: Mapping a functional language for the connection machine
NASA Technical Reports Server (NTRS)
Dennis, Jack B.
1989-01-01
The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
Data Processing Aspects of MEDLARS
Austin, Charles J.
1964-01-01
The speed and volume requirements of MEDLARS necessitate the use of high-speed data processing equipment, including paper-tape typewriters, a digital computer, and a special device for producing photo-composed output. Input to the system is of three types: variable source data, including citations from the literature and search requests; changes to such master files as the medical subject headings list and the journal record file; and operating instructions such as computer programs and procedures for machine operators. MEDLARS builds two major stores of data on magnetic tape. The Processed Citation File includes bibliographic citations in expanded form for high-quality printing at periodic intervals. The Compressed Citation File is a coded, time-sequential citation store which is used for high-speed searching against demand request input. Major design considerations include converting variable-length, alphanumeric data to mechanical form quickly and accurately; serial searching by the computer within a reasonable period of time; high-speed printing that must be of graphic quality; and efficient maintenance of various complex computer files. PMID:14119287
DATA PROCESSING ASPECTS OF MEDLARS.
AUSTIN, C J
1964-01-01
The speed and volume requirements of MEDLARS necessitate the use of high-speed data processing equipment, including paper-tape typewriters, a digital computer, and a special device for producing photo-composed output. Input to the system is of three types: variable source data, including citations from the literature and search requests; changes to such master files as the medical subject headings list and the journal record file; and operating instructions such as computer programs and procedures for machine operators. MEDLARS builds two major stores of data on magnetic tape. The Processed Citation File includes bibliographic citations in expanded form for high-quality printing at periodic intervals. The Compressed Citation File is a coded, time-sequential citation store which is used for high-speed searching against demand request input. Major design considerations include converting variable-length, alphanumeric data to mechanical form quickly and accurately; serial searching by the computer within a reasonable period of time; high-speed printing that must be of graphic quality; and efficient maintenance of various complex computer files.
Real-time, high frequency QRS electrocardiograph with reduced amplitude zone detection
NASA Technical Reports Server (NTRS)
Schlegel, Todd T. (Inventor); DePalma, Jude L. (Inventor); Moradi, Saeed (Inventor)
2009-01-01
Real time cardiac electrical data are received from a patient, manipulated to determine various useful aspects of the ECG signal, and displayed in real time in a useful form on a computer screen or monitor. The monitor displays the high frequency data from the QRS complex in units of microvolts, juxtaposed with a display of conventional ECG data in units of millivolts or microvolts. The high frequency data are analyzed for their root mean square (RMS) voltage values and the discrete RMS values and related parameters are displayed in real time. The high frequency data from the QRS complex are analyzed with imbedded algorithms to determine the presence or absence of reduced amplitude zones, referred to herein as ''RAZs''. RAZs are displayed as ''go, no-go'' signals on the computer monitor. The RMS and related values of the high frequency components are displayed as time varying signals, and the presence or absence of RAZs may be similarly displayed over time.
Real-time control system for adaptive resonator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Flath, L; An, J; Brase, J
2000-07-24
Sustained operation of high average power solid-state lasers currently requires an adaptive resonator to produce the optimal beam quality. We describe the architecture of a real-time adaptive control system for correcting intra-cavity aberrations in a heat capacity laser. Image data collected from a wavefront sensor are processed and used to control phase with a high-spatial-resolution deformable mirror. Our controller takes advantage of recent developments in low-cost, high-performance processor technology. A desktop-based computational engine and object-oriented software architecture replaces the high-cost rack-mount embedded computers of previous systems.
NASA Technical Reports Server (NTRS)
Sreenivas, Kidambi; Whitfield, David L.
1995-01-01
Two linearized solvers (time and frequency domain) based on a high resolution numerical scheme are presented. The basic approach is to linearize the flux vector by expressing it as a sum of a mean and a perturbation. This allows the governing equations to be maintained in conservation law form. A key difference between the time and frequency domain computations is that the frequency domain computations require only one grid block irrespective of the interblade phase angle for which the flow is being computed. As a result of this and due to the fact that the governing equations for this case are steady, frequency domain computations are substantially faster than the corresponding time domain computations. The linearized equations are used to compute flows in turbomachinery blade rows (cascades) arising due to blade vibrations. Numerical solutions are compared to linear theory (where available) and to numerical solutions of the nonlinear Euler equations.
An Online Gravity Modeling Method Applied for High Precision Free-INS
Wang, Jing; Yang, Gongliu; Li, Jing; Zhou, Xiao
2016-01-01
For real-time solution of inertial navigation system (INS), the high-degree spherical harmonic gravity model (SHM) is not applicable because of its time and space complexity, in which traditional normal gravity model (NGM) has been the dominant technique for gravity compensation. In this paper, a two-dimensional second-order polynomial model is derived from SHM according to the approximate linear characteristic of regional disturbing potential. Firstly, deflections of vertical (DOVs) on dense grids are calculated with SHM in an external computer. And then, the polynomial coefficients are obtained using these DOVs. To achieve global navigation, the coefficients and applicable region of polynomial model are both updated synchronously in above computer. Compared with high-degree SHM, the polynomial model takes less storage and computational time at the expense of minor precision. Meanwhile, the model is more accurate than NGM. Finally, numerical test and INS experiment show that the proposed method outperforms traditional gravity models applied for high precision free-INS. PMID:27669261
An Online Gravity Modeling Method Applied for High Precision Free-INS.
Wang, Jing; Yang, Gongliu; Li, Jing; Zhou, Xiao
2016-09-23
For real-time solution of inertial navigation system (INS), the high-degree spherical harmonic gravity model (SHM) is not applicable because of its time and space complexity, in which traditional normal gravity model (NGM) has been the dominant technique for gravity compensation. In this paper, a two-dimensional second-order polynomial model is derived from SHM according to the approximate linear characteristic of regional disturbing potential. Firstly, deflections of vertical (DOVs) on dense grids are calculated with SHM in an external computer. And then, the polynomial coefficients are obtained using these DOVs. To achieve global navigation, the coefficients and applicable region of polynomial model are both updated synchronously in above computer. Compared with high-degree SHM, the polynomial model takes less storage and computational time at the expense of minor precision. Meanwhile, the model is more accurate than NGM. Finally, numerical test and INS experiment show that the proposed method outperforms traditional gravity models applied for high precision free-INS.
Vibration extraction based on fast NCC algorithm and high-speed camera.
Lei, Xiujun; Jin, Yi; Guo, Jie; Zhu, Chang'an
2015-09-20
In this study, a high-speed camera system is developed to complete the vibration measurement in real time and to overcome the mass introduced by conventional contact measurements. The proposed system consists of a notebook computer and a high-speed camera which can capture the images as many as 1000 frames per second. In order to process the captured images in the computer, the normalized cross-correlation (NCC) template tracking algorithm with subpixel accuracy is introduced. Additionally, a modified local search algorithm based on the NCC is proposed to reduce the computation time and to increase efficiency significantly. The modified algorithm can rapidly accomplish one displacement extraction 10 times faster than the traditional template matching without installing any target panel onto the structures. Two experiments were carried out under laboratory and outdoor conditions to validate the accuracy and efficiency of the system performance in practice. The results demonstrated the high accuracy and efficiency of the camera system in extracting vibrating signals.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of a layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
Three Dimensional Aerodynamic Analysis of a High-Lift Transport Configuration
NASA Technical Reports Server (NTRS)
Dodbele, Simha S.
1993-01-01
Two computational methods, a surface panel method and an Euler method employing unstructured grid methodology, were used to analyze a subsonic transport aircraft in cruise and high-lift conditions. The computational results were compared with two separate sets of flight data obtained for the cruise and high-lift configurations. For the cruise configuration, the surface pressures obtained by the panel method and the Euler method agreed fairly well with results from flight test. However, for the high-lift configuration considerable differences were observed when the computational surface pressures were compared with the results from high-lift flight test. On the lower surface of all the elements with the exception of the slat, both the panel and Euler methods predicted pressures which were in good agreement with flight data. On the upper surface of all the elements the panel method predicted slightly higher suction compared to the Euler method. On the upper surface of the slat, pressure coefficients obtained by both the Euler and panel methods did not agree with the results of the flight tests. A sensitivity study of the upward deflection of the slat from the 40 deg. flap setting suggested that the differences in the slat deflection between the computational model and the flight configuration could be one of the sources of this discrepancy. The computation time for the implicit version of the Euler code was about 1/3 the time taken by the explicit version though the implicit code required 3 times the memory taken by the explicit version.
Computer Access and Flowcharting as Variables in Learning Computer Programming.
ERIC Educational Resources Information Center
Ross, Steven M.; McCormick, Deborah
Manipulation of flowcharting was crossed with in-class computer access to examine flowcharting effects in the traditional lecture/laboratory setting and in a classroom setting where online time was replaced with manual simulation. Seventy-two high school students (24 male and 48 female) enrolled in a computer literacy course served as subjects.…
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon
2012-01-01
We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Scalable Multiprocessor for High-Speed Computing in Space
NASA Technical Reports Server (NTRS)
Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard
2004-01-01
A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard realtime applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at "hundreds" of pulses per second, each pulse "requiring" millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.
1997-04-01
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less
O'Donnell, Michael
2015-01-01
State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need for modeling larger and more complex landscapes increase, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks of computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments disseminate the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the top result (high-throughput computing versus serial computations) indicated an approximate 96.6% decrease of computing time. With a single, multicore compute node (bottom result), the computing time indicated an 81.8% decrease relative to using serial computations. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models. - See more at: http://aimspress.com/aimses/ch/reader/view_abstract.aspx?file_no=Environ2015030&flag=1#sthash.p1XKDtF8.dpuf
Neighborhood disorder and screen time among 10-16 year old Canadian youth: A cross-sectional study
2012-01-01
Background Screen time activities (e.g., television, computers, video games) have been linked to several negative health outcomes among young people. In order to develop evidence-based interventions to reduce screen time, the factors that influence the behavior need to be better understood. High neighborhood disorder, which may encourage young people to stay indoors where screen time activities are readily available, is one potential factor to consider. Methods Results are based on 15,917 youth in grades 6-10 (aged 10-16 years old) who participated in the Canadian 2009/10 Health Behaviour in School-aged Children Survey (HBSC). Total hours per week of television, video games, and computer use were reported by the participating students in the HBSC student questionnaire. Ten items of neighborhood disorder including safety, neighbors taking advantage, drugs/drinking in public, ethnic tensions, gangs, crime, conditions of buildings/grounds, abandoned buildings, litter, and graffiti were measured using the HBSC student questionnaire, the HBSC administrator questionnaire, and Geographic Information Systems. Based upon these 10 items, social and physical neighborhood disorder variables were derived using principal component analysis. Multivariate multilevel logistic regression analyses were used to examine the relationship between social and physical neighborhood disorder and individual screen time variables. Results High (top quartile) social neighborhood disorder was associated with approximately 35-45% increased risk of high (top quartile) television, computer, and video game use. Physical neighborhood disorder was not associated with screen time activities after adjusting for social neighborhood disorder. However, high social and physical neighborhood disorder combined was associated with approximately 40-60% increased likelihood of high television, computer, and video game use. Conclusion High neighborhood disorder is one environmental factor that may be important to consider for future public health interventions and strategies aiming to reduce screen time among youth. PMID:22651908
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provide a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
Sitting Time in Adults 65 Years and Over: Behavior, Knowledge, and Intentions to Change.
Alley, Stephanie; van Uffelen, Jannique G Z; Duncan, Mitch J; De Cocker, Katrien; Schoeppe, Stephanie; Rebar, Amanda L; Vandelanotte, Corneel
2018-04-01
This study examined sitting time, knowledge, and intentions to change sitting time in older adults. An online survey was completed by 494 Australians aged 65+. Average daily sitting was high (9.0 hr). Daily sitting time was the highest during TV (3.3 hr), computer (2.1 hr), and leisure (1.7 hr). A regression analysis demonstrated that women were more knowledgeable about the health risks of sitting compared to men. The percentage of older adults intending to sit less were the highest for TV (24%), leisure (24%), and computer (19%) sitting time. Regression analyses demonstrated that intentions varied by gender (for TV sitting), education (leisure and work sitting), body mass index (computer, leisure, and transport sitting), and physical activity (TV, computer, and leisure sitting). Interventions should target older adults' TV, computer, and leisure time sitting, with a focus on intentions in older males and older adults with low education, those who are active, and those with a normal weight.
On the reliability of computed chaotic solutions of non-linear differential equations
NASA Astrophysics Data System (ADS)
Liao, Shijun
2009-08-01
A new concept, namely the critical predictable time Tc, is introduced to give a more precise description of computed chaotic solutions of non-linear differential equations: it is suggested that computed chaotic solutions are unreliable and doubtable when t > Tc. This provides us a strategy to detect reliable solution from a given computed result. In this way, the computational phenomena, such as computational chaos (CC), computational periodicity (CP) and computational prediction uncertainty, which are mainly based on long-term properties of computed time-series, can be completely avoided. Using this concept, the famous conclusion `accurate long-term prediction of chaos is impossible' should be replaced by a more precise conclusion that `accurate prediction of chaos beyond the critical predictable time Tc is impossible'. So, this concept also provides us a timescale to determine whether or not a particular time is long enough for a given non-linear dynamic system. Besides, the influence of data inaccuracy and various numerical schemes on the critical predictable time is investigated in details by using symbolic computation software as a tool. A reliable chaotic solution of Lorenz equation in a rather large interval 0 <= t < 1200 non-dimensional Lorenz time units is obtained for the first time. It is found that the precision of the initial condition and the computed data at each time step, which is mathematically necessary to get such a reliable chaotic solution in such a long time, is so high that it is physically impossible due to the Heisenberg uncertainty principle in quantum physics. This, however, provides us a so-called `precision paradox of chaos', which suggests that the prediction uncertainty of chaos is physically unavoidable, and that even the macroscopical phenomena might be essentially stochastic and thus could be described by probability more economically.
Frozen Gaussian approximation for 3D seismic tomography
NASA Astrophysics Data System (ADS)
Chai, Lihui; Tong, Ping; Yang, Xu
2018-05-01
Three-dimensional (3D) wave-equation-based seismic tomography is computationally challenging in large scales and high-frequency regime. In this paper, we apply the frozen Gaussian approximation (FGA) method to compute 3D sensitivity kernels and seismic tomography of high-frequency. Rather than standard ray theory used in seismic inversion (e.g. Kirchhoff migration and Gaussian beam migration), FGA is used to compute the 3D high-frequency sensitivity kernels for travel-time or full waveform inversions. Specifically, we reformulate the equations of the forward and adjoint wavefields for the purpose of convenience to apply FGA, and with this reformulation, one can efficiently compute the Green’s functions whose convolutions with source time function produce wavefields needed for the construction of 3D kernels. Moreover, a fast summation method is proposed based on local fast Fourier transform which greatly improves the speed of reconstruction as the last step of FGA algorithm. We apply FGA to both the travel-time adjoint tomography and full waveform inversion (FWI) on synthetic crosswell seismic data with dominant frequencies as high as those of real crosswell data, and confirm again that FWI requires a more sophisticated initial velocity model for the convergence than travel-time adjoint tomography. We also numerically test the accuracy of applying FGA to local earthquake tomography. This study paves the way to directly apply wave-equation-based seismic tomography methods into real data around their dominant frequencies.
Adaptive multi-time-domain subcycling for crystal plasticity FE modeling of discrete twin evolution
NASA Astrophysics Data System (ADS)
Ghosh, Somnath; Cheng, Jiahao
2018-02-01
Crystal plasticity finite element (CPFE) models that accounts for discrete micro-twin nucleation-propagation have been recently developed for studying complex deformation behavior of hexagonal close-packed (HCP) materials (Cheng and Ghosh in Int J Plast 67:148-170, 2015, J Mech Phys Solids 99:512-538, 2016). A major difficulty with conducting high fidelity, image-based CPFE simulations of polycrystalline microstructures with explicit twin formation is the prohibitively high demands on computing time. High strain localization within fast propagating twin bands requires very fine simulation time steps and leads to enormous computational cost. To mitigate this shortcoming and improve the simulation efficiency, this paper proposes a multi-time-domain subcycling algorithm. It is based on adaptive partitioning of the evolving computational domain into twinned and untwinned domains. Based on the local deformation-rate, the algorithm accelerates simulations by adopting different time steps for each sub-domain. The sub-domains are coupled back after coarse time increments using a predictor-corrector algorithm at the interface. The subcycling-augmented CPFEM is validated with a comprehensive set of numerical tests. Significant speed-up is observed with this novel algorithm without any loss of accuracy that is advantageous for predicting twinning in polycrystalline microstructures.
Execution environment for intelligent real-time control systems
NASA Technical Reports Server (NTRS)
Sztipanovits, Janos
1987-01-01
Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computations, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems is described. The layered architecture includes specific computational models, integrated execution environment and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.
Dataflow computing approach in high-speed digital simulation
NASA Technical Reports Server (NTRS)
Ercegovac, M. D.; Karplus, W. J.
1984-01-01
New computational tools and methodologies for the digital simulation of continuous systems were explored. Programmability, and cost effective performance in multiprocessor organizations for real time simulation was investigated. Approach is based on functional style languages and data flow computing principles, which allow for the natural representation of parallelism in algorithms and provides a suitable basis for the design of cost effective high performance distributed systems. The objectives of this research are to: (1) perform comparative evaluation of several existing data flow languages and develop an experimental data flow language suitable for real time simulation using multiprocessor systems; (2) investigate the main issues that arise in the architecture and organization of data flow multiprocessors for real time simulation; and (3) develop and apply performance evaluation models in typical applications.
Appropriate Use Policy | High-Performance Computing | NREL
users of the National Renewable Energy Laboratory (NREL) High Performance Computing (HPC) resources government agency, National Laboratory, University, or private entity, the intellectual property terms (if issued a multifactor token which may be a physical token or a virtual token used with one-time password
NASA Astrophysics Data System (ADS)
Neverov, V. V.; Kozhukhov, Y. V.; Yablokov, A. M.; Lebedev, A. A.
2017-08-01
Nowadays the optimization using computational fluid dynamics (CFD) plays an important role in the design process of turbomachines. However, for the successful and productive optimization it is necessary to define a simulation model correctly and rationally. The article deals with the choice of a grid and computational domain parameters for optimization of centrifugal compressor impellers using computational fluid dynamics. Searching and applying optimal parameters of the grid model, the computational domain and solver settings allows engineers to carry out a high-accuracy modelling and to use computational capability effectively. The presented research was conducted using Numeca Fine/Turbo package with Spalart-Allmaras and Shear Stress Transport turbulence models. Two radial impellers was investigated: the high-pressure at ψT=0.71 and the low-pressure at ψT=0.43. The following parameters of the computational model were considered: the location of inlet and outlet boundaries, type of mesh topology, size of mesh and mesh parameter y+. Results of the investigation demonstrate that the choice of optimal parameters leads to the significant reduction of the computational time. Optimal parameters in comparison with non-optimal but visually similar parameters can reduce the calculation time up to 4 times. Besides, it is established that some parameters have a major impact on the result of modelling.
The quantum computer game: citizen science
NASA Astrophysics Data System (ADS)
Damgaard, Sidse; Mølmer, Klaus; Sherson, Jacob
2013-05-01
Progress in the field of quantum computation is hampered by daunting technical challenges. Here we present an alternative approach to solving these by enlisting the aid of computer players around the world. We have previously examined a quantum computation architecture involving ultracold atoms in optical lattices and strongly focused tweezers of light. In The Quantum Computer Game (see http://www.scienceathome.org/), we have encapsulated the time-dependent Schrödinger equation for the problem in a graphical user interface allowing for easy user input. Players can then search the parameter space with real-time graphical feedback in a game context with a global high-score that rewards short gate times and robustness to experimental errors. The game which is still in a demo version has so far been tried by several hundred players. Extensions of the approach to other models such as Gross-Pitaevskii and Bose-Hubbard are currently under development. The game has also been incorporated into science education at high-school and university level as an alternative method for teaching quantum mechanics. Initial quantitative evaluation results are very positive. AU Ideas Center for Community Driven Research, CODER.
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at the Ben-Gurion University. This elective course for 3rd year undergraduates and MSc. students is being taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should deal also with High-Productivity Computing. The traditional approach of teaching Computational Physics emphasizes ``Correctness'' and then ``Accuracy'' and we add also ``Performance.'' Along with topics in Mathematical Methods and case studies in Physics the course deals a significant amount of time with ``Mini-Courses'' in topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization and Grid and Cloud Computing. The course does not intend to teach neither new physics nor new mathematics but it is focused on an integrated approach for solving problems starting from the physics problem, the corresponding mathematical solution, the numerical scheme, writing an efficient computer code and finally analysis and visualization.
2010-01-01
Background Sedentary behavior is considered a separate construct from physical activity and engaging in sedentary behaviors results in health effects independent of physical activity levels. A major source of sedentary behavior in children is time spent viewing TV or movies, playing video games, and using computers. To date no study has examined the impact of neighborhood socioeconomic status (SES) on pre-school children's screen time behavior. Methods Proxy reports of weekday and weekend screen time (TV/movies, video games, and computer use) were completed by 1633 parents on their 4-5 year-old children in Edmonton, Alberta between November, 2005 and August, 2007. Postal codes were used to classified neighborhoods into low, medium or high SES. Multiple linear and logistic regression models were conducted to examine relationships between screen time and neighborhood SES. Results Girls living in low SES neighborhoods engaged in significantly more weekly overall screen time and TV/movie minutes compared to girls living in high SES neighborhoods. The same relationship was not observed in boys. Children living in low SES neighborhoods were significantly more likely to be video game users and less likely to be computer users compared to children living in high SES neighborhoods. Also, children living in medium SES neighborhoods were significantly less likely to be computer users compared to children living in high SES neighborhoods. Conclusions Some consideration should be given to providing alternative activity opportunities for children, especially girls who live in lower SES neighborhoods. Also, future research should continue to investigate the independent effects of neighborhood SES on screen time as well as the potential mediating variables for this relationship. PMID:20573262
Li, Xiangrui; Lu, Zhong-Lin
2012-02-29
Display systems based on conventional computer graphics cards are capable of generating images with 8-bit gray level resolution. However, most experiments in vision research require displays with more than 12 bits of luminance resolution. Several solutions are available. Bit++ (1) and DataPixx (2) use the Digital Visual Interface (DVI) output from graphics cards and high resolution (14 or 16-bit) digital-to-analog converters to drive analog display devices. The VideoSwitcher (3) described here combines analog video signals from the red and blue channels of graphics cards with different weights using a passive resister network (4) and an active circuit to deliver identical video signals to the three channels of color monitors. The method provides an inexpensive way to enable high-resolution monochromatic displays using conventional graphics cards and analog monitors. It can also provide trigger signals that can be used to mark stimulus onsets, making it easy to synchronize visual displays with physiological recordings or response time measurements. Although computer keyboards and mice are frequently used in measuring response times (RT), the accuracy of these measurements is quite low. The RTbox is a specialized hardware and software solution for accurate RT measurements. Connected to the host computer through a USB connection, the driver of the RTbox is compatible with all conventional operating systems. It uses a microprocessor and high-resolution clock to record the identities and timing of button events, which are buffered until the host computer retrieves them. The recorded button events are not affected by potential timing uncertainties or biases associated with data transmission and processing in the host computer. The asynchronous storage greatly simplifies the design of user programs. Several methods are available to synchronize the clocks of the RTbox and the host computer. The RTbox can also receive external triggers and be used to measure RT with respect to external events. Both VideoSwitcher and RTbox are available for users to purchase. The relevant information and many demonstration programs can be found at http://lobes.usc.edu/.
Hardware accelerated high performance neutron transport computation based on AGENT methodology
NASA Astrophysics Data System (ADS)
Xiao, Shanjie
The spatial heterogeneity of the next generation Gen-IV nuclear reactor core designs brings challenges to the neutron transport analysis. The Arbitrary Geometry Neutron Transport (AGENT) AGENT code is a three-dimensional neutron transport analysis code being developed at the Laboratory for Neutronics and Geometry Computation (NEGE) at Purdue University. It can accurately describe the spatial heterogeneity in a hierarchical structure through the R-function solid modeler. The previous version of AGENT coupled the 2D transport MOC solver and the 1D diffusion NEM solver to solve the three dimensional Boltzmann transport equation. In this research, the 2D/1D coupling methodology was expanded to couple two transport solvers, the radial 2D MOC solver and the axial 1D MOC solver, for better accuracy. The expansion was benchmarked with the widely applied C5G7 benchmark models and two fast breeder reactor models, and showed good agreement with the reference Monte Carlo results. In practice, the accurate neutron transport analysis for a full reactor core is still time-consuming and thus limits its application. Therefore, another content of my research is focused on designing a specific hardware based on the reconfigurable computing technique in order to accelerate AGENT computations. It is the first time that the application of this type is used to the reactor physics and neutron transport for reactor design. The most time consuming part of the AGENT algorithm was identified. Moreover, the architecture of the AGENT acceleration system was designed based on the analysis. Through the parallel computation on the specially designed, highly efficient architecture, the acceleration design on FPGA acquires high performance at the much lower working frequency than CPUs. The whole design simulations show that the acceleration design would be able to speedup large scale AGENT computations about 20 times. The high performance AGENT acceleration system will drastically shortening the computation time for 3D full-core neutron transport analysis, making the AGENT methodology unique and advantageous, and thus supplies the possibility to extend the application range of neutron transport analysis in either industry engineering or academic research.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
NASA's Participation in the National Computational Grid
NASA Technical Reports Server (NTRS)
Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)
1998-01-01
Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases and on line experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several events in recent times are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI) which is built around the idea of distributed high performance computing. The Alliance lead, by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), lead by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.
NASA Astrophysics Data System (ADS)
Tripathi, Vijay S.; Yeh, G. T.
1993-06-01
Sophisticated and highly computation-intensive models of transport of reactive contaminants in groundwater have been developed in recent years. Application of such models to real-world contaminant transport problems, e.g., simulation of groundwater transport of 10-15 chemically reactive elements (e.g., toxic metals) and relevant complexes and minerals in two and three dimensions over a distance of several hundred meters, requires high-performance computers including supercomputers. Although not widely recognized as such, the computational complexity and demand of these models compare with well-known computation-intensive applications including weather forecasting and quantum chemical calculations. A survey of the performance of a variety of available hardware, as measured by the run times for a reactive transport model HYDROGEOCHEM, showed that while supercomputers provide the fastest execution times for such problems, relatively low-cost reduced instruction set computer (RISC) based scalar computers provide the best performance-to-price ratio. Because supercomputers like the Cray X-MP are inherently multiuser resources, often the RISC computers also provide much better turnaround times. Furthermore, RISC-based workstations provide the best platforms for "visualization" of groundwater flow and contaminant plumes. The most notable result, however, is that current workstations costing less than $10,000 provide performance within a factor of 5 of a Cray X-MP.
Computational Methods for Stability and Control (COMSAC): The Time Has Come
NASA Technical Reports Server (NTRS)
Hall, Robert M.; Biedron, Robert T.; Ball, Douglas N.; Bogue, David R.; Chung, James; Green, Bradford E.; Grismer, Matthew J.; Brooks, Gregory P.; Chambers, Joseph R.
2005-01-01
Powerful computational fluid dynamics (CFD) tools have emerged that appear to offer significant benefits as an adjunct to the experimental methods used by the stability and control community to predict aerodynamic parameters. The decreasing costs for and increasing availability of computing hours are making these applications increasingly viable as time goes on and the cost of computing continues to drop. This paper summarizes the efforts of four organizations to utilize high-end computational fluid dynamics (CFD) tools to address the challenges of the stability and control arena. General motivation and the backdrop for these efforts will be summarized as well as examples of current applications.
High-Fidelity Simulations of Electromagnetic Propagation and RF Communication Systems
2017-05-01
addition to high -fidelity RF propagation modeling, lower-fidelity mod- els, which are less computationally burdensome, are available via a C++ API...expensive to perform, requiring roughly one hour of computer time with 36 available cores and ray tracing per- formed by a single high -end GPU...ER D C TR -1 7- 2 Military Engineering Applied Research High -Fidelity Simulations of Electromagnetic Propagation and RF Communication
Framework for architecture-independent run-time reconfigurable applications
NASA Astrophysics Data System (ADS)
Lehn, David I.; Hudson, Rhett D.; Athanas, Peter M.
2000-10-01
Configurable Computing Machines (CCMs) have emerged as a technology with the computational benefits of custom ASICs as well as the flexibility and reconfigurability of general-purpose microprocessors. Significant effort from the research community has focused on techniques to move this reconfigurability from a rapid application development tool to a run-time tool. This requires the ability to change the hardware design while the application is executing and is known as Run-Time Reconfiguration (RTR). Widespread acceptance of run-time reconfigurable custom computing depends upon the existence of high-level automated design tools. Such tools must reduce the designers effort to port applications between different platforms as the architecture, hardware, and software evolves. A Java implementation of a high-level application framework, called Janus, is presented here. In this environment, developers create Java classes that describe the structural behavior of an application. The framework allows hardware and software modules to be freely mixed and interchanged. A compilation phase of the development process analyzes the structure of the application and adapts it to the target platform. Janus is capable of structuring the run-time behavior of an application to take advantage of the memory and computational resources available.
GPU-based High-Performance Computing for Radiation Therapy
Jia, Xun; Ziegenhein, Peter; Jiang, Steve B.
2014-01-01
Recent developments in radiotherapy therapy demand high computation powers to solve challenging problems in a timely fashion in a clinical environment. Graphics processing unit (GPU), as an emerging high-performance computing platform, has been introduced to radiotherapy. It is particularly attractive due to its high computational power, small size, and low cost for facility deployment and maintenance. Over the past a few years, GPU-based high-performance computing in radiotherapy has experienced rapid developments. A tremendous amount of studies have been conducted, in which large acceleration factors compared with the conventional CPU platform have been observed. In this article, we will first give a brief introduction to the GPU hardware structure and programming model. We will then review the current applications of GPU in major imaging-related and therapy-related problems encountered in radiotherapy. A comparison of GPU with other platforms will also be presented. PMID:24486639
DOE Office of Scientific and Technical Information (OSTI.GOV)
Albright, Brian James; Yin, Lin; Stark, David James
This proposal sought of order 1M core-hours of Institutional Computing time intended to enable computing by a new LANL Postdoc (David Stark) working under LDRD ER project 20160472ER (PI: Lin Yin) on laser-ion acceleration. The project was “off-cycle,” initiating in June of 2016 with a postdoc hire.
FLAME: A platform for high performance computing of complex systems, applied for three case studies
Kiran, Mariam; Bicak, Mesude; Maleki-Dizaji, Saeedeh; ...
2011-01-01
FLAME allows complex models to be automatically parallelised on High Performance Computing (HPC) grids enabling large number of agents to be simulated over short periods of time. Modellers are hindered by complexities of porting models on parallel platforms and time taken to run large simulations on a single machine, which FLAME overcomes. Three case studies from different disciplines were modelled using FLAME, and are presented along with their performance results on a grid.
Midekisa, Alemayehu; Holl, Felix; Savory, David J; Andrade-Pacheco, Ricardo; Gething, Peter W; Bennett, Adam; Sturrock, Hugh J W
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth's land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources.
Holl, Felix; Savory, David J.; Andrade-Pacheco, Ricardo; Gething, Peter W.; Bennett, Adam; Sturrock, Hugh J. W.
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth’s land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources. PMID:28953943
High performance GPU processing for inversion using uniform grid searches
NASA Astrophysics Data System (ADS)
Venetis, Ioannis E.; Saltogianni, Vasso; Stiros, Stathis; Gallopoulos, Efstratios
2017-04-01
Many geophysical problems are described by systems of redundant, highly non-linear systems of ordinary equations with constant terms deriving from measurements and hence representing stochastic variables. Solution (inversion) of such problems is based on numerical, optimization methods, based on Monte Carlo sampling or on exhaustive searches in cases of two or even three "free" unknown variables. Recently the TOPological INVersion (TOPINV) algorithm, a grid search-based technique in the Rn space, has been proposed. TOPINV is not based on the minimization of a certain cost function and involves only forward computations, hence avoiding computational errors. The basic concept is to transform observation equations into inequalities on the basis of an optimization parameter k and of their standard errors, and through repeated "scans" of n-dimensional search grids for decreasing values of k to identify the optimal clusters of gridpoints which satisfy observation inequalities and by definition contain the "true" solution. Stochastic optimal solutions and their variance-covariance matrices are then computed as first and second statistical moments. Such exhaustive uniform searches produce an excessive computational load and are extremely time consuming for common computers based on a CPU. An alternative is to use a computing platform based on a GPU, which nowadays is affordable to the research community, which provides a much higher computing performance. Using the CUDA programming language to implement TOPINV allows the investigation of the attained speedup in execution time on such a high performance platform. Based on synthetic data we compared the execution time required for two typical geophysical problems, modeling magma sources and seismic faults, described with up to 18 unknown variables, on both CPU/FORTRAN and GPU/CUDA platforms. The same problems for several different sizes of search grids (up to 1012 gridpoints) and numbers of unknown variables were solved on both platforms, and execution time as a function of the grid dimension for each problem was recorded. Results indicate an average speedup in calculations by a factor of 100 on the GPU platform; for example problems with 1012 grid-points require less than two hours instead of several days on conventional desktop computers. Such a speedup encourages the application of TOPINV on high performance platforms, as a GPU, in cases where nearly real time decisions are necessary, for example finite fault modeling to identify possible tsunami sources.
Propeller speed and phase sensor
NASA Technical Reports Server (NTRS)
Collopy, Paul D. (Inventor); Bennett, George W. (Inventor)
1992-01-01
A speed and phase sensor counterrotates aircraft propellers. A toothed wheel is attached to each propeller, and the teeth trigger a sensor as they pass, producing a sequence of signals. From the sequence of signals, rotational speed of each propeller is computer based on time intervals between successive signals. The speed can be computed several times during one revolution, thus giving speed information which is highly up-to-date. Given that spacing between teeth may not be uniform, the signals produced may be nonuniform in time. Error coefficients are derived to correct for nonuniformities in the resulting signals, thus allowing accurate speed to be computed despite the spacing nonuniformities. Phase can be viewed as the relative rotational position of one propeller with respect to the other, but measured at a fixed time. Phase is computed from the signals.
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
NASA Astrophysics Data System (ADS)
Zhang, Ling; Nan, Zhuotong; Liang, Xu; Xu, Yi; Hernández, Felipe; Li, Lianxia
2018-03-01
Although process-based distributed hydrological models (PDHMs) are evolving rapidly over the last few decades, their extensive applications are still challenged by the computational expenses. This study attempted, for the first time, to apply the numerically efficient MacCormack algorithm to overland flow routing in a representative high-spatial resolution PDHM, i.e., the distributed hydrology-soil-vegetation model (DHSVM), in order to improve its computational efficiency. The analytical verification indicates that both the semi and full versions of the MacCormack schemes exhibit robust numerical stability and are more computationally efficient than the conventional explicit linear scheme. The full-version outperforms the semi-version in terms of simulation accuracy when a same time step is adopted. The semi-MacCormack scheme was implemented into DHSVM (version 3.1.2) to solve the kinematic wave equations for overland flow routing. The performance and practicality of the enhanced DHSVM-MacCormack model was assessed by performing two groups of modeling experiments in the Mercer Creek watershed, a small urban catchment near Bellevue, Washington. The experiments show that DHSVM-MacCormack can considerably improve the computational efficiency without compromising the simulation accuracy of the original DHSVM model. More specifically, with the same computational environment and model settings, the computational time required by DHSVM-MacCormack can be reduced to several dozen minutes for a simulation period of three months (in contrast with one day and a half by the original DHSVM model) without noticeable sacrifice of the accuracy. The MacCormack scheme proves to be applicable to overland flow routing in DHSVM, which implies that it can be coupled into other PHDMs for watershed routing to either significantly improve their computational efficiency or to make the kinematic wave routing for high resolution modeling computational feasible.
Real-time computer treatment of THz passive device images with the high image quality
NASA Astrophysics Data System (ADS)
Trofimov, Vyacheslav A.; Trofimov, Vladislav V.
2012-06-01
We demonstrate real-time computer code improving significantly the quality of images captured by the passive THz imaging system. The code is not only designed for a THz passive device: it can be applied to any kind of such devices and active THz imaging systems as well. We applied our code for computer processing of images captured by four passive THz imaging devices manufactured by different companies. It should be stressed that computer processing of images produced by different companies requires using the different spatial filters usually. The performance of current version of the computer code is greater than one image per second for a THz image having more than 5000 pixels and 24 bit number representation. Processing of THz single image produces about 20 images simultaneously corresponding to various spatial filters. The computer code allows increasing the number of pixels for processed images without noticeable reduction of image quality. The performance of the computer code can be increased many times using parallel algorithms for processing the image. We develop original spatial filters which allow one to see objects with sizes less than 2 cm. The imagery is produced by passive THz imaging devices which captured the images of objects hidden under opaque clothes. For images with high noise we develop an approach which results in suppression of the noise after using the computer processing and we obtain the good quality image. With the aim of illustrating the efficiency of the developed approach we demonstrate the detection of the liquid explosive, ordinary explosive, knife, pistol, metal plate, CD, ceramics, chocolate and other objects hidden under opaque clothes. The results demonstrate the high efficiency of our approach for the detection of hidden objects and they are a very promising solution for the security problem.
User Account Passwords | High-Performance Computing | NREL
Account Passwords User Account Passwords For NREL's high-performance computing (HPC) systems, learn about user account password requirements and how to set up, log in, and change passwords. Password Logging In the First Time After you request an HPC user account, you'll receive a temporary password. Set
NRC Class 1E Digital Computer System Guidelines
1993-05-01
then be "proved" that the vessel cannot be at high temperature state and norma ! t emperature state at the same time. The question whether high, normal...3 of Dependability of critical computer systems. Elsever Applied Science, 1988. [18] J. W. Duran and S. C. Ntafos, "A report on random testing," in
High-Order Implicit-Explicit Multi-Block Time-stepping Method for Hyperbolic PDEs
NASA Technical Reports Server (NTRS)
Nielsen, Tanner B.; Carpenter, Mark H.; Fisher, Travis C.; Frankel, Steven H.
2014-01-01
This work seeks to explore and improve the current time-stepping schemes used in computational fluid dynamics (CFD) in order to reduce overall computational time. A high-order scheme has been developed using a combination of implicit and explicit (IMEX) time-stepping Runge-Kutta (RK) schemes which increases numerical stability with respect to the time step size, resulting in decreased computational time. The IMEX scheme alone does not yield the desired increase in numerical stability, but when used in conjunction with an overlapping partitioned (multi-block) domain significant increase in stability is observed. To show this, the Overlapping-Partition IMEX (OP IMEX) scheme is applied to both one-dimensional (1D) and two-dimensional (2D) problems, the nonlinear viscous Burger's equation and 2D advection equation, respectively. The method uses two different summation by parts (SBP) derivative approximations, second-order and fourth-order accurate. The Dirichlet boundary conditions are imposed using the Simultaneous Approximation Term (SAT) penalty method. The 6-stage additive Runge-Kutta IMEX time integration schemes are fourth-order accurate in time. An increase in numerical stability 65 times greater than the fully explicit scheme is demonstrated to be achievable with the OP IMEX method applied to 1D Burger's equation. Results from the 2D, purely convective, advection equation show stability increases on the order of 10 times the explicit scheme using the OP IMEX method. Also, the domain partitioning method in this work shows potential for breaking the computational domain into manageable sizes such that implicit solutions for full three-dimensional CFD simulations can be computed using direct solving methods rather than the standard iterative methods currently used.
Using the Computer in Evolution Studies
ERIC Educational Resources Information Center
Mariner, James L.
1973-01-01
Describes a high school biology exercise in which a computer greatly reduces time spent on calculations. Genetic equilibrium demonstrated by the Hardy-Weinberg principle and the subsequent effects of violating any of its premises are more readily understood when frequencies of alleles through many generations are calculated by the computer. (JR)
Computer Science 205. Interim Guide, 1983.
ERIC Educational Resources Information Center
Manitoba Dept. of Education, Winnipeg.
This guide to a 4-unit, required high school computer science course emphasizes problem solving and computer programming and is designed for use with a variety of hardware configurations and programming languages. An overview covers the program rationale, goals and objectives, program design and description, program implementation, time allotment,…
Large-Amplitude, High-Rate Roll Oscillations of a 65 deg Delta Wing at High Incidence
NASA Technical Reports Server (NTRS)
Chaderjian, Neal M.; Schiff, Lewis B.
2000-01-01
The IAR/WL 65 deg delta wing experimental results provide both detail pressure measurements and a wide range of flow conditions covering from simple attached flow, through fully developed vortex and vortex burst flow, up to fully-stalled flow at very high incidence. Thus, the Computational Unsteady Aerodynamics researchers can use it at different level of validating the corresponding code. In this section a range of CFD results are provided for the 65 deg delta wing at selected flow conditions. The time-dependent, three-dimensional, Reynolds-averaged, Navier-Stokes (RANS) equations are used to numerically simulate the unsteady vertical flow. Two sting angles and two large- amplitude, high-rate, forced-roll motions and a damped free-to-roll motion are presented. The free-to-roll motion is computed by coupling the time-dependent RANS equations to the flight dynamic equation of motion. The computed results are compared with experimental pressures, forces, moments and roll angle time history. In addition, surface and off-surface flow particle streaks are also presented.
NASA Technical Reports Server (NTRS)
Tam, Christopher K. W.; Aganin, Alexei
2000-01-01
The transonic nozzle transmission problem and the open rotor noise radiation problem are solved computationally. Both are multiple length scales problems. For efficient and accurate numerical simulation, the multiple-size-mesh multiple-time-step Dispersion-Relation-Preserving scheme is used to calculate the time periodic solution. To ensure an accurate solution, high quality numerical boundary conditions are also needed. For the nozzle problem, a set of nonhomogeneous, outflow boundary conditions are required. The nonhomogeneous boundary conditions not only generate the incoming sound waves but also, at the same time, allow the reflected acoustic waves and entropy waves, if present, to exit the computation domain without reflection. For the open rotor problem, there is an apparent singularity at the axis of rotation. An analytic extension approach is developed to provide a high quality axis boundary treatment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Etingov, Pavel V.; Ren, Huiying
This paper describes a probabilistic look-ahead contingency analysis application that incorporates smart sampling and high-performance computing (HPC) techniques. Smart sampling techniques are implemented to effectively represent the structure and statistical characteristics of uncertainty introduced by different sources in the power system. They can significantly reduce the data set size required for multiple look-ahead contingency analyses, and therefore reduce the time required to compute them. High-performance-computing (HPC) techniques are used to further reduce computational time. These two techniques enable a predictive capability that forecasts the impact of various uncertainties on potential transmission limit violations. The developed package has been tested withmore » real world data from the Bonneville Power Administration. Case study results are presented to demonstrate the performance of the applications developed.« less
Computational Methods for HSCT-Inlet Controls/CFD Interdisciplinary Research
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Melcher, Kevin J.; Chicatelli, Amy K.; Hartley, Tom T.; Chung, Joongkee
1994-01-01
A program aimed at facilitating the use of computational fluid dynamics (CFD) simulations by the controls discipline is presented. The objective is to reduce the development time and cost for propulsion system controls by using CFD simulations to obtain high-fidelity system models for control design and as numerical test beds for control system testing and validation. An interdisciplinary team has been formed to develop analytical and computational tools in three discipline areas: controls, CFD, and computational technology. The controls effort has focused on specifying requirements for an interface between the controls specialist and CFD simulations and a new method for extracting linear, reduced-order control models from CFD simulations. Existing CFD codes are being modified to permit time accurate execution and provide realistic boundary conditions for controls studies. Parallel processing and distributed computing techniques, along with existing system integration software, are being used to reduce CFD execution times and to support the development of an integrated analysis/design system. This paper describes: the initial application for the technology being developed, the high speed civil transport (HSCT) inlet control problem; activities being pursued in each discipline area; and a prototype analysis/design system in place for interactive operation and visualization of a time-accurate HSCT-inlet simulation.
The viability of ADVANTG deterministic method for synthetic radiography generation
NASA Astrophysics Data System (ADS)
Bingham, Andrew; Lee, Hyoung K.
2018-07-01
Fast simulation techniques to generate synthetic radiographic images of high resolution are helpful when new radiation imaging systems are designed. However, the standard stochastic approach requires lengthy run time with poorer statistics at higher resolution. The investigation of the viability of a deterministic approach to synthetic radiography image generation was explored. The aim was to analyze a computational time decrease over the stochastic method. ADVANTG was compared to MCNP in multiple scenarios including a small radiography system prototype, to simulate high resolution radiography images. By using ADVANTG deterministic code to simulate radiography images the computational time was found to decrease 10 to 13 times compared to the MCNP stochastic approach while retaining image quality.
ERIC Educational Resources Information Center
Heller, Barbara R.; Chitayat, Linda
This report covers three time periods during which students in five New York City high schools had use of a Computer Assisted Guidance (CAG) system. The basic objectives of the CAG project were to demonstrate the feasibility of using an automated system to provide high school students with factual and current information on colleges and careers,…
ERIC Educational Resources Information Center
Hainey, Tom; Connolly, Thomas; Stansfield, Mark; Boyle, Elizabeth
2011-01-01
Computer games have become a highly popular form of entertainment and have had a large impact on how University students spend their leisure time. Due to their highly motivating properties computer games have come to the attention of educationalists who wish to exploit these highly desirable properties for educational purposes. Several studies…
GPU-based cone beam computed tomography.
Noël, Peter B; Walczak, Alan M; Xu, Jinhui; Corso, Jason J; Hoffmann, Kenneth R; Schafer, Sebastian
2010-06-01
The use of cone beam computed tomography (CBCT) is growing in the clinical arena due to its ability to provide 3D information during interventions, its high diagnostic quality (sub-millimeter resolution), and its short scanning times (60 s). In many situations, the short scanning time of CBCT is followed by a time-consuming 3D reconstruction. The standard reconstruction algorithm for CBCT data is the filtered backprojection, which for a volume of size 256(3) takes up to 25 min on a standard system. Recent developments in the area of Graphic Processing Units (GPUs) make it possible to have access to high-performance computing solutions at a low cost, allowing their use in many scientific problems. We have implemented an algorithm for 3D reconstruction of CBCT data using the Compute Unified Device Architecture (CUDA) provided by NVIDIA (NVIDIA Corporation, Santa Clara, California), which was executed on a NVIDIA GeForce GTX 280. Our implementation results in improved reconstruction times from minutes, and perhaps hours, to a matter of seconds, while also giving the clinician the ability to view 3D volumetric data at higher resolutions. We evaluated our implementation on ten clinical data sets and one phantom data set to observe if differences occur between CPU and GPU-based reconstructions. By using our approach, the computation time for 256(3) is reduced from 25 min on the CPU to 3.2 s on the GPU. The GPU reconstruction time for 512(3) volumes is 8.5 s. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Priest, Richard Harding
A significant percentage of high school science teachers are not using computers to teach their students or prepare them for standardized testing. A survey of high school science teachers was conducted to determine how they are having students use computers in the classroom, why science teachers are not using computers in the classroom, which variables were relevant to their not using computers, and what are the effects of standardized testing on the use of technology in the high school science classroom. A self-administered questionnaire was developed to measure these aspects of computer integration and demographic information. A follow-up telephone interview survey of a portion of the original sample was conducted in order to clarify questions, correct misunderstandings, and to draw out more holistic descriptions from the subjects. The primary method used to analyze the quantitative data was frequency distributions. Multiple regression analysis was used to investigate the relationships between the barriers and facilitators and the dimensions of instructional use, frequency, and importance of the use of computers. All high school science teachers in a large urban/suburban school district were sent surveys. A response rate of 58% resulted from two mailings of the survey. It was found that contributing factors to why science teachers do not use computers were not enough up-to-date computers in their classrooms and other educational commitments and duties do not leave them enough time to prepare lessons that include technology. While a high percentage of science teachers thought their school and district administrations were supportive of technology, they also believed more inservice technology training and follow-up activities to support that training are needed and more software needs to be created. The majority of the science teachers do not use the computer to help students prepare for standardized tests because they believe they can prepare students more efficiently without a computer. Nearly half of the teachers, however, gave lack of time to prepare instructional materials and lack of a means to project a computer image to the whole class as reasons they do not use computers. A significant percentage thought science standardized testing was having a negative effect on computer use.
NASA HPCC Technology for Aerospace Analysis and Design
NASA Technical Reports Server (NTRS)
Schulbach, Catherine H.
1999-01-01
The Computational Aerosciences (CAS) Project is part of NASA's High Performance Computing and Communications Program. Its primary goal is to accelerate the availability of high-performance computing technology to the US aerospace community-thus providing the US aerospace community with key tools necessary to reduce design cycle times and increase fidelity in order to improve safety, efficiency and capability of future aerospace vehicles. A complementary goal is to hasten the emergence of a viable commercial market within the aerospace community for the advantage of the domestic computer hardware and software industry. The CAS Project selects representative aerospace problems (especially design) and uses them to focus efforts on advancing aerospace algorithms and applications, systems software, and computing machinery to demonstrate vast improvements in system performance and capability over the life of the program. Recent demonstrations have served to assess the benefits of possible performance improvements while reducing the risk of adopting high-performance computing technology. This talk will discuss past accomplishments in providing technology to the aerospace community, present efforts, and future goals. For example, the times to do full combustor and compressor simulations (of aircraft engines) have been reduced by factors of 320:1 and 400:1 respectively. While this has enabled new capabilities in engine simulation, the goal of an overnight, dynamic, multi-disciplinary, 3-dimensional simulation of an aircraft engine is still years away and will require new generations of high-end technology.
Real-time image reconstruction and display system for MRI using a high-speed personal computer.
Haishi, T; Kose, K
1998-09-01
A real-time NMR image reconstruction and display system was developed using a high-speed personal computer and optimized for the 32-bit multitasking Microsoft Windows 95 operating system. The system was operated at various CPU clock frequencies by changing the motherboard clock frequency and the processor/bus frequency ratio. When the Pentium CPU was used at the 200 MHz clock frequency, the reconstruction time for one 128 x 128 pixel image was 48 ms and that for the image display on the enlarged 256 x 256 pixel window was about 8 ms. NMR imaging experiments were performed with three fast imaging sequences (FLASH, multishot EPI, and one-shot EPI) to demonstrate the ability of the real-time system. It was concluded that in most cases, high-speed PC would be the best choice for the image reconstruction and display system for real-time MRI. Copyright 1998 Academic Press.
Accelerating the discovery of space-time patterns of infectious diseases using parallel computing.
Hohl, Alexander; Delmelle, Eric; Tang, Wenwu; Casas, Irene
2016-11-01
Infectious diseases have complex transmission cycles, and effective public health responses require the ability to monitor outbreaks in a timely manner. Space-time statistics facilitate the discovery of disease dynamics including rate of spread and seasonal cyclic patterns, but are computationally demanding, especially for datasets of increasing size, diversity and availability. High-performance computing reduces the effort required to identify these patterns, however heterogeneity in the data must be accounted for. We develop an adaptive space-time domain decomposition approach for parallel computation of the space-time kernel density. We apply our methodology to individual reported dengue cases from 2010 to 2011 in the city of Cali, Colombia. The parallel implementation reaches significant speedup compared to sequential counterparts. Density values are visualized in an interactive 3D environment, which facilitates the identification and communication of uneven space-time distribution of disease events. Our framework has the potential to enhance the timely monitoring of infectious diseases. Copyright © 2016 Elsevier Ltd. All rights reserved.
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Changing from computing grid to knowledge grid in life-science grid.
Talukdar, Veera; Konar, Amit; Datta, Ayan; Choudhury, Anamika Roy
2009-09-01
Grid computing has a great potential to become a standard cyber infrastructure for life sciences that often require high-performance computing and large data handling, which exceeds the computing capacity of a single institution. Grid computer applies the resources of many computers in a network to a single problem at the same time. It is useful to scientific problems that require a great number of computer processing cycles or access to a large amount of data.As biologists,we are constantly discovering millions of genes and genome features, which are assembled in a library and distributed on computers around the world.This means that new, innovative methods must be developed that exploit the re-sources available for extensive calculations - for example grid computing.This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. By extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design
NASA Technical Reports Server (NTRS)
Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)
1998-01-01
Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. Brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. talk discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
Computing Systems | High-Performance Computing | NREL
investigate, build, and test models of complex phenomena or entire integrated systems-that cannot be directly observed or manipulated in the lab, or would be too expensive or time consuming. Models and visualizations
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.
NASA Astrophysics Data System (ADS)
Takemiya, Tetsushi
In modern aerospace engineering, the physics-based computational design method is becoming more important, as it is more efficient than experiments and because it is more suitable in designing new types of aircraft (e.g., unmanned aerial vehicles or supersonic business jets) than the conventional design method, which heavily relies on historical data. To enhance the reliability of the physics-based computational design method, researchers have made tremendous efforts to improve the fidelity of models. However, high-fidelity models require longer computational time, so the advantage of efficiency is partially lost. This problem has been overcome with the development of variable fidelity optimization (VFO). In VFO, different fidelity models are simultaneously employed in order to improve the speed and the accuracy of convergence in an optimization process. Among the various types of VFO methods, one of the most promising methods is the approximation management framework (AMF). In the AMF, objective and constraint functions of a low-fidelity model are scaled at a design point so that the scaled functions, which are referred to as "surrogate functions," match those of a high-fidelity model. Since scaling functions and the low-fidelity model constitutes surrogate functions, evaluating the surrogate functions is faster than evaluating the high-fidelity model. Therefore, in the optimization process, in which gradient-based optimization is implemented and thus many function calls are required, the surrogate functions are used instead of the high-fidelity model to obtain a new design point. The best feature of the AMF is that it may converge to a local optimum of the high-fidelity model in much less computational time than the high-fidelity model. However, through literature surveys and implementations of the AMF, the author xx found that (1) the AMF is very vulnerable when the computational analysis models have numerical noise, which is very common in high-fidelity models, and that (2) the AMF terminates optimization erroneously when the optimization problems have constraints. The first problem is due to inaccuracy in computing derivatives in the AMF, and the second problem is due to erroneous treatment of the trust region ratio, which sets the size of the domain for an optimization in the AMF. In order to solve the first problem of the AMF, automatic differentiation (AD) technique, which reads the codes of analysis models and automatically generates new derivative codes based on some mathematical rules, is applied. If derivatives are computed with the generated derivative code, they are analytical, and the required computational time is independent of the number of design variables, which is very advantageous for realistic aerospace engineering problems. However, if analysis models implement iterative computations such as computational fluid dynamics (CFD), which solves system partial differential equations iteratively, computing derivatives through the AD requires a massive memory size. The author solved this deficiency by modifying the AD approach and developing a more efficient implementation with CFD, and successfully applied the AD to general CFD software. In order to solve the second problem of the AMF, the governing equation of the trust region ratio, which is very strict against the violation of constraints, is modified so that it can accept the violation of constraints within some tolerance. By accepting violations of constraints during the optimization process, the AMF can continue optimization without terminating immaturely and eventually find the true optimum design point. With these modifications, the AMF is referred to as "Robust AMF," and it is applied to airfoil and wing aerodynamic design problems using Euler CFD software. The former problem has 21 design variables, and the latter 64. In both problems, derivatives computed with the proposed AD method are first compared with those computed with the finite differentiation (FD) method, and then, the Robust AMF is implemented along with the sequential quadratic programming (SQP) optimization method with only high-fidelity models. The proposed AD method computes derivatives more accurately and faster than the FD method, and the Robust AMF successfully optimizes shapes of the airfoil and the wing in a much shorter time than SQP with only high-fidelity models. These results clearly show the effectiveness of the Robust AMF. Finally, the feasibility of reducing computational time for calculating derivatives and the necessity of AMF with an optimum design point always in the feasible region are discussed as future work.
Early experiences in developing and managing the neuroscience gateway.
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T
2015-02-01
The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway.
Early experiences in developing and managing the neuroscience gateway
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas. T.
2015-01-01
SUMMARY The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway. PMID:26523124
NASA Astrophysics Data System (ADS)
Doyle, Paul; Mtenzi, Fred; Smith, Niall; Collins, Adrian; O'Shea, Brendan
2012-09-01
The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and their falling costs is contributing to an explosive generation of raw photometric data. This data must go through a process of cleaning and reduction before it can be used for high precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required to allow reduction and cleaning rates to process terabyte sized datasets at near capture rates using an elastic processing architecture. The ability to access computing resources and to allow them to grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation and support processing of data segments in parallel. Academic institutes can collaborate and provide an elastic computing model without the requirement for large centralized high performance computing data centers. This paper demonstrates how a base 10 order of magnitude improvement in overall processing time has been achieved using the "ACN pipeline", a distributed pipeline spanning multiple academic institutes.
Compute Element and Interface Box for the Hazard Detection System
NASA Technical Reports Server (NTRS)
Villalpando, Carlos Y.; Khanoyan, Garen; Stern, Ryan A.; Some, Raphael R.; Bailey, Erik S.; Carson, John M.; Vaughan, Geoffrey M.; Werner, Robert A.; Salomon, Phil M.; Martin, Keith E.;
2013-01-01
The Autonomous Landing and Hazard Avoidance Technology (ALHAT) program is building a sensor that enables a spacecraft to evaluate autonomously a potential landing area to generate a list of hazardous and safe landing sites. It will also provide navigation inputs relative to those safe sites. The Hazard Detection System Compute Element (HDS-CE) box combines a field-programmable gate array (FPGA) board for sensor integration and timing, with a multicore computer board for processing. The FPGA does system-level timing and data aggregation, and acts as a go-between, removing the real-time requirements from the processor and labeling events with a high resolution time. The processor manages the behavior of the system, controls the instruments connected to the HDS-CE, and services the "heavy lifting" computational requirements for analyzing the potential landing spots.
2015-07-01
performance computing time from the US Department of Defense (DOD) High Performance Computing Modernization program at the US Army Research Laboratory...Approved OMB No. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time ...dimensional, compressible, Reynolds-averaged Navier-Stokes (RANS) equations are solved using a finite volume method. A point-implicit time - integration
ERIC Educational Resources Information Center
Hall, H. L.
1988-01-01
Reports on the advantages and disadvantages of desktop publishing, using the Apple Macintosh and "Pagemaker" software, to produce a high school yearbook. Asserts that while desktop publishing may be initially more time consuming for those unfamiliar with computers, desktop publishing gives high school journalism staffs more control over…
One-to-One Computing and Student Achievement in Ohio High Schools
ERIC Educational Resources Information Center
Williams, Nancy L.; Larwin, Karen H.
2016-01-01
This study explores the impact of one-to-one computing on student achievement in Ohio high schools as measured by performance on the Ohio Graduation Test (OGT). The sample included 24 treatment schools that were individually paired with a similar control school. An interrupted time series methodology was deployed to examine OGT data over a period…
NASA Astrophysics Data System (ADS)
Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek
2009-09-01
High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming common place and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual CPU-quad core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort is presented.
Integrating Computer Architectures into the Design of High-Performance Controllers
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.; Leyland, Jane A.; Warmbrodt, William
1986-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, on-line graphics, and file management. This paper discusses five global design considerations that are useful to integrate array processor, multimicroprocessor, and host computer system architecture into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the non-real-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration will be briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind-tunnel environment, the control architecture can generally be applied to a wide range of automatic control applications.
Software Accelerates Computing Time for Complex Math
NASA Technical Reports Server (NTRS)
2014-01-01
Ames Research Center awarded Newark, Delaware-based EM Photonics Inc. SBIR funding to utilize graphic processing unit (GPU) technology- traditionally used for computer video games-to develop high-computing software called CULA. The software gives users the ability to run complex algorithms on personal computers with greater speed. As a result of the NASA collaboration, the number of employees at the company has increased 10 percent.
ERIC Educational Resources Information Center
Hubbard, Aleata Kimberly
2017-01-01
In this dissertation, I explored the pedagogical content knowledge of in-service high school educators recently assigned to teach computer science for the first time. Teachers were participating in a professional development program where they co-taught introductory computing classes with tech industry professionals. The study was motivated by…
Computational and experimental studies of LEBUs at high device Reynolds numbers
NASA Technical Reports Server (NTRS)
Bertelrud, Arild; Watson, R. D.
1988-01-01
The present paper summarizes computational and experimental studies for large-eddy breakup devices (LEBUs). LEBU optimization (using a computational approach considering compressibility, Reynolds number, and the unsteadiness of the flow) and experiments with LEBUs at high Reynolds numbers in flight are discussed. The measurements include streamwise as well as spanwise distributions of local skin friction. The unsteady flows around the LEBU devices and far downstream are characterized by strain-gage measurements on the devices and hot-wire readings downstream. Computations are made with available time-averaged and quasi-stationary techniques to find suitable device profiles with minimum drag.
High-speed multiple sequence alignment on a reconfigurable platform.
Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf
2006-01-01
Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences by popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
Accurate Monotonicity - Preserving Schemes With Runge-Kutta Time Stepping
NASA Technical Reports Server (NTRS)
Suresh, A.; Huynh, H. T.
1997-01-01
A new class of high-order monotonicity-preserving schemes for the numerical solution of conservation laws is presented. The interface value in these schemes is obtained by limiting a higher-order polynominal reconstruction. The limiting is designed to preserve accuracy near extrema and to work well with Runge-Kutta time stepping. Computational efficiency is enhanced by a simple test that determines whether the limiting procedure is needed. For linear advection in one dimension, these schemes are shown as well as the Euler equations also confirm their high accuracy, good shock resolution, and computational efficiency.
Real-time data reduction capabilities at the Langley 7 by 10 foot high speed tunnel
NASA Technical Reports Server (NTRS)
Fox, C. H., Jr.
1980-01-01
The 7 by 10 foot high speed tunnel performs a wide range of tests employing a variety of model installation methods. To support the reduction of static data from this facility, a generalized wind tunnel data reduction program had been developed for use on the Langley central computer complex. The capabilities of a version of this generalized program adapted for real time use on a dedicated on-site computer are discussed. The input specifications, instructions for the console operator, and full descriptions of the algorithms are included.
Characteristic analysis and simulation for polysilicon comb micro-accelerometer
NASA Astrophysics Data System (ADS)
Liu, Fengli; Hao, Yongping
2008-10-01
High force update rate is a key factor for achieving high performance haptic rendering, which imposes a stringent real time requirement upon the execution environment of the haptic system. This requirement confines the haptic system to simplified environment for reducing the computation cost of haptic rendering algorithms. In this paper, we present a novel "hyper-threading" architecture consisting of several threads for haptic rendering. The high force update rate is achieved with relatively large computation time interval for each haptic loop. The proposed method was testified and proved to be effective with experiments on virtual wall prototype haptic system via Delta Haptic Device.
Kazantsev, D.; Van Eyndhoven, G.; Lionheart, W. R. B.; Withers, P. J.; Dobson, K. J.; McDonald, S. A.; Atwood, R.; Lee, P. D.
2015-01-01
There are many cases where one needs to limit the X-ray dose, or the number of projections, or both, for high frame rate (fast) imaging. Normally, it improves temporal resolution but reduces the spatial resolution of the reconstructed data. Fortunately, the redundancy of information in the temporal domain can be employed to improve spatial resolution. In this paper, we propose a novel regularizer for iterative reconstruction of time-lapse computed tomography. The non-local penalty term is driven by the available prior information and employs all available temporal data to improve the spatial resolution of each individual time frame. A high-resolution prior image from the same or a different imaging modality is used to enhance edges which remain stationary throughout the acquisition time while dynamic features tend to be regularized spatially. Effective computational performance together with robust improvement in spatial and temporal resolution makes the proposed method a competitive tool to state-of-the-art techniques. PMID:25939621
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
Trends in life science grid: from computing grid to knowledge grid.
Konagaya, Akihiko
2006-12-18
Grid computing has great potential to become a standard cyberinfrastructure for life sciences which often require high-performance computing and large data handling which exceeds the computing capacity of a single institution. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
Trends in life science grid: from computing grid to knowledge grid
Konagaya, Akihiko
2006-01-01
Background Grid computing has great potential to become a standard cyberinfrastructure for life sciences which often require high-performance computing and large data handling which exceeds the computing capacity of a single institution. Results This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. Conclusion Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community. PMID:17254294
Lattice Boltzmann for Airframe Noise Predictions
NASA Technical Reports Server (NTRS)
Barad, Michael; Kocheemoolayil, Joseph; Kiris, Cetin
2017-01-01
Increase predictive use of High-Fidelity Computational Aero- Acoustics (CAA) capabilities for NASA's next generation aviation concepts. CFD has been utilized substantially in analysis and design for steady-state problems (RANS). Computational resources are extremely challenged for high-fidelity unsteady problems (e.g. unsteady loads, buffet boundary, jet and installation noise, fan noise, active flow control, airframe noise, etc) ü Need novel techniques for reducing the computational resources consumed by current high-fidelity CAA Need routine acoustic analysis of aircraft components at full-scale Reynolds number from first principles Need an order of magnitude reduction in wall time to solution!
Design of a high-speed digital processing element for parallel simulation
NASA Technical Reports Server (NTRS)
Milner, E. J.; Cwynar, D. S.
1983-01-01
A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
A Real Time Controller For Applications In Smart Structures
NASA Astrophysics Data System (ADS)
Ahrens, Christian P.; Claus, Richard O.
1990-02-01
Research in smart structures, especially the area of vibration suppression, has warranted the investigation of advanced computing environments. Real time PC computing power has limited development of high order control algorithms. This paper presents a simple Real Time Embedded Control System (RTECS) in an application of Intelligent Structure Monitoring by way of modal domain sensing for vibration control. It is compared to a PC AT based system for overall functionality and speed. The system employs a novel Reduced Instruction Set Computer (RISC) microcontroller capable of 15 million instructions per second (MIPS) continuous performance and burst rates of 40 MIPS. Advanced Complimentary Metal Oxide Semiconductor (CMOS) circuits are integrated on a single 100 mm by 160 mm printed circuit board requiring only 1 Watt of power. An operating system written in Forth provides high speed operation and short development cycles. The system allows for implementation of Input/Output (I/O) intensive algorithms and provides capability for advanced system development.
The computational challenges of Earth-system science.
O'Neill, Alan; Steenman-Clark, Lois
2002-06-15
The Earth system--comprising atmosphere, ocean, land, cryosphere and biosphere--is an immensely complex system, involving processes and interactions on a wide range of space- and time-scales. To understand and predict the evolution of the Earth system is one of the greatest challenges of modern science, with success likely to bring enormous societal benefits. High-performance computing, along with the wealth of new observational data, is revolutionizing our ability to simulate the Earth system with computer models that link the different components of the system together. There are, however, considerable scientific and technical challenges to be overcome. This paper will consider four of them: complexity, spatial resolution, inherent uncertainty and time-scales. Meeting these challenges requires a significant increase in the power of high-performance computers. The benefits of being able to make reliable predictions about the evolution of the Earth system should, on their own, amply repay this investment.
Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo
2018-06-08
Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.
Parallel Processing Systems for Passive Ranging During Helicopter Flight
NASA Technical Reports Server (NTRS)
Sridhar, Bavavar; Suorsa, Raymond E.; Showman, Robert D. (Technical Monitor)
1994-01-01
The complexity of rotorcraft missions involving operations close to the ground result in high pilot workload. In order to allow a pilot time to perform mission-oriented tasks, sensor-aiding and automation of some of the guidance and control functions are highly desirable. Images from an electro-optical sensor provide a covert way of detecting objects in the flight path of a low-flying helicopter. Passive ranging consists of processing a sequence of images using techniques based on optical low computation and recursive estimation. The passive ranging algorithm has to extract obstacle information from imagery at rates varying from five to thirty or more frames per second depending on the helicopter speed. We have implemented and tested the passive ranging algorithm off-line using helicopter-collected images. However, the real-time data and computation requirements of the algorithm are beyond the capability of any off-the-shelf microprocessor or digital signal processor. This paper describes the computational requirements of the algorithm and uses parallel processing technology to meet these requirements. Various issues in the selection of a parallel processing architecture are discussed and four different computer architectures are evaluated regarding their suitability to process the algorithm in real-time. Based on this evaluation, we conclude that real-time passive ranging is a realistic goal and can be achieved with a short time.
NASA Astrophysics Data System (ADS)
Li, Xinya; Deng, Zhiqun Daniel; Rauchenstein, Lynn T.; Carlson, Thomas J.
2016-04-01
Locating the position of fixed or mobile sources (i.e., transmitters) based on measurements obtained from sensors (i.e., receivers) is an important research area that is attracting much interest. In this paper, we review several representative localization algorithms that use time of arrivals (TOAs) and time difference of arrivals (TDOAs) to achieve high signal source position estimation accuracy when a transmitter is in the line-of-sight of a receiver. Circular (TOA) and hyperbolic (TDOA) position estimation approaches both use nonlinear equations that relate the known locations of receivers and unknown locations of transmitters. Estimation of the location of transmitters using the standard nonlinear equations may not be very accurate because of receiver location errors, receiver measurement errors, and computational efficiency challenges that result in high computational burdens. Least squares and maximum likelihood based algorithms have become the most popular computational approaches to transmitter location estimation. In this paper, we summarize the computational characteristics and position estimation accuracies of various positioning algorithms. By improving methods for estimating the time-of-arrival of transmissions at receivers and transmitter location estimation algorithms, transmitter location estimation may be applied across a range of applications and technologies such as radar, sonar, the Global Positioning System, wireless sensor networks, underwater animal tracking, mobile communications, and multimedia.
Fekete, Szabolcs; Fekete, Jeno; Molnár, Imre; Ganzler, Katalin
2009-11-06
Many different strategies of reversed phase high performance liquid chromatographic (RP-HPLC) method development are used today. This paper describes a strategy for the systematic development of ultrahigh-pressure liquid chromatographic (UHPLC or UPLC) methods using 5cmx2.1mm columns packed with sub-2microm particles and computer simulation (DryLab((R)) package). Data for the accuracy of computer modeling in the Design Space under ultrahigh-pressure conditions are reported. An acceptable accuracy for these predictions of the computer models is presented. This work illustrates a method development strategy, focusing on time reduction up to a factor 3-5, compared to the conventional HPLC method development and exhibits parts of the Design Space elaboration as requested by the FDA and ICH Q8R1. Furthermore this paper demonstrates the accuracy of retention time prediction at elevated pressure (enhanced flow-rate) and shows that the computer-assisted simulation can be applied with sufficient precision for UHPLC applications (p>400bar). Examples of fast and effective method development in pharmaceutical analysis, both for gradient and isocratic separations are presented.
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. To treat such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, graphics processing units to multicore CPUs with a fast interconnect, along with effective parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as the multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of a layered parallel software libraries that allow image processing applications to share the common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on the Amdahl's law for symmetric multicore chips showed the potential of a high performance scalability of the HPC 3DMIP platform when a larger number of cores is available.
Delfino, Leandro D; Dos Santos Silva, Diego A; Tebar, William R; Zanuto, Edner F; Codogno, Jamile S; Fernandes, Rômulo A; Christofaro, Diego G
2018-03-01
Sedentary behaviors in adolescents are associated with using screen devices, analyzed as the total daily time in television viewing, using the computer and video game. However, an independent and clustered analysis of devices allows greater understanding of associations with physical inactivity domains and eating habits in adolescents. Sample of adolescents aged 10-17 years (N.=1011) from public and private schools, randomly selected. The use of screen devices was measured by hours per week spent in each device: TV, computer, videogames and mobile phone/tablet. Physical inactivity domains (school, leisure and sports), eating habits (weekly food consumption frequency) and socioeconomic status were assessed by questionnaire. The prevalence of high use of mobile phone/tablet was 70% among adolescents, 63% showed high use of TV or computer and 24% reported high use of videogames. High use of videogames was greater among boys and high use of mobile phone/tablet was higher among girls. Significant associations of high use of TV (OR=1.43, 95% CI: 1.04-1.99), computer (OR=1.44, 95% CI: 1.03-2.02), videogames (OR=1.65, 95% CI: 1.13-2.69) and consumption of snacks were observed. High use of computer was associated with fried foods consumption (OR=1.32, 95% CI: 1.01-1.75) and physical inactivity (OR=1.41, 95% CI: 1.03-1.95). Mobile phone was associated with consumption of sweets (OR=1.33, 95% CI: 1.00-1.80). Cluster using screen devices showed associations with high consumption of snacks, fried foods and sweets, even after controlling for confounding variables. The high use of screen devices was associated with high consumption of snacks, fried foods, sweets and physical inactivity in adolescents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goings, Joshua J.; Li, Xiaosong, E-mail: xsli@uw.edu
2016-06-21
One of the challenges of interpreting electronic circular dichroism (ECD) band spectra is that different states may have different rotatory strength signs, determined by their absolute configuration. If the states are closely spaced and opposite in sign, observed transitions may be washed out by nearby states, unlike absorption spectra where transitions are always positive additive. To accurately compute ECD bands, it is necessary to compute a large number of excited states, which may be prohibitively costly if one uses the linear-response time-dependent density functional theory (TDDFT) framework. Here we implement a real-time, atomic-orbital based TDDFT method for computing the entiremore » ECD spectrum simultaneously. The method is advantageous for large systems with a high density of states. In contrast to previous implementations based on real-space grids, the method is variational, independent of nuclear orientation, and does not rely on pseudopotential approximations, making it suitable for computation of chiroptical properties well into the X-ray regime.« less
hp-Adaptive time integration based on the BDF for viscous flows
NASA Astrophysics Data System (ADS)
Hay, A.; Etienne, S.; Pelletier, D.; Garon, A.
2015-06-01
This paper presents a procedure based on the Backward Differentiation Formulas of order 1 to 5 to obtain efficient time integration of the incompressible Navier-Stokes equations. The adaptive algorithm performs both stepsize and order selections to control respectively the solution accuracy and the computational efficiency of the time integration process. The stepsize selection (h-adaptivity) is based on a local error estimate and an error controller to guarantee that the numerical solution accuracy is within a user prescribed tolerance. The order selection (p-adaptivity) relies on the idea that low-accuracy solutions can be computed efficiently by low order time integrators while accurate solutions require high order time integrators to keep computational time low. The selection is based on a stability test that detects growing numerical noise and deems a method of order p stable if there is no method of lower order that delivers the same solution accuracy for a larger stepsize. Hence, it guarantees both that (1) the used method of integration operates inside of its stability region and (2) the time integration procedure is computationally efficient. The proposed time integration procedure also features a time-step rejection and quarantine mechanisms, a modified Newton method with a predictor and dense output techniques to compute solution at off-step points.
Optoelectronic Reservoir Computing
Paquot, Y.; Duport, F.; Smerieri, A.; Dambre, J.; Schrauwen, B.; Haelterman, M.; Massar, S.
2012-01-01
Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an optoelectronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations. PMID:22371825
ERIC Educational Resources Information Center
Supej, Matej; Holmberg, Hans-Christer
2011-01-01
Accurate time measurement is essential to temporal analysis in sport. This study aimed to (a) develop a new method for time computation from surveyed trajectories using a high-end global navigation satellite system (GNSS), (b) validate its precision by comparing GNSS with photocells, and (c) examine whether gate-to-gate times can provide more…
PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.
Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar
2014-01-01
Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation resulting in the reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during feature extraction and classification stages including parallel preprocessing, and their combinations, so-called a Parallel Expectation-Maximization PCA architecture. Comparing to a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision leading to high speed face recognition systems, that is, the speed-up over nine and three times over PCA and Parallel PCA.
gpuSPHASE-A shared memory caching implementation for 2D SPH using CUDA
NASA Astrophysics Data System (ADS)
Winkler, Daniel; Meister, Michael; Rezavand, Massoud; Rauch, Wolfgang
2017-04-01
Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Based on the high computational demand of CFD such simulations in 3D need a computation time of years so that a reduction to a 2D domain is inevitable. In this paper gpuSPHASE, a new open-source 2D SPH solver implementation for graphics devices, is developed. It is optimized for simulations that must be executed with thousands of frames per second to be computed in reasonable time. A novel caching algorithm for Compute Unified Device Architecture (CUDA) shared memory is proposed and implemented. The software is validated and the performance is evaluated for the well established dambreak test case.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just requires a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amount of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address thesemore » challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.« less
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.
Kim, Lok-Won
2018-05-01
Although there have been many decades of research and commercial presence on high performance general purpose processors, there are still many applications that require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been successfully used to learn in a wide variety of applications, but their heavy computation demand has considerably limited their practical applications. This paper proposes a fully pipelined acceleration architecture to alleviate high computational demand of an artificial neural network (ANN) which is restricted Boltzmann machine (RBM) ANNs. The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency) integrated in a state-of-the art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T) provides a computational performance of 301-billion connection-updates-per-second and about 193 times higher performance than a software solution running on general purpose processors. Most importantly, the architecture enables over 4 times (12 times in batch learning) higher performance compared with a previous work when both are implemented in an FPGA device (XC2VP70).
FPGA-based architecture for motion recovering in real-time
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar
2002-03-01
A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher level goals in different applications. Motion estimation algorithms pose a significant computational load for the sequential processors limiting its use in practical applications. In this work we propose a hardware architecture for motion estimation in real time based on FPGA technology. The technique used for motion estimation is Optical Flow due to its accuracy, and the density of velocity estimation, however other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results will be presented and the real-time performance will be discussed and analyzed. The architecture is prototyped in an FPGA board with a Virtex device interfaced to a digital imager.
Investigation of television transmission using adaptive delta modulation principles
NASA Technical Reports Server (NTRS)
Schilling, D. L.
1976-01-01
The results are presented of a study on the use of the delta modulator as a digital encoder of television signals. The computer simulation of different delta modulators was studied in order to find a satisfactory delta modulator. After finding a suitable delta modulator algorithm via computer simulation, the results were analyzed and then implemented in hardware to study its ability to encode real time motion pictures from an NTSC format television camera. The effects of channel errors on the delta modulated video signal were tested along with several error correction algorithms via computer simulation. A very high speed delta modulator was built (out of ECL logic), incorporating the most promising of the correction schemes, so that it could be tested on real time motion pictures. Delta modulators were investigated which could achieve significant bandwidth reduction without regard to complexity or speed. The first scheme investigated was a real time frame to frame encoding scheme which required the assembly of fourteen, 131,000 bit long shift registers as well as a high speed delta modulator. The other schemes involved the computer simulation of two dimensional delta modulator algorithms.
Igarashi, Jun; Shouno, Osamu; Fukai, Tomoki; Tsujino, Hiroshi
2011-11-01
Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic neurons. In order to address these problems, mainly the latter, we chose to use general purpose computing on graphics processing units (GPGPUs) for simulation of such a neural network, taking advantage of the powerful computational capability of a graphics processing unit (GPU). As a target for real-time simulation, we used a model of the basal ganglia that has been developed according to electrophysiological and anatomical knowledge. The model consists of heterogeneous populations of 370 spiking model neurons, including computationally heavy conductance-based models, connected by 11,002 synapses. Simulation of the model has not yet been performed in real-time using a general computing server. By parallelization of the model on the NVIDIA Geforce GTX 280 GPU in data-parallel and task-parallel fashion, faster-than-real-time simulation was robustly realized with only one-third of the GPU's total computational resources. Furthermore, we used the GPU's full computational resources to perform faster-than-real-time simulation of three instances of the basal ganglia model; these instances consisted of 1100 neurons and 33,006 synapses and were synchronized at each calculation step. Finally, we developed software for simultaneous visualization of faster-than-real-time simulation output. These results suggest the potential power of GPGPU techniques in real-time simulation of realistic neural networks. Copyright © 2011 Elsevier Ltd. All rights reserved.
Advanced flight computers for planetary exploration
NASA Technical Reports Server (NTRS)
Stephenson, R. Rhoads
1988-01-01
Research concerning flight computers for use on interplanetary probes is reviewed. The history of these computers from the Viking mission to the present is outlined. The differences between ground commercial computers and computers for planetary exploration are listed. The development of a computer for the Mariner Mark II comet rendezvous asteroid flyby mission is described. Various aspects of recently developed computer systems are examined, including the Max real time, embedded computer, a hypercube distributed supercomputer, a SAR data processor, a processor for the High Resolution IR Imaging Spectrometer, and a robotic vision multiresolution pyramid machine for processsing images obtained by a Mars Rover.
NASA Astrophysics Data System (ADS)
Xamán, J.; Zavala-Guillén, I.; Hernández-López, I.; Uriarte-Flores, J.; Hernández-Pérez, I.; Macías-Melo, E. V.; Aguilar-Castro, K. M.
2018-03-01
In this paper, we evaluated the convergence rate (CPU time) of a new mathematical formulation for the numerical solution of the radiative transfer equation (RTE) with several High-Order (HO) and High-Resolution (HR) schemes. In computational fluid dynamics, this procedure is known as the Normalized Weighting-Factor (NWF) method and it is adopted here. The NWF method is used to incorporate the high-order resolution schemes in the discretized RTE. The NWF method is compared, in terms of computer time needed to obtain a converged solution, with the widely used deferred-correction (DC) technique for the calculations of a two-dimensional cavity with emitting-absorbing-scattering gray media using the discrete ordinates method. Six parameters, viz. the grid size, the order of quadrature, the absorption coefficient, the emissivity of the boundary surface, the under-relaxation factor, and the scattering albedo are considered to evaluate ten schemes. The results showed that using the DC method, in general, the scheme that had the lowest CPU time is the SOU. In contrast, with the results of theDC procedure the CPU time for DIAMOND and QUICK schemes using the NWF method is shown to be, between the 3.8 and 23.1% faster and 12.6 and 56.1% faster, respectively. However, the other schemes are more time consuming when theNWFis used instead of the DC method. Additionally, a second test case was presented and the results showed that depending on the problem under consideration, the NWF procedure may be computationally faster or slower that the DC method. As an example, the CPU time for QUICK and SMART schemes are 61.8 and 203.7%, respectively, slower when the NWF formulation is used for the second test case. Finally, future researches to explore the computational cost of the NWF method in more complex problems are required.
An Overview of High Performance Computing and Challenges for the Future
Google Tech Talks
2017-12-09
In this talk we examine how high performance computing has changed over the last 10-year and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. A new generation of software libraries and lgorithms are needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile--time and run--time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run--time environment variability will make these problems much harder. We will focus on the redesign of software to fit multicore architectures. Speaker: Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester Jack Dongarra received a Bachelor of Science in Mathematics from Chicago State University in 1972 and a Master of Science in Computer Science from the Illinois Institute of Technology in 1973. He received his Ph.D. in Applied Mathematics from the University of New Mexico in 1980. He worked at the Argonne National Laboratory until 1989, becoming a senior scientist. He now holds an appointment as University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee, has the position of a Distinguished Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), Turing Fellow in the Computer Science and Mathematics Schools at the University of Manchester, and an Adjunct Professor in the Computer Science Department at Rice University. He specializes in numerical algorithms in linear algebra, parallel computing, the use of advanced-computer architectures, programming methodology, and tools for parallel computers. His research includes the development, testing and documentation of high quality mathematical software. He has contributed to the design and implementation of the following open source software packages and systems: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI. He has published approximately 200 articles, papers, reports and technical memoranda and he is coauthor of several books. He was awarded the IEEE Sid Fernbach Award in 2004 for his contributions in the application of high performance computers using innovative approaches. He is a Fellow of the AAAS, ACM, and the IEEE and a member of the National Academy of Engineering.
An Overview of High Performance Computing and Challenges for the Future
DOE Office of Scientific and Technical Information (OSTI.GOV)
Google Tech Talks
In this talk we examine how high performance computing has changed over the last 10-year and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. A new generation of software libraries and lgorithms are needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile--time and run--time techniques, but the increased scale of computation, depth of memory hierarchies,more » range of latencies, and increased run--time environment variability will make these problems much harder. We will focus on the redesign of software to fit multicore architectures. Speaker: Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester Jack Dongarra received a Bachelor of Science in Mathematics from Chicago State University in 1972 and a Master of Science in Computer Science from the Illinois Institute of Technology in 1973. He received his Ph.D. in Applied Mathematics from the University of New Mexico in 1980. He worked at the Argonne National Laboratory until 1989, becoming a senior scientist. He now holds an appointment as University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee, has the position of a Distinguished Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), Turing Fellow in the Computer Science and Mathematics Schools at the University of Manchester, and an Adjunct Professor in the Computer Science Department at Rice University. He specializes in numerical algorithms in linear algebra, parallel computing, the use of advanced-computer architectures, programming methodology, and tools for parallel computers. His research includes the development, testing and documentation of high quality mathematical software. He has contributed to the design and implementation of the following open source software packages and systems: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI. He has published approximately 200 articles, papers, reports and technical memoranda and he is coauthor of several books. He was awarded the IEEE Sid Fernbach Award in 2004 for his contributions in the application of high performance computers using innovative approaches. He is a Fellow of the AAAS, ACM, and the IEEE and a member of the National Academy of Engineering.« less
Liu, Dengyuan; Rao, Yunshuang; Zeng, Huan; Zhang, Fan; Wang, Lu; Xie, Yaojie; Sharma, Manoj; Zhao, Yong
2018-01-01
Objectives: This study aimed to assess the prevalence of prolonged television, computer, and mobile phone viewing times and examined related sociodemographic factors among Chinese pregnant women. Methods: In this study, a cross-sectional survey was implemented among 2400 Chinese pregnant women in 16 hospitals of 5 provinces from June to August in 2015, and the response rate of 97.76%. We excluded women with serious complications and cognitive disorders. The women were asked about their television, computer, and mobile phone viewing during pregnancy. Prolonged television watching or computer viewing was defined as spending more than two hours on television or computer viewing per day. Prolonged mobile phone viewing was watching more than one hour on mobile phone per day. Results: Among 2345 pregnant women, about 25.1% reported prolonged television viewing, 20.6% reported prolonged computer viewing, and 62.6% reported prolonged mobile phone viewing. Pregnant women with long mobile phone viewing times were likely have long TV (Estimate = 0.080, Standard Error (SE) = 0.016, p < 0.001) and computer viewing times (Estimate = 0.053, SE = 0.022, p = 0.015). Pregnant women with long TV (Estimate = 0.134, SE = 0.027, p < 0.001) and long computer viewing times (Estimate = 0.049, SE = 0.020, p = 0.015) were likely have long mobile phone viewing times. Pregnant women with long TV viewing times were less likely to have long computer viewing times (Estimate = −0.032, SE = 0.015, p = 0.035), and pregnant women with long computer viewing times were less likely have long TV viewing times (Estimate = −0.059, SE = 0.028, p = 0.035). Pregnant women in their second pregnancy had lower prolonged computer viewing times than those in their first pregnancy (Odds Ratio (OR) 0.56, 95% Confidence Interval (CI) 0.42–0.74). Pregnant women in their second pregnancy were more likely have longer prolonged mobile phone viewing times than those in their first pregnancy (OR 1.25, 95% CI 1.01–1.55). Conclusions: The high prevalence rate of prolonged TV, computer, and mobile phone viewing times was common for pregnant women in their first and second pregnancy. This study preliminarily explored the relationship between sociodemographic factors and prolonged screen time to provide some indication for future interventions related to decreasing screen-viewing times during pregnancy in China. PMID:29495439
The Julia programming language: the future of scientific computing
NASA Astrophysics Data System (ADS)
Gibson, John
2017-11-01
Julia is an innovative new open-source programming language for high-level, high-performance numerical computing. Julia combines the general-purpose breadth and extensibility of Python, the ease-of-use and numeric focus of Matlab, the speed of C and Fortran, and the metaprogramming power of Lisp. Julia uses type inference and just-in-time compilation to compile high-level user code to machine code on the fly. A rich set of numeric types and extensive numerical libraries are built-in. As a result, Julia is competitive with Matlab for interactive graphical exploration and with C and Fortran for high-performance computing. This talk interactively demonstrates Julia's numerical features and benchmarks Julia against C, C++, Fortran, Matlab, and Python on a spectral time-stepping algorithm for a 1d nonlinear partial differential equation. The Julia code is nearly as compact as Matlab and nearly as fast as Fortran. This material is based upon work supported by the National Science Foundation under Grant No. 1554149.
High-speed extended-term time-domain simulation for online cascading analysis of power system
NASA Astrophysics Data System (ADS)
Fu, Chuan
A high-speed extended-term (HSET) time domain simulator (TDS), intended to become a part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: (i) online analysis, including n-1 and n-k events, (ii) ability to simulate both fast and slow dynamics for 1-3 hours in advance, (iii) inclusion of rigorous protection-system modeling, (iv) intelligence for corrective action ID, storage, and fast retrieval, and (v) high-speed execution. Very fast on-line computational capability is the most desired attribute of this simulator. Based on the process of solving algebraic differential equations describing the dynamics of power system, HSET-TDS seeks to develop computational efficiency at each of the following hierarchical levels, (i) hardware, (ii) strategies, (iii) integration methods, (iv) nonlinear solvers, and (v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-Stable but it possesses greater high-order precision (h4 ) than the trapezoidal rule. Such precision enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high speed extended-term time domain simulation (HSET-TDS) for on-line purposes, this thesis presents principles for designing numerical solvers of differential algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods(HH4), nonlinear solvers(Very Dishonest Newton), and linear solvers(SuperLU). We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial grade PSSE program, with respect to computational efficiency and accuracy, using as examples the New England 39 bus system, the expanded 8775 bus system, and PJM 13029 buses system. Third, we have explored a stiffness-decoupling method, intended to be part of parallel design of time domain simulation software for super computers. The stiffness-decoupling method is able to combine the advantages of implicit methods (A-stability) and explicit method(less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. The expanded 975 buses system is used to test simulation efficiency. Finally, several parallel strategies for super computer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task via scale with the stiffness decoupling method, waveform relaxation, and parallel linear solver. Design B partitions the task via the time axis using a highly precise integration method, the Kuntzmann-Butcher Method - order 8 (KB8). The strategy of partitioning events is designed to partition the whole simulation via the time axis through a simulated sequence of cascading events. For all strategies proposed, a strategy of partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent, and therefore minimum communication time is needed.
A maximum entropy reconstruction technique for tomographic particle image velocimetry
NASA Astrophysics Data System (ADS)
Bilsky, A. V.; Lozhkin, V. A.; Markovich, D. M.; Tokarev, M. P.
2013-04-01
This paper studies a novel approach for reducing tomographic PIV computational complexity. The proposed approach is an algebraic reconstruction technique, termed MENT (maximum entropy). This technique computes the three-dimensional light intensity distribution several times faster than SMART, using at least ten times less memory. Additionally, the reconstruction quality remains nearly the same as with SMART. This paper presents the theoretical computation performance comparison for MENT, SMART and MART, followed by validation using synthetic particle images. Both the theoretical assessment and validation of synthetic images demonstrate significant computational time reduction. The data processing accuracy of MENT was compared to that of SMART in a slot jet experiment. A comparison of the average velocity profiles shows a high level of agreement between the results obtained with MENT and those obtained with SMART.
ERIC Educational Resources Information Center
Harvey, T. Edward
1987-01-01
A national survey of full-time instructional faculty (N=208) at universities, 2-year colleges, and high schools regarding attitudes toward using computers in second-language composition instruction revealed a predomination of Apple and IBM-PC computers used, a major frustration in lack of foreign character support, and mixed opinions about real…
An Application of Games-Based Learning within Software Engineering
ERIC Educational Resources Information Center
Connolly, Thomas M.; Stansfield, Mark; Hainey, Thomas
2007-01-01
For some time now, computer games have played an important role in both children and adults' leisure activities. While there has been much written on the negative aspects of computer games, it has also been recognised that they have potential advantages and benefits. There is no doubt that computer games are highly engaging and incorporate…
Comparability of Computer Delivered versus Traditional Paper and Pencil Testing
ERIC Educational Resources Information Center
Strader, Douglas A.
2012-01-01
There are many advantages supporting the use of computers as an alternate mode of delivery for high stakes testing: cost savings, increased test security, flexibility in test administrations, innovations in items, and reduced scoring time. The purpose of this study was to determine if the use of computers as the mode of delivery had any…
A Computer Based Program to Improve Reading and Mathematics Scores for High School Students.
ERIC Educational Resources Information Center
Bond, Carole L.; And Others
A study examined the effect on reading achievement, mathematics achievement, and ACT scores when computer based instruction (CBI) was compressed into a 6-week period of time. In addition, the effects of learning style and receptive language deficits on these scores were studied. Computer based instruction is a primary source of instruction that…
Quantum gates by periodic driving
Shi, Z. C.; Wang, W.; Yi, X. X.
2016-01-01
Topological quantum computation has been extensively studied in the past decades due to its robustness against decoherence. One way to realize the topological quantum computation is by adiabatic evolutions—it requires relatively long time to complete a gate, so the speed of quantum computation slows down. In this work, we present a method to realize single qubit quantum gates by periodic driving. Compared to adiabatic evolution, the single qubit gates can be realized at a fixed time much shorter than that by adiabatic evolution. The driving fields can be sinusoidal or square-well field. With the sinusoidal driving field, we derive an expression for the total operation time in the high-frequency limit, and an exact analytical expression for the evolution operator without any approximations is given for the square well driving. This study suggests that the period driving could provide us with a new direction in regulations of the operation time in topological quantum computation. PMID:26911900
Quantum gates by periodic driving.
Shi, Z C; Wang, W; Yi, X X
2016-02-25
Topological quantum computation has been extensively studied in the past decades due to its robustness against decoherence. One way to realize the topological quantum computation is by adiabatic evolutions-it requires relatively long time to complete a gate, so the speed of quantum computation slows down. In this work, we present a method to realize single qubit quantum gates by periodic driving. Compared to adiabatic evolution, the single qubit gates can be realized at a fixed time much shorter than that by adiabatic evolution. The driving fields can be sinusoidal or square-well field. With the sinusoidal driving field, we derive an expression for the total operation time in the high-frequency limit, and an exact analytical expression for the evolution operator without any approximations is given for the square well driving. This study suggests that the period driving could provide us with a new direction in regulations of the operation time in topological quantum computation.
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng
Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time consuming with a single processer. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP).more » These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.« less
Computations of unsteady multistage compressor flows in a workstation environment
NASA Technical Reports Server (NTRS)
Gundy-Burlet, Karen L.
1992-01-01
High-end graphics workstations are becoming a necessary tool in the computational fluid dynamics environment. In addition to their graphic capabilities, workstations of the latest generation have powerful floating-point-operation capabilities. As workstations become common, they could provide valuable computing time for such applications as turbomachinery flow calculations. This report discusses the issues involved in implementing an unsteady, viscous multistage-turbomachinery code (STAGE-2) on workstations. It then describes work in which the workstation version of STAGE-2 was used to study the effects of axial-gap spacing on the time-averaged and unsteady flow within a 2 1/2-stage compressor. The results included time-averaged surface pressures, time-averaged pressure contours, standard deviation of pressure contours, pressure amplitudes, and force polar plots.
Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics.
Zheng, Mo; Li, Xiaoxia; Guo, Li
2013-04-01
Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10-50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and its one order smaller time-step than the classical MD, all of which pose significant computational challenges in simulation capability to reach spatio-temporal scales of nanometers and nanoseconds. The very recent advances of graphics processing unit (GPU) provide not only highly favorable performance for GPU enabled MD programs compared with CPU implementations but also an opportunity to manage with the computing power and memory demanding nature imposed on computer hardware by ReaxFF MD. In this paper, we present the algorithms of GMD-Reax, the first GPU enabled ReaxFF MD program with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with a NVIDIA C2050 GPU for coal pyrolysis simulation systems with atoms ranging from 1378 to 27,283. GMD-Reax achieved speedups as high as 12 times faster than Duin et al.'s FORTRAN codes in Lammps on 8 CPU cores and 6 times faster than the Lammps' C codes based on PuReMD in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.
1985-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas E; Schuman, Catherine D; Young, Steven R
Current Deep Learning models use highly optimized convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers with a fairly simple layered network topology, i.e., highly connected layers, without intra-layer connections. Complex topologies have been proposed, but are intractable to train on current systems. Building the topologies of the deep learning network requires hand tuning, and implementing the network in hardware is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determinemore » network topology, and neuromorphic computing for a low-power hardware implementation. Due to input size limitations of current quantum computers we use the MNIST dataset for our evaluation. The results show the possibility of using the three architectures in tandem to explore complex deep learning networks that are untrainable using a von Neumann architecture. We show that a quantum computer can find high quality values of intra-layer connections and weights, while yielding a tractable time result as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware. This represents a new capability that is not feasible with current von Neumann architecture. It potentially enables the ability to solve very complicated problems unsolvable with current computing technologies.« less
NASA Astrophysics Data System (ADS)
Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock
2017-01-01
The suites of numerical models used for simulating climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations on commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Service (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create virtual computing cluster on the AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Yang, Yiqun; Urban, Matthew W; McGough, Robert J
2018-05-15
Shear wave calculations induced by an acoustic radiation force are very time-consuming on desktop computers, and high-performance graphics processing units (GPUs) achieve dramatic reductions in the computation time for these simulations. The acoustic radiation force is calculated using the fast near field method and the angular spectrum approach, and then the shear waves are calculated in parallel with Green's functions on a GPU. This combination enables rapid evaluation of shear waves for push beams with different spatial samplings and for apertures with different f/#. Relative to shear wave simulations that evaluate the same algorithm on an Intel i7 desktop computer, a high performance nVidia GPU reduces the time required for these calculations by a factor of 45 and 700 when applied to elastic and viscoelastic shear wave simulation models, respectively. These GPU-accelerated simulations also compared to measurements in different viscoelastic phantoms, and the results are similar. For parametric evaluations and for comparisons with measured shear wave data, shear wave simulations with the Green's function approach are ideally suited for high-performance GPUs.
An evaluation of the state of time synchronization on leadership class supercomputers
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.; ...
2017-10-09
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
An evaluation of the state of time synchronization on leadership class supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
Accumulated source imaging of brain activity with both low and high-frequency neuromagnetic signals
Xiang, Jing; Luo, Qian; Kotecha, Rupesh; Korman, Abraham; Zhang, Fawen; Luo, Huan; Fujiwara, Hisako; Hemasilpin, Nat; Rose, Douglas F.
2014-01-01
Recent studies have revealed the importance of high-frequency brain signals (>70 Hz). One challenge of high-frequency signal analysis is that the size of time-frequency representation of high-frequency brain signals could be larger than 1 terabytes (TB), which is beyond the upper limits of a typical computer workstation's memory (<196 GB). The aim of the present study is to develop a new method to provide greater sensitivity in detecting high-frequency magnetoencephalography (MEG) signals in a single automated and versatile interface, rather than the more traditional, time-intensive visual inspection methods, which may take up to several days. To address the aim, we developed a new method, accumulated source imaging, defined as the volumetric summation of source activity over a period of time. This method analyzes signals in both low- (1~70 Hz) and high-frequency (70~200 Hz) ranges at source levels. To extract meaningful information from MEG signals at sensor space, the signals were decomposed to channel-cross-channel matrix (CxC) representing the spatiotemporal patterns of every possible sensor-pair. A new algorithm was developed and tested by calculating the optimal CxC and source location-orientation weights for volumetric source imaging, thereby minimizing multi-source interference and reducing computational cost. The new method was implemented in C/C++ and tested with MEG data recorded from clinical epilepsy patients. The results of experimental data demonstrated that accumulated source imaging could effectively summarize and visualize MEG recordings within 12.7 h by using approximately 10 GB of computer memory. In contrast to the conventional method of visually identifying multi-frequency epileptic activities that traditionally took 2–3 days and used 1–2 TB storage, the new approach can quantify epileptic abnormalities in both low- and high-frequency ranges at source levels, using much less time and computer memory. PMID:24904402
Accumulated source imaging of brain activity with both low and high-frequency neuromagnetic signals.
Xiang, Jing; Luo, Qian; Kotecha, Rupesh; Korman, Abraham; Zhang, Fawen; Luo, Huan; Fujiwara, Hisako; Hemasilpin, Nat; Rose, Douglas F
2014-01-01
Recent studies have revealed the importance of high-frequency brain signals (>70 Hz). One challenge of high-frequency signal analysis is that the size of time-frequency representation of high-frequency brain signals could be larger than 1 terabytes (TB), which is beyond the upper limits of a typical computer workstation's memory (<196 GB). The aim of the present study is to develop a new method to provide greater sensitivity in detecting high-frequency magnetoencephalography (MEG) signals in a single automated and versatile interface, rather than the more traditional, time-intensive visual inspection methods, which may take up to several days. To address the aim, we developed a new method, accumulated source imaging, defined as the volumetric summation of source activity over a period of time. This method analyzes signals in both low- (1~70 Hz) and high-frequency (70~200 Hz) ranges at source levels. To extract meaningful information from MEG signals at sensor space, the signals were decomposed to channel-cross-channel matrix (CxC) representing the spatiotemporal patterns of every possible sensor-pair. A new algorithm was developed and tested by calculating the optimal CxC and source location-orientation weights for volumetric source imaging, thereby minimizing multi-source interference and reducing computational cost. The new method was implemented in C/C++ and tested with MEG data recorded from clinical epilepsy patients. The results of experimental data demonstrated that accumulated source imaging could effectively summarize and visualize MEG recordings within 12.7 h by using approximately 10 GB of computer memory. In contrast to the conventional method of visually identifying multi-frequency epileptic activities that traditionally took 2-3 days and used 1-2 TB storage, the new approach can quantify epileptic abnormalities in both low- and high-frequency ranges at source levels, using much less time and computer memory.
Yokohama, Noriya
2013-07-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
High performance real-time flight simulation at NASA Langley
NASA Technical Reports Server (NTRS)
Cleveland, Jeff I., II
1994-01-01
In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations must be deterministic and be completed in as short a time as possible. This includes simulation mathematical model computational and data input/output to the simulators. In 1986, in response to increased demands for flight simulation performance, personnel at NASA's Langley Research Center (LaRC), working with the contractor, developed extensions to a standard input/output system to provide for high bandwidth, low latency data acquisition and distribution. The Computer Automated Measurement and Control technology (IEEE standard 595) was extended to meet the performance requirements for real-time simulation. This technology extension increased the effective bandwidth by a factor of ten and increased the performance of modules necessary for simulator communications. This technology is being used by more than 80 leading technological developers in the United States, Canada, and Europe. Included among the commercial applications of this technology are nuclear process control, power grid analysis, process monitoring, real-time simulation, and radar data acquisition. Personnel at LaRC have completed the development of the use of supercomputers for simulation mathematical model computational to support real-time flight simulation. This includes the development of a real-time operating system and the development of specialized software and hardware for the CAMAC simulator network. This work, coupled with the use of an open systems software architecture, has advanced the state of the art in real time flight simulation. The data acquisition technology innovation and experience with recent developments in this technology are described.
Role of optical computers in aeronautical control applications
NASA Technical Reports Server (NTRS)
Baumbick, R. J.
1981-01-01
The role that optical computers play in aircraft control is determined. The optical computer has the potential high speed capability required, especially for matrix/matrix operations. The optical computer also has the potential for handling nonlinear simulations in real time. They are also more compatible with fiber optic signal transmission. Optics also permit the use of passive sensors to measure process variables. No electrical energy need be supplied to the sensor. Complex interfacing between optical sensors and the optical computer is avoided if the optical sensor outputs can be directly processed by the optical computer.
2015-09-01
this report made use of posttest processing techniques to provide packet-level time tagging with an accuracy close to 3 µs relative to Coordinated...h set of test records. The process described herein made use of posttest processing techniques to provide packet-level time tagging with an accuracy
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
NASA Technical Reports Server (NTRS)
Loh, Ching Y.; Jorgenson, Philip C. E.
2007-01-01
A time-accurate, upwind, finite volume method for computing compressible flows on unstructured grids is presented. The method is second order accurate in space and time and yields high resolution in the presence of discontinuities. For efficiency, the Roe approximate Riemann solver with an entropy correction is employed. In the basic Euler/Navier-Stokes scheme, many concepts of high order upwind schemes are adopted: the surface flux integrals are carefully treated, a Cauchy-Kowalewski time-stepping scheme is used in the time-marching stage, and a multidimensional limiter is applied in the reconstruction stage. However even with these up-to-date improvements, the basic upwind scheme is still plagued by the so-called "pathological behaviors," e.g., the carbuncle phenomenon, the expansion shock, etc. A solution to these limitations is presented which uses a very simple dissipation model while still preserving second order accuracy. This scheme is referred to as the enhanced time-accurate upwind (ETAU) scheme in this paper. The unstructured grid capability renders flexibility for use in complex geometry; and the present ETAU Euler/Navier-Stokes scheme is capable of handling a broad spectrum of flow regimes from high supersonic to subsonic at very low Mach number, appropriate for both CFD (computational fluid dynamics) and CAA (computational aeroacoustics). Numerous examples are included to demonstrate the robustness of the methods.
A self-synchronized high speed computational ghost imaging system: A leap towards dynamic capturing
NASA Astrophysics Data System (ADS)
Suo, Jinli; Bian, Liheng; Xiao, Yudong; Wang, Yongjin; Zhang, Lei; Dai, Qionghai
2015-11-01
High quality computational ghost imaging needs to acquire a large number of correlated measurements between the to-be-imaged scene and different reference patterns, thus ultra-high speed data acquisition is of crucial importance in real applications. To raise the acquisition efficiency, this paper reports a high speed computational ghost imaging system using a 20 kHz spatial light modulator together with a 2 MHz photodiode. Technically, the synchronization between such high frequency illumination and bucket detector needs nanosecond trigger precision, so the development of synchronization module is quite challenging. To handle this problem, we propose a simple and effective computational self-synchronization scheme by building a general mathematical model and introducing a high precision synchronization technique. The resulted efficiency is around 14 times faster than state-of-the-arts, and takes an important step towards ghost imaging of dynamic scenes. Besides, the proposed scheme is a general approach with high flexibility for readily incorporating other illuminators and detectors.
Compact VLSI neural computer integrated with active pixel sensor for real-time ATR applications
NASA Astrophysics Data System (ADS)
Fang, Wai-Chi; Udomkesmalee, Gabriel; Alkalai, Leon
1997-04-01
A compact VLSI neural computer integrated with an active pixel sensor has been under development to mimic what is inherent in biological vision systems. This electronic eye- brain computer is targeted for real-time machine vision applications which require both high-bandwidth communication and high-performance computing for data sensing, synergy of multiple types of sensory information, feature extraction, target detection, target recognition, and control functions. The neural computer is based on a composite structure which combines Annealing Cellular Neural Network (ACNN) and Hierarchical Self-Organization Neural Network (HSONN). The ACNN architecture is a programmable and scalable multi- dimensional array of annealing neurons which are locally connected with their local neurons. Meanwhile, the HSONN adopts a hierarchical structure with nonlinear basis functions. The ACNN+HSONN neural computer is effectively designed to perform programmable functions for machine vision processing in all levels with its embedded host processor. It provides a two order-of-magnitude increase in computation power over the state-of-the-art microcomputer and DSP microelectronics. A compact current-mode VLSI design feasibility of the ACNN+HSONN neural computer is demonstrated by a 3D 16X8X9-cube neural processor chip design in a 2-micrometers CMOS technology. Integration of this neural computer as one slice of a 4'X4' multichip module into the 3D MCM based avionics architecture for NASA's New Millennium Program is also described.
Quantum Accelerators for High-performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.
We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, themore » prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.« less
High-Resiliency and Auto-Scaling of Large-Scale Cloud Computing for OCO-2 L2 Full Physics Processing
NASA Astrophysics Data System (ADS)
Hua, H.; Manipon, G.; Starch, M.; Dang, L. B.; Southam, P.; Wilson, B. D.; Avis, C.; Chang, A.; Cheng, C.; Smyth, M.; McDuffie, J. L.; Ramirez, P.
2015-12-01
Next generation science data systems are needed to address the incoming flood of data from new missions such as SWOT and NISAR where data volumes and data throughput rates are order of magnitude larger than present day missions. Additionally, traditional means of procuring hardware on-premise are already limited due to facilities capacity constraints for these new missions. Existing missions, such as OCO-2, may also require high turn-around time for processing different science scenarios where on-premise and even traditional HPC computing environments may not meet the high processing needs. We present our experiences on deploying a hybrid-cloud computing science data system (HySDS) for the OCO-2 Science Computing Facility to support large-scale processing of their Level-2 full physics data products. We will explore optimization approaches to getting best performance out of hybrid-cloud computing as well as common issues that will arise when dealing with large-scale computing. Novel approaches were utilized to do processing on Amazon's spot market, which can potentially offer ~10X costs savings but with an unpredictable computing environment based on market forces. We will present how we enabled high-tolerance computing in order to achieve large-scale computing as well as operational cost savings.
Living Color Frame System: PC graphics tool for data visualization
NASA Technical Reports Server (NTRS)
Truong, Long V.
1993-01-01
Living Color Frame System (LCFS) is a personal computer software tool for generating real-time graphics applications. It is highly applicable for a wide range of data visualization in virtual environment applications. Engineers often use computer graphics to enhance the interpretation of data under observation. These graphics become more complicated when 'run time' animations are required, such as found in many typical modern artificial intelligence and expert systems. Living Color Frame System solves many of these real-time graphics problems.
Lim, Tee Teng; Jung, Sun Young; Kim, Eunyi
2018-04-01
This study examined the impact of community and neighborhood on time spent computer gaming. Computer gaming for over 20 hours a week was set as the cutoff line for "engaged use" of computer games. For the analysis, this study analyzed data for about 1,800 subjects who participated in the Korean Children and Youth Panel Survey. The main findings are as follows: first, structural community characteristics and neighborhood social capital affected the engaged use of computer games. Second, adolescents who reside in regions with a higher divorce rate or higher residential mobility were likely to exhibit engaged use of computer games. Third, adolescents who highly perceive neighborhood social capital exhibited lower possibility of engaged use of computer games. Based on these findings, practical implications and directions for further study are suggested.
NASA Technical Reports Server (NTRS)
Rediess, Herman A.; Hewett, M. D.
1991-01-01
The requirements are assessed for the use of remote computation to support HRV flight testing. First, remote computational requirements were developed to support functions that will eventually be performed onboard operational vehicles of this type. These functions which either cannot be performed onboard in the time frame of initial HRV flight test programs because the technology of airborne computers will not be sufficiently advanced to support the computational loads required, or it is not desirable to perform the functions onboard in the flight test program for other reasons. Second, remote computational support either required or highly desirable to conduct flight testing itself was addressed. The use is proposed of an Automated Flight Management System which is described in conceptual detail. Third, autonomous operations is discussed and finally, unmanned operations.
NASA Astrophysics Data System (ADS)
Nakatsuji, Noriaki; Matsushima, Kyoji
2017-03-01
Full-parallax high-definition CGHs composed of more than billion pixels were so far created only by the polygon-based method because of its high performance. However, GPUs recently allow us to generate CGHs much faster by the point cloud. In this paper, we measure computation time of object fields for full-parallax high-definition CGHs, which are composed of 4 billion pixels and reconstruct the same scene, by using the point cloud with GPU and the polygon-based method with CPU. In addition, we compare the optical and simulated reconstructions between CGHs created by these techniques to verify the image quality.
Computational simulation and aerodynamic sensitivity analysis of film-cooled turbines
NASA Astrophysics Data System (ADS)
Massa, Luca
A computational tool is developed for the time accurate sensitivity analysis of the stage performance of hot gas, unsteady turbine components. An existing turbomachinery internal flow solver is adapted to the high temperature environment typical of the hot section of jet engines. A real gas model and film cooling capabilities are successfully incorporated in the software. The modifications to the existing algorithm are described; both the theoretical model and the numerical implementation are validated. The accuracy of the code in evaluating turbine stage performance is tested using a turbine geometry typical of the last stage of aeronautical jet engines. The results of the performance analysis show that the predictions differ from the experimental data by less than 3%. A reliable grid generator, applicable to the domain discretization of the internal flow field of axial flow turbine is developed. A sensitivity analysis capability is added to the flow solver, by rendering it able to accurately evaluate the derivatives of the time varying output functions. The complex Taylor's series expansion (CTSE) technique is reviewed. Two of them are used to demonstrate the accuracy and time dependency of the differentiation process. The results are compared with finite differences (FD) approximations. The CTSE is more accurate than the FD, but less efficient. A "black box" differentiation of the source code, resulting from the automated application of the CTSE, generates high fidelity sensitivity algorithms, but with low computational efficiency and high memory requirements. New formulations of the CTSE are proposed and applied. Selective differentiation of the method for solving the non-linear implicit residual equation leads to sensitivity algorithms with the same accuracy but improved run time. The time dependent sensitivity derivatives are computed in run times comparable to the ones required by the FD approach.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
High-performance computing in image registration
NASA Astrophysics Data System (ADS)
Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro
2012-10-01
Thanks to the recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amount of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. The feature extraction and matching are ones of the most computationally demanding operations in the processing chain thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented.
A real time microcomputer implementation of sensor failure detection for turbofan engines
NASA Technical Reports Server (NTRS)
Delaat, John C.; Merrill, Walter C.
1989-01-01
An algorithm was developed which detects, isolates, and accommodates sensor failures using analytical redundancy. The performance of this algorithm was demonstrated on a full-scale F100 turbofan engine. The algorithm was implemented in real-time on a microprocessor-based controls computer which includes parallel processing and high order language programming. Parallel processing was used to achieve the required computational power for the real-time implementation. High order language programming was used in order to reduce the programming and maintenance costs of the algorithm implementation software. The sensor failure algorithm was combined with an existing multivariable control algorithm to give a complete control implementation with sensor analytical redundancy. The real-time microprocessor implementation of the algorithm which resulted in the successful completion of the algorithm engine demonstration, is described.
Distributed collaborative response surface method for mechanical dynamic assembly reliability design
NASA Astrophysics Data System (ADS)
Bai, Guangchen; Fei, Chengwei
2013-11-01
Because of the randomness of many impact factors influencing the dynamic assembly relationship of complex machinery, the reliability analysis of dynamic assembly relationship needs to be accomplished considering the randomness from a probabilistic perspective. To improve the accuracy and efficiency of dynamic assembly relationship reliability analysis, the mechanical dynamic assembly reliability(MDAR) theory and a distributed collaborative response surface method(DCRSM) are proposed. The mathematic model of DCRSM is established based on the quadratic response surface function, and verified by the assembly relationship reliability analysis of aeroengine high pressure turbine(HPT) blade-tip radial running clearance(BTRRC). Through the comparison of the DCRSM, traditional response surface method(RSM) and Monte Carlo Method(MCM), the results show that the DCRSM is not able to accomplish the computational task which is impossible for the other methods when the number of simulation is more than 100 000 times, but also the computational precision for the DCRSM is basically consistent with the MCM and improved by 0.40˜4.63% to the RSM, furthermore, the computational efficiency of DCRSM is up to about 188 times of the MCM and 55 times of the RSM under 10000 times simulations. The DCRSM is demonstrated to be a feasible and effective approach for markedly improving the computational efficiency and accuracy of MDAR analysis. Thus, the proposed research provides the promising theory and method for the MDAR design and optimization, and opens a novel research direction of probabilistic analysis for developing the high-performance and high-reliability of aeroengine.
Efficient scatter model for simulation of ultrasound images from computed tomography data
NASA Astrophysics Data System (ADS)
D'Amato, J. P.; Lo Vercio, L.; Rubi, P.; Fernandez Vera, E.; Barbuzza, R.; Del Fresno, M.; Larrabide, I.
2015-12-01
Background and motivation: Real-time ultrasound simulation refers to the process of computationally creating fully synthetic ultrasound images instantly. Due to the high value of specialized low cost training for healthcare professionals, there is a growing interest in the use of this technology and the development of high fidelity systems that simulate the acquisitions of echographic images. The objective is to create an efficient and reproducible simulator that can run either on notebooks or desktops using low cost devices. Materials and methods: We present an interactive ultrasound simulator based on CT data. This simulator is based on ray-casting and provides real-time interaction capabilities. The simulation of scattering that is coherent with the transducer position in real time is also introduced. Such noise is produced using a simplified model of multiplicative noise and convolution with point spread functions (PSF) tailored for this purpose. Results: The computational efficiency of scattering maps generation was revised with an improved performance. This allowed a more efficient simulation of coherent scattering in the synthetic echographic images while providing highly realistic result. We describe some quality and performance metrics to validate these results, where a performance of up to 55fps was achieved. Conclusion: The proposed technique for real-time scattering modeling provides realistic yet computationally efficient scatter distributions. The error between the original image and the simulated scattering image was compared for the proposed method and the state-of-the-art, showing negligible differences in its distribution.
Real-time estimation of BDS/GPS high-rate satellite clock offsets using sequential least squares
NASA Astrophysics Data System (ADS)
Fu, Wenju; Yang, Yuanxi; Zhang, Qin; Huang, Guanwen
2018-07-01
The real-time precise satellite clock product is one of key prerequisites for real-time Precise Point Positioning (PPP). The accuracy of the 24-hour predicted satellite clock product with 15 min sampling interval and an update of 6 h provided by the International GNSS Service (IGS) is only 3 ns, which could not meet the needs of all real-time PPP applications. The real-time estimation of high-rate satellite clock offsets is an efficient method for improving the accuracy. In this paper, the sequential least squares method to estimate real-time satellite clock offsets with high sample rate is proposed to improve the computational speed by applying an optimized sparse matrix operation to compute the normal equation and using special measures to take full advantage of modern computer power. The method is first applied to BeiDou Navigation Satellite System (BDS) and provides real-time estimation with a 1 s sample rate. The results show that the amount of time taken to process a single epoch is about 0.12 s using 28 stations. The Standard Deviation (STD) and Root Mean Square (RMS) of the real-time estimated BDS satellite clock offsets are 0.17 ns and 0.44 ns respectively when compared to German Research Center for Geosciences (GFZ) final clock products. The positioning performance of the real-time estimated satellite clock offsets is evaluated. The RMSs of the real-time BDS kinematic PPP in east, north, and vertical components are 7.6 cm, 6.4 cm and 19.6 cm respectively. The method is also applied to Global Positioning System (GPS) with a 10 s sample rate and the computational time of most epochs is less than 1.5 s with 75 stations. The STD and RMS of the real-time estimated GPS satellite clocks are 0.11 ns and 0.27 ns, respectively. The accuracies of 5.6 cm, 2.6 cm and 7.9 cm in east, north, and vertical components are achieved for the real-time GPS kinematic PPP.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
ERIC Educational Resources Information Center
Evans, Richard M.; Surkan, Alvin J.
The recent arrival of portable computer systems with high-level language interpreters now makes it practical to rapidly develop complex testing and scoring programs. These programs permit undergraduates access, at arbitrary times, to testing as an integral part of a mastery learning strategy. Effects of introducing the computer were studied by…
Ways of achieving continuous service from computers
NASA Technical Reports Server (NTRS)
Quinn, M. J., Jr.
1974-01-01
This paper outlines the methods used in the real-time computer complex to keep computers operating. Methods include selectover, high-speed restart, and low-speed restart. The hardware and software needed to implement these methods is discussed as well as the system recovery facility, alternate device support, and timeout. In general, methods developed while supporting the Gemini, Apollo, and Skylab space missions are presented.
USDA-ARS?s Scientific Manuscript database
Computer simulation is a useful tool for benchmarking the electrical and fuel energy consumption and water use in a fluid milk plant. In this study, a computer simulation model of the fluid milk process based on high temperature short time (HTST) pasteurization was extended to include models for pr...
GenomicTools: a computational platform for developing high-throughput analytics in genomics.
Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo
2012-01-15
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
Acceleration of FDTD mode solver by high-performance computing techniques.
Han, Lin; Xi, Yanping; Huang, Wei-Ping
2010-06-21
A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the bench-mark finite-difference (FD) eigen mode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is in the same order of magnitude of the standard finite-difference eigen mode solver and yet require much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigen value mode solvers are no longer applicable due to memory limitation.
jInv: A Modular and Scalable Framework for Electromagnetic Inverse Problems
NASA Astrophysics Data System (ADS)
Belliveau, P. T.; Haber, E.
2016-12-01
Inversion is a key tool in the interpretation of geophysical electromagnetic (EM) data. Three-dimensional (3D) EM inversion is very computationally expensive and practical software for inverting large 3D EM surveys must be able to take advantage of high performance computing (HPC) resources. It has traditionally been difficult to achieve those goals in a high level dynamic programming environment that allows rapid development and testing of new algorithms, which is important in a research setting. With those goals in mind, we have developed jInv, a framework for PDE constrained parameter estimation problems. jInv provides optimization and regularization routines, a framework for user defined forward problems, and interfaces to several direct and iterative solvers for sparse linear systems. The forward modeling framework provides finite volume discretizations of differential operators on rectangular tensor product meshes and tetrahedral unstructured meshes that can be used to easily construct forward modeling and sensitivity routines for forward problems described by partial differential equations. jInv is written in the emerging programming language Julia. Julia is a dynamic language targeted at the computational science community with a focus on high performance and native support for parallel programming. We have developed frequency and time-domain EM forward modeling and sensitivity routines for jInv. We will illustrate its capabilities and performance with two synthetic time-domain EM inversion examples. First, in airborne surveys, which use many sources, we achieve distributed memory parallelism by decoupling the forward and inverse meshes and performing forward modeling for each source on small, locally refined meshes. Secondly, we invert grounded source time-domain data from a gradient array style induced polarization survey using a novel time-stepping technique that allows us to compute data from different time-steps in parallel. These examples both show that it is possible to invert large scale 3D time-domain EM datasets within a modular, extensible framework written in a high-level, easy to use programming language.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-11-01
The finite element method has proven to be an invaluable tool for analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increase, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements. SAPNEW provides a stand alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily used by researchers at NASA Lewis Research Center.
An Adaptive Priority Tuning System for Optimized Local CPU Scheduling using BOINC Clients
NASA Astrophysics Data System (ADS)
Mnaouer, Adel B.; Ragoonath, Colin
2010-11-01
Volunteer Computing (VC) is a Distributed Computing model which utilizes idle CPU cycles from computing resources donated by volunteers who are connected through the Internet to form a very large-scale, loosely coupled High Performance Computing environment. Distributed Volunteer Computing environments such as the BOINC framework is concerned mainly with the efficient scheduling of the available resources to the applications which require them. The BOINC framework thus contains a number of scheduling policies/algorithms both on the server-side and on the client which work together to maximize the available resources and to provide a degree of QoS in an environment which is highly volatile. This paper focuses on the BOINC client and introduces an adaptive priority tuning client side middleware application which improves the execution times of Work Units (WUs) while maintaining an acceptable Maximum Response Time (MRT) for the end user. We have conducted extensive experimentation of the proposed system and the results show clear speedup of BOINC applications using our optimized middleware as opposed to running using the original BOINC client.
Spin-wave utilization in a quantum computer
NASA Astrophysics Data System (ADS)
Khitun, A.; Ostroumov, R.; Wang, K. L.
2001-12-01
We propose a quantum computer scheme using spin waves for quantum-information exchange. We demonstrate that spin waves in the antiferromagnetic layer grown on silicon may be used to perform single-qubit unitary transformations together with two-qubit operations during the cycle of computation. The most attractive feature of the proposed scheme is the possibility of random access to any qubit and, consequently, the ability to recognize two qubit gates between any two distant qubits. Also, spin waves allow us to eliminate the use of a strong external magnetic field and microwave pulses. By estimate, the proposed scheme has as high as 104 ratio between quantum system coherence time and the time of a single computational step.
Highly fault-tolerant parallel computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spielman, D.A.
We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations inmore » which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model using w log{sup O(l)} w processors and time t log{sup O(l)} w. The failure probability of the computation will be at most t {center_dot} exp(-w{sup 1/4}). The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in O(n log{sup O(1)} n) sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.« less
van Vugt, Raoul; Kool, Digna R; Deunk, Jaap; Edwards, Michael J R
2012-03-01
Currently, total body computed tomography (TBCT) is rapidly implemented in the evaluation of trauma patients. With this review, we aim to evaluate the clinical implications-mortality, change in treatment, and time management-of the routine use of TBCT in adult blunt high-energy trauma patients compared with a conservative approach with the use of conventional radiography, ultrasound, and selective computed tomography. A literature search for original studies on TBCT in blunt high-energy trauma patients was performed. Two independent observers included studies concerning mortality, change of treatment, and/or time management as outcome measures. For each article, relevant data were extracted and analyzed. In addition, the quality according to the Oxford levels of evidence was assessed. From 183 articles initially identified, the observers included nine original studies in consensus. One of three studies described a significant difference in mortality; four described a change of treatment in 2% to 27% of patients because of the use of TBCT. Five studies found a gain in time with the use of immediate routine TBCT. Eight studies scored a level of evidence of 2b and one of 3b. Current literature has predominantly suboptimal design to prove terminally that the routine use of TBCT results in improved survival of blunt high-energy trauma patients. TBCT can give a change of treatment and improves time intervals in the emergency department as compared with its selective use.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long term solution with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which lead to more highly integrated system. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, need to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems, such as task distribution, node communication, system verification, are discussed.
Turbulence Model Effects on Cold-Gas Lateral Jet Interaction in a Supersonic Crossflow
2014-06-01
performance computing time from the U.S. Department of Defense (DOD) High Performance Computing Modernization program at the U.S. Army Research Laboratory... time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the...thanks Dr. Ross Chaplin , Defence Science Technology Laboratory, United Kingdom (UK), and Dr. David MacManus and Robert Christie, Cranfield University, UK
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-10-20
Look-ahead dynamic simulation software system incorporates the high performance parallel computing technologies, significantly reduces the solution time for each transient simulation case, and brings the dynamic simulation analysis into on-line applications to enable more transparency for better reliability and asset utilization. It takes the snapshot of the current power grid status, functions in parallel computing the system dynamic simulation, and outputs the transient response of the power system in real time.
An FPGA-based High Speed Parallel Signal Processing System for Adaptive Optics Testbed
NASA Astrophysics Data System (ADS)
Kim, H.; Choi, Y.; Yang, Y.
In this paper a state-of-the-art FPGA (Field Programmable Gate Array) based high speed parallel signal processing system (SPS) for adaptive optics (AO) testbed with 1 kHz wavefront error (WFE) correction frequency is reported. The AO system consists of Shack-Hartmann sensor (SHS) and deformable mirror (DM), tip-tilt sensor (TTS), tip-tilt mirror (TTM) and an FPGA-based high performance SPS to correct wavefront aberrations. The SHS is composed of 400 subapertures and the DM 277 actuators with Fried geometry, requiring high speed parallel computing capability SPS. In this study, the target WFE correction speed is 1 kHz; therefore, it requires massive parallel computing capabilities as well as strict hard real time constraints on measurements from sensors, matrix computation latency for correction algorithms, and output of control signals for actuators. In order to meet them, an FPGA based real-time SPS with parallel computing capabilities is proposed. In particular, the SPS is made up of a National Instrument's (NI's) real time computer and five FPGA boards based on state-of-the-art Xilinx Kintex 7 FPGA. Programming is done with NI's LabView environment, providing flexibility when applying different algorithms for WFE correction. It also facilitates faster programming and debugging environment as compared to conventional ones. One of the five FPGA's is assigned to measure TTS and calculate control signals for TTM, while the rest four are used to receive SHS signal, calculate slops for each subaperture and correction signal for DM. With this parallel processing capabilities of the SPS the overall closed-loop WFE correction speed of 1 kHz has been achieved. System requirements, architecture and implementation issues are described; furthermore, experimental results are also given.
Time-domain wavefield reconstruction inversion
NASA Astrophysics Data System (ADS)
Li, Zhen-Chun; Lin, Yu-Zhao; Zhang, Kai; Li, Yuan-Yuan; Yu, Zhen-Nan
2017-12-01
Wavefield reconstruction inversion (WRI) is an improved full waveform inversion theory that has been proposed in recent years. WRI method expands the searching space by introducing the wave equation into the objective function and reconstructing the wavefield to update model parameters, thereby improving the computing efficiency and mitigating the influence of the local minimum. However, frequency-domain WRI is difficult to apply to real seismic data because of the high computational memory demand and requirement of time-frequency transformation with additional computational costs. In this paper, wavefield reconstruction inversion theory is extended into the time domain, the augmented wave equation of WRI is derived in the time domain, and the model gradient is modified according to the numerical test with anomalies. The examples of synthetic data illustrate the accuracy of time-domain WRI and the low dependency of WRI on low-frequency information.
Increasing Accuracy in Computed Inviscid Boundary Conditions
NASA Technical Reports Server (NTRS)
Dyson, Roger
2004-01-01
A technique has been devised to increase the accuracy of computational simulations of flows of inviscid fluids by increasing the accuracy with which surface boundary conditions are represented. This technique is expected to be especially beneficial for computational aeroacoustics, wherein it enables proper accounting, not only for acoustic waves, but also for vorticity and entropy waves, at surfaces. Heretofore, inviscid nonlinear surface boundary conditions have been limited to third-order accuracy in time for stationary surfaces and to first-order accuracy in time for moving surfaces. For steady-state calculations, it may be possible to achieve higher accuracy in space, but high accuracy in time is needed for efficient simulation of multiscale unsteady flow phenomena. The present technique is the first surface treatment that provides the needed high accuracy through proper accounting of higher-order time derivatives. The present technique is founded on a method known in art as the Hermitian modified solution approximation (MESA) scheme. This is because high time accuracy at a surface depends upon, among other things, correction of the spatial cross-derivatives of flow variables, and many of these cross-derivatives are included explicitly on the computational grid in the MESA scheme. (Alternatively, a related method other than the MESA scheme could be used, as long as the method involves consistent application of the effects of the cross-derivatives.) While the mathematical derivation of the present technique is too lengthy and complex to fit within the space available for this article, the technique itself can be characterized in relatively simple terms: The technique involves correction of surface-normal spatial pressure derivatives at a boundary surface to satisfy the governing equations and the boundary conditions and thereby achieve arbitrarily high orders of time accuracy in special cases. The boundary conditions can now include a potentially infinite number of time derivatives of surface-normal velocity (consistent with no flow through the boundary) up to arbitrarily high order. The corrections for the first-order spatial derivatives of pressure are calculated by use of the first-order time derivative velocity. The corrected first-order spatial derivatives are used to calculate the second- order time derivatives of velocity, which, in turn, are used to calculate the corrections for the second-order pressure derivatives. The process as described is repeated, progressing through increasing orders of derivatives, until the desired accuracy is attained.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)
NASA Technical Reports Server (NTRS)
Dalton, Shelly D.; Daley, Philip C.
1988-01-01
As the hardware trends for artificial intelligence (AI) involve more and more complexity, the process of optimizing the computer system design for a particular problem will also increase in complexity. Space applications of knowledge based systems (KBS) will often require an ability to perform both numerically intensive vector computations and real time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, if the application itself is not highly parallel, the machine's power cannot be utilized. A scheme is presented which will provide the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scaler, and multiprocessors. High speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to make comparisons between existing computer systems. It is an open question whether or not, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
Advertising down, but marketing spending climbs; survey says focus is on targeting, not overviews.
Burns, J
1992-02-03
Hard economic times forced hospitals nationwide to cut their advertising budgets in 1991. But marketing expenditures hit an all-time high, as hospitals shifted the emphasis to high-technology marketing efforts that employ computer data bases and market research to seek and find potential business.
Computational AeroAcoustics for Fan Noise Prediction
NASA Technical Reports Server (NTRS)
Envia, Ed; Hixon, Ray; Dyson, Rodger; Huff, Dennis (Technical Monitor)
2002-01-01
An overview of the current state-of-the-art in computational aeroacoustics as applied to fan noise prediction at NASA Glenn is presented. Results from recent modeling efforts using three dimensional inviscid formulations in both frequency and time domains are summarized. In particular, the application of a frequency domain method, called LINFLUX, to the computation of rotor-stator interaction tone noise is reviewed and the influence of the background inviscid flow on the acoustic results is analyzed. It has been shown that the noise levels are very sensitive to the gradients of the mean flow near the surface and that the correct computation of these gradients for highly loaded airfoils is especially problematic using an inviscid formulation. The ongoing development of a finite difference time marching code that is based on a sixth order compact scheme is also reviewed. Preliminary results from the nonlinear computation of a gust-airfoil interaction model problem demonstrate the fidelity and accuracy of this approach. Spatial and temporal features of the code as well as its multi-block nature are discussed. Finally, latest results from an ongoing effort in the area of arbitrarily high order methods are reviewed and technical challenges associated with implementing correct high order boundary conditions are discussed and possible strategies for addressing these challenges ore outlined.
NASA Technical Reports Server (NTRS)
Warner, James E.; Zubair, Mohammad; Ranjan, Desh
2017-01-01
This work investigates novel approaches to probabilistic damage diagnosis that utilize surrogate modeling and high performance computing (HPC) to achieve substantial computational speedup. Motivated by Digital Twin, a structural health management (SHM) paradigm that integrates vehicle-specific characteristics with continual in-situ damage diagnosis and prognosis, the methods studied herein yield near real-time damage assessments that could enable monitoring of a vehicle's health while it is operating (i.e. online SHM). High-fidelity modeling and uncertainty quantification (UQ), both critical to Digital Twin, are incorporated using finite element method simulations and Bayesian inference, respectively. The crux of the proposed Bayesian diagnosis methods, however, is the reformulation of the numerical sampling algorithms (e.g. Markov chain Monte Carlo) used to generate the resulting probabilistic damage estimates. To this end, three distinct methods are demonstrated for rapid sampling that utilize surrogate modeling and exploit various degrees of parallelism for leveraging HPC. The accuracy and computational efficiency of the methods are compared on the problem of strain-based crack identification in thin plates. While each approach has inherent problem-specific strengths and weaknesses, all approaches are shown to provide accurate probabilistic damage diagnoses and several orders of magnitude computational speedup relative to a baseline Bayesian diagnosis implementation.
NASA Astrophysics Data System (ADS)
Guo, Jie; Zhu, Chang`an
2016-01-01
The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
Real-time Tsunami Inundation Prediction Using High Performance Computers
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2014-12-01
Recently off-shore tsunami observation stations based on cabled ocean bottom pressure gauges are actively being deployed especially in Japan. These cabled systems are designed to provide real-time tsunami data before tsunamis reach coastlines for disaster mitigation purposes. To receive real benefits of these observations, real-time analysis techniques to make an effective use of these data are necessary. A representative study was made by Tsushima et al. (2009) that proposed a method to provide instant tsunami source prediction based on achieving tsunami waveform data. As time passes, the prediction is improved by using updated waveform data. After a tsunami source is predicted, tsunami waveforms are synthesized from pre-computed tsunami Green functions of linear long wave equations. Tsushima et al. (2014) updated the method by combining the tsunami waveform inversion with an instant inversion of coseismic crustal deformation and improved the prediction accuracy and speed in the early stages. For disaster mitigation purposes, real-time predictions of tsunami inundation are also important. In this study, we discuss the possibility of real-time tsunami inundation predictions, which require faster-than-real-time tsunami inundation simulation in addition to instant tsunami source analysis. Although the computational amount is large to solve non-linear shallow water equations for inundation predictions, it has become executable through the recent developments of high performance computing technologies. We conducted parallel computations of tsunami inundation and achieved 6.0 TFLOPS by using 19,000 CPU cores. We employed a leap-frog finite difference method with nested staggered grids of which resolution range from 405 m to 5 m. The resolution ratio of each nested domain was 1/3. Total number of grid points were 13 million, and the time step was 0.1 seconds. Tsunami sources of 2011 Tohoku-oki earthquake were tested. The inundation prediction up to 2 hours after the earthquake occurs took about 2 minutes, which would be sufficient for a practical tsunami inundation predictions. In the presentation, the computational performance of our faster-than-real-time tsunami inundation model will be shown, and preferable tsunami wave source analysis for an accurate inundation prediction will also be discussed.
NASA Astrophysics Data System (ADS)
Heilmann, B. Z.; Vallenilla Ferrara, A. M.
2009-04-01
The constant growth of contaminated sites, the unsustainable use of natural resources, and, last but not least, the hydrological risk related to extreme meteorological events and increased climate variability are major environmental issues of today. Finding solutions for these complex problems requires an integrated cross-disciplinary approach, providing a unified basis for environmental science and engineering. In computer science, grid computing is emerging worldwide as a formidable tool allowing distributed computation and data management with administratively-distant resources. Utilizing these modern High Performance Computing (HPC) technologies, the GRIDA3 project bundles several applications from different fields of geoscience aiming to support decision making for reasonable and responsible land use and resource management. In this abstract we present a geophysical application called EIAGRID that uses grid computing facilities to perform real-time subsurface imaging by on-the-fly processing of seismic field data and fast optimization of the processing workflow. Even though, seismic reflection profiling has a broad application range spanning from shallow targets in a few meters depth to targets in a depth of several kilometers, it is primarily used by the hydrocarbon industry and hardly for environmental purposes. The complexity of data acquisition and processing poses severe problems for environmental and geotechnical engineering: Professional seismic processing software is expensive to buy and demands large experience from the user. In-field processing equipment needed for real-time data Quality Control (QC) and immediate optimization of the acquisition parameters is often not available for this kind of studies. As a result, the data quality will be suboptimal. In the worst case, a crucial parameter such as receiver spacing, maximum offset, or recording time turns out later to be inappropriate and the complete acquisition campaign has to be repeated. The EIAGRID portal provides an innovative solution to this problem combining state-of-the-art data processing methods and modern remote grid computing technology. In field-processing equipment is substituted by remote access to high performance grid computing facilities. The latter can be ubiquitously controlled by a user-friendly web-browser interface accessed from the field by any mobile computer using wireless data transmission technology such as UMTS (Universal Mobile Telecommunications System) or HSUPA/HSDPA (High-Speed Uplink/Downlink Packet Access). The complexity of data-manipulation and processing and thus also the time demanding user interaction is minimized by a data-driven, and highly automated velocity analysis and imaging approach based on the Common-Reflection-Surface (CRS) stack. Furthermore, the huge computing power provided by the grid deployment allows parallel testing of alternative processing sequences and parameter settings, a feature which considerably reduces the turn-around times. A shared data storage using georeferencing tools and data grid technology is under current development. It will allow to publish already accomplished projects, making results, processing workflows and parameter settings available in a transparent and reproducible way. Creating a unified database shared by all users will facilitate complex studies and enable the use of data-crossing techniques to incorporate results of other environmental applications hosted on the GRIDA3 portal.
Computational Modeling and Real-Time Control of Patient-Specific Laser Treatment of Cancer
Fuentes, D.; Oden, J. T.; Diller, K. R.; Hazle, J. D.; Elliott, A.; Shetty, A.; Stafford, R. J.
2014-01-01
An adaptive feedback control system is presented which employs a computational model of bioheat transfer in living tissue to guide, in real-time, laser treatments of prostate cancer monitored by magnetic resonance thermal imaging (MRTI). The system is built on what can be referred to as cyberinfrastructure - a complex structure of high-speed network, large-scale parallel computing devices, laser optics, imaging, visualizations, inverse-analysis algorithms, mesh generation, and control systems that guide laser therapy to optimally control the ablation of cancerous tissue. The computational system has been successfully tested on in-vivo, canine prostate. Over the course of an 18 minute laser induced thermal therapy (LITT) performed at M.D. Anderson Cancer Center (MDACC) in Houston, Texas, the computational models were calibrated to intra-operative real time thermal imaging treatment data and the calibrated models controlled the bioheat transfer to within 5°C of the predetermined treatment plan. The computational arena is in Austin, Texas and managed at the Institute for Computational Engineering and Sciences (ICES). The system is designed to control the bioheat transfer remotely while simultaneously providing real-time remote visualization of the on-going treatment. Post operative histology of the canine prostate reveal that the damage region was within the targeted 1.2cm diameter treatment objective. PMID:19148754
Computational modeling and real-time control of patient-specific laser treatment of cancer.
Fuentes, D; Oden, J T; Diller, K R; Hazle, J D; Elliott, A; Shetty, A; Stafford, R J
2009-04-01
An adaptive feedback control system is presented which employs a computational model of bioheat transfer in living tissue to guide, in real-time, laser treatments of prostate cancer monitored by magnetic resonance thermal imaging. The system is built on what can be referred to as cyberinfrastructure-a complex structure of high-speed network, large-scale parallel computing devices, laser optics, imaging, visualizations, inverse-analysis algorithms, mesh generation, and control systems that guide laser therapy to optimally control the ablation of cancerous tissue. The computational system has been successfully tested on in vivo, canine prostate. Over the course of an 18 min laser-induced thermal therapy performed at M.D. Anderson Cancer Center (MDACC) in Houston, Texas, the computational models were calibrated to intra-operative real-time thermal imaging treatment data and the calibrated models controlled the bioheat transfer to within 5 degrees C of the predetermined treatment plan. The computational arena is in Austin, Texas and managed at the Institute for Computational Engineering and Sciences (ICES). The system is designed to control the bioheat transfer remotely while simultaneously providing real-time remote visualization of the on-going treatment. Post-operative histology of the canine prostate reveal that the damage region was within the targeted 1.2 cm diameter treatment objective.
An M-estimator for reduced-rank system identification.
Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S; Vogelstein, Joshua T
2017-01-15
High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ 1 and ℓ 2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models.
An M-estimator for reduced-rank system identification
Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S.; Vogelstein, Joshua T.
2018-01-01
High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models. PMID:29391659
Numerical Simulation of a High Mach Number Jet Flow
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Turkel, Eli; Mankbadi, Reda R.
1993-01-01
The recent efforts to develop accurate numerical schemes for transition and turbulent flows are motivated, among other factors, by the need for accurate prediction of flow noise. The success of developing high speed civil transport plane (HSCT) is contingent upon our understanding and suppression of the jet exhaust noise. The radiated sound can be directly obtained by solving the full (time-dependent) compressible Navier-Stokes equations. However, this requires computational storage that is beyond currently available machines. This difficulty can be overcome by limiting the solution domain to the near field where the jet is nonlinear and then use acoustic analogy (e.g., Lighthill) to relate the far-field noise to the near-field sources. The later requires obtaining the time-dependent flow field. The other difficulty in aeroacoustics computations is that at high Reynolds numbers the turbulent flow has a large range of scales. Direct numerical simulations (DNS) cannot obtain all the scales of motion at high Reynolds number of technological interest. However, it is believed that the large scale structure is more efficient than the small-scale structure in radiating noise. Thus, one can model the small scales and calculate the acoustically active scales. The large scale structure in the noise-producing initial region of the jet can be viewed as a wavelike nature, the net radiated sound is the net cancellation after integration over space. As such, aeroacoustics computations are highly sensitive to errors in computing the sound sources. It is therefore essential to use a high-order numerical scheme to predict the flow field. The present paper presents the first step in a ongoing effort to predict jet noise. The emphasis here is in accurate prediction of the unsteady flow field. We solve the full time-dependent Navier-Stokes equations by a high order finite difference method. Time accurate spatial simulations of both plane and axisymmetric jet are presented. Jet Mach numbers of 1.5 and 2.1 are considered. Reynolds number in the simulations was about a million. Our numerical model is based on the 2-4 scheme by Gottlieb & Turkel. Bayliss et al. applied the 2-4 scheme in boundary layer computations. This scheme was also used by Ragab and Sheen to study the nonlinear development of supersonic instability waves in a mixing layer. In this study, we present two dimensional direct simulation results for both plane and axisymmetric jets. These results are compared with linear theory predictions. These computations were made for near nozzle exit region and velocity in spanwise/azimuthal direction was assumed to be zero.
Real-time WAMI streaming target tracking in fog
NASA Astrophysics Data System (ADS)
Chen, Yu; Blasch, Erik; Chen, Ning; Deng, Anna; Ling, Haibin; Chen, Genshe
2016-05-01
Real-time information fusion based on WAMI (Wide-Area Motion Imagery), FMV (Full Motion Video), and Text data is highly desired for many mission critical emergency or security applications. Cloud Computing has been considered promising to achieve big data integration from multi-modal sources. In many mission critical tasks, however, powerful Cloud technology cannot satisfy the tight latency tolerance as the servers are allocated far from the sensing platform, actually there is no guaranteed connection in the emergency situations. Therefore, data processing, information fusion, and decision making are required to be executed on-site (i.e., near the data collection). Fog Computing, a recently proposed extension and complement for Cloud Computing, enables computing on-site without outsourcing jobs to a remote Cloud. In this work, we have investigated the feasibility of processing streaming WAMI in the Fog for real-time, online, uninterrupted target tracking. Using a single target tracking algorithm, we studied the performance of a Fog Computing prototype. The experimental results are very encouraging that validated the effectiveness of our Fog approach to achieve real-time frame rates.
High-Density Liquid-State Machine Circuitry for Time-Series Forecasting.
Rosselló, Josep L; Alomar, Miquel L; Morro, Antoni; Oliver, Antoni; Canals, Vincent
2016-08-01
Spiking neural networks (SNN) are the last neural network generation that try to mimic the real behavior of biological neurons. Although most research in this area is done through software applications, it is in hardware implementations in which the intrinsic parallelism of these computing systems are more efficiently exploited. Liquid state machines (LSM) have arisen as a strategic technique to implement recurrent designs of SNN with a simple learning methodology. In this work, we show a new low-cost methodology to implement high-density LSM by using Boolean gates. The proposed method is based on the use of probabilistic computing concepts to reduce hardware requirements, thus considerably increasing the neuron count per chip. The result is a highly functional system that is applied to high-speed time series forecasting.
Fulcher, Ben D; Jones, Nick S
2017-11-22
Phenotype measurements frequently take the form of time series, but we currently lack a systematic method for relating these complex data streams to scientifically meaningful outcomes, such as relating the movement dynamics of organisms to their genotype or measurements of brain dynamics of a patient to their disease diagnosis. Previous work addressed this problem by comparing implementations of thousands of diverse scientific time-series analysis methods in an approach termed highly comparative time-series analysis. Here, we introduce hctsa, a software tool for applying this methodological approach to data. hctsa includes an architecture for computing over 7,700 time-series features and a suite of analysis and visualization algorithms to automatically select useful and interpretable time-series features for a given application. Using exemplar applications to high-throughput phenotyping experiments, we show how hctsa allows researchers to leverage decades of time-series research to quantify and understand informative structure in time-series data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Evaluation of a continuous-rotation, high-speed scanning protocol for micro-computed tomography.
Kerl, Hans Ulrich; Isaza, Cristina T; Boll, Hanne; Schambach, Sebastian J; Nolte, Ingo S; Groden, Christoph; Brockmann, Marc A
2011-01-01
Micro-computed tomography is used frequently in preclinical in vivo research. Limiting factors are radiation dose and long scan times. The purpose of the study was to compare a standard step-and-shoot to a continuous-rotation, high-speed scanning protocol. Micro-computed tomography of a lead grid phantom and a rat femur was performed using a step-and-shoot and a continuous-rotation protocol. Detail discriminability and image quality were assessed by 3 radiologists. The signal-to-noise ratio and the modulation transfer function were calculated, and volumetric analyses of the femur were performed. The radiation dose of the scan protocols was measured using thermoluminescence dosimeters. The 40-second continuous-rotation protocol allowed a detail discriminability comparable to the step-and-shoot protocol at significantly lower radiation doses. No marked differences in volumetric or qualitative analyses were observed. Continuous-rotation micro-computed tomography significantly reduces scanning time and radiation dose without relevantly reducing image quality compared with a normal step-and-shoot protocol.
Interactive physically-based sound simulation
NASA Astrophysics Data System (ADS)
Raghuvanshi, Nikunj
The realization of interactive, immersive virtual worlds requires the ability to present a realistic audio experience that convincingly compliments their visual rendering. Physical simulation is a natural way to achieve such realism, enabling deeply immersive virtual worlds. However, physically-based sound simulation is very computationally expensive owing to the high-frequency, transient oscillations underlying audible sounds. The increasing computational power of desktop computers has served to reduce the gap between required and available computation, and it has become possible to bridge this gap further by using a combination of algorithmic improvements that exploit the physical, as well as perceptual properties of audible sounds. My thesis is a step in this direction. My dissertation concentrates on developing real-time techniques for both sub-problems of sound simulation: synthesis and propagation. Sound synthesis is concerned with generating the sounds produced by objects due to elastic surface vibrations upon interaction with the environment, such as collisions. I present novel techniques that exploit human auditory perception to simulate scenes with hundreds of sounding objects undergoing impact and rolling in real time. Sound propagation is the complementary problem of modeling the high-order scattering and diffraction of sound in an environment as it travels from source to listener. I discuss my work on a novel numerical acoustic simulator (ARD) that is hundred times faster and consumes ten times less memory than a high-accuracy finite-difference technique, allowing acoustic simulations on previously-intractable spaces, such as a cathedral, on a desktop computer. Lastly, I present my work on interactive sound propagation that leverages my ARD simulator to render the acoustics of arbitrary static scenes for multiple moving sources and listener in real time, while accounting for scene-dependent effects such as low-pass filtering and smooth attenuation behind obstructions, reverberation, scattering from complex geometry and sound focusing. This is enabled by a novel compact representation that takes a thousand times less memory than a direct scheme, thus reducing memory footprints to fit within available main memory. To the best of my knowledge, this is the only technique and system in existence to demonstrate auralization of physical wave-based effects in real-time on large, complex 3D scenes.
Efficient coarse simulation of a growing avascular tumor
Kavousanakis, Michail E.; Liu, Ping; Boudouvis, Andreas G.; Lowengrub, John; Kevrekidis, Ioannis G.
2013-01-01
The subject of this work is the development and implementation of algorithms which accelerate the simulation of early stage tumor growth models. Among the different computational approaches used for the simulation of tumor progression, discrete stochastic models (e.g., cellular automata) have been widely used to describe processes occurring at the cell and subcell scales (e.g., cell-cell interactions and signaling processes). To describe macroscopic characteristics (e.g., morphology) of growing tumors, large numbers of interacting cells must be simulated. However, the high computational demands of stochastic models make the simulation of large-scale systems impractical. Alternatively, continuum models, which can describe behavior at the tumor scale, often rely on phenomenological assumptions in place of rigorous upscaling of microscopic models. This limits their predictive power. In this work, we circumvent the derivation of closed macroscopic equations for the growing cancer cell populations; instead, we construct, based on the so-called “equation-free” framework, a computational superstructure, which wraps around the individual-based cell-level simulator and accelerates the computations required for the study of the long-time behavior of systems involving many interacting cells. The microscopic model, e.g., a cellular automaton, which simulates the evolution of cancer cell populations, is executed for relatively short time intervals, at the end of which coarse-scale information is obtained. These coarse variables evolve on slower time scales than each individual cell in the population, enabling the application of forward projection schemes, which extrapolate their values at later times. This technique is referred to as coarse projective integration. Increasing the ratio of projection times to microscopic simulator execution times enhances the computational savings. Crucial accuracy issues arising for growing tumors with radial symmetry are addressed by applying the coarse projective integration scheme in a cotraveling (cogrowing) frame. As a proof of principle, we demonstrate that the application of this scheme yields highly accurate solutions, while preserving the computational savings of coarse projective integration. PMID:22587128
Davis, J P; Akella, S; Waddell, P H
2004-01-01
Having greater computational power on the desktop for processing taxa data sets has been a dream of biologists/statisticians involved in phylogenetics data analysis. Many existing algorithms have been highly optimized-one example being Felsenstein's PHYLIP code, written in C, for UPGMA and neighbor joining algorithms. However, the ability to process more than a few tens of taxa in a reasonable amount of time using conventional computers has not yielded a satisfactory speedup in data processing, making it difficult for phylogenetics practitioners to quickly explore data sets-such as might be done from a laptop computer. We discuss the application of custom computing techniques to phylogenetics. In particular, we apply this technology to speed up UPGMA algorithm execution by a factor of a hundred, against that of PHYLIP code running on the same PC. We report on these experiments and discuss how custom computing techniques can be used to not only accelerate phylogenetics algorithm performance on the desktop, but also on larger, high-performance computing engines, thus enabling the high-speed processing of data sets involving thousands of taxa.
2012-01-01
Background We have previously studied prospective associations between computer use and mental health symptoms in a selected young adult population. The purpose of this study was to investigate if high computer use is a prospective risk factor for developing mental health symptoms in a population-based sample of young adults. Methods The study group was a cohort of young adults (n = 4163), 20–24 years old, who responded to a questionnaire at baseline and 1-year follow-up. Exposure variables included time spent on computer use (CU) in general, email/chat use, computer gaming, CU without breaks, and CU at night causing lost sleep. Mental health outcomes included perceived stress, sleep disturbances, symptoms of depression, and reduced performance due to stress, depressed mood, or tiredness. Prevalence ratios (PRs) were calculated for prospective associations between exposure variables at baseline and mental health outcomes (new cases) at 1-year follow-up for the men and women separately. Results Both high and medium computer use compared to low computer use at baseline were associated with sleep disturbances in the men at follow-up. High email/chat use was negatively associated with perceived stress, but positively associated with reported sleep disturbances for the men. For the women, high email/chat use was (positively) associated with several mental health outcomes, while medium computer gaming was associated with symptoms of depression, and CU without breaks with most mental health outcomes. CU causing lost sleep was associated with mental health outcomes for both men and women. Conclusions Time spent on general computer use was prospectively associated with sleep disturbances and reduced performance for the men. For the women, using the computer without breaks was a risk factor for several mental health outcomes. Some associations were enhanced in interaction with mobile phone use. Using the computer at night and consequently losing sleep was associated with most mental health outcomes for both men and women. Further studies should focus on mechanisms relating information and communication technology (ICT) use to sleep disturbances. PMID:23088719
Thomée, Sara; Härenstam, Annika; Hagberg, Mats
2012-10-22
We have previously studied prospective associations between computer use and mental health symptoms in a selected young adult population. The purpose of this study was to investigate if high computer use is a prospective risk factor for developing mental health symptoms in a population-based sample of young adults. The study group was a cohort of young adults (n = 4163), 20-24 years old, who responded to a questionnaire at baseline and 1-year follow-up. Exposure variables included time spent on computer use (CU) in general, email/chat use, computer gaming, CU without breaks, and CU at night causing lost sleep. Mental health outcomes included perceived stress, sleep disturbances, symptoms of depression, and reduced performance due to stress, depressed mood, or tiredness. Prevalence ratios (PRs) were calculated for prospective associations between exposure variables at baseline and mental health outcomes (new cases) at 1-year follow-up for the men and women separately. Both high and medium computer use compared to low computer use at baseline were associated with sleep disturbances in the men at follow-up. High email/chat use was negatively associated with perceived stress, but positively associated with reported sleep disturbances for the men. For the women, high email/chat use was (positively) associated with several mental health outcomes, while medium computer gaming was associated with symptoms of depression, and CU without breaks with most mental health outcomes. CU causing lost sleep was associated with mental health outcomes for both men and women. Time spent on general computer use was prospectively associated with sleep disturbances and reduced performance for the men. For the women, using the computer without breaks was a risk factor for several mental health outcomes. Some associations were enhanced in interaction with mobile phone use. Using the computer at night and consequently losing sleep was associated with most mental health outcomes for both men and women. Further studies should focus on mechanisms relating information and communication technology (ICT) use to sleep disturbances.
NASA Astrophysics Data System (ADS)
Osorio-Murillo, C. A.; Over, M. W.; Frystacky, H.; Ames, D. P.; Rubin, Y.
2013-12-01
A new software application called MAD# has been coupled with the HTCondor high throughput computing system to aid scientists and educators with the characterization of spatial random fields and enable understanding the spatial distribution of parameters used in hydrogeologic and related modeling. MAD# is an open source desktop software application used to characterize spatial random fields using direct and indirect information through Bayesian inverse modeling technique called the Method of Anchored Distributions (MAD). MAD relates indirect information with a target spatial random field via a forward simulation model. MAD# executes inverse process running the forward model multiple times to transfer information from indirect information to the target variable. MAD# uses two parallelization profiles according to computational resources available: one computer with multiple cores and multiple computers - multiple cores through HTCondor. HTCondor is a system that manages a cluster of desktop computers for submits serial or parallel jobs using scheduling policies, resources monitoring, job queuing mechanism. This poster will show how MAD# reduces the time execution of the characterization of random fields using these two parallel approaches in different case studies. A test of the approach was conducted using 1D problem with 400 cells to characterize saturated conductivity, residual water content, and shape parameters of the Mualem-van Genuchten model in four materials via the HYDRUS model. The number of simulations evaluated in the inversion was 10 million. Using the one computer approach (eight cores) were evaluated 100,000 simulations in 12 hours (10 million - 1200 hours approximately). In the evaluation on HTCondor, 32 desktop computers (132 cores) were used, with a processing time of 60 hours non-continuous in five days. HTCondor reduced the processing time for uncertainty characterization by a factor of 20 (1200 hours reduced to 60 hours.)
Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card
NASA Astrophysics Data System (ADS)
Jiang, Jinpeng; Zhu, Peimin
2018-05-01
Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for the high-performance computing (HPC). To reduce the long computation time, we design and implement the GPU-based 2D elastic FWI (EFWI) in time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of relatively small global memory on GPU, the boundary saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion increases the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain the accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with the eight-core CPU implementations optimized by OpenMP. The test results from the GPU implementations are verified to have enough accuracy by comparing the results obtained from the CPU implementations.
Seismic wavefield modeling based on time-domain symplectic and Fourier finite-difference method
NASA Astrophysics Data System (ADS)
Fang, Gang; Ba, Jing; Liu, Xin-xin; Zhu, Kun; Liu, Guo-Chang
2017-06-01
Seismic wavefield modeling is important for improving seismic data processing and interpretation. Calculations of wavefield propagation are sometimes not stable when forward modeling of seismic wave uses large time steps for long times. Based on the Hamiltonian expression of the acoustic wave equation, we propose a structure-preserving method for seismic wavefield modeling by applying the symplectic finite-difference method on time grids and the Fourier finite-difference method on space grids to solve the acoustic wave equation. The proposed method is called the symplectic Fourier finite-difference (symplectic FFD) method, and offers high computational accuracy and improves the computational stability. Using acoustic approximation, we extend the method to anisotropic media. We discuss the calculations in the symplectic FFD method for seismic wavefield modeling of isotropic and anisotropic media, and use the BP salt model and BP TTI model to test the proposed method. The numerical examples suggest that the proposed method can be used in seismic modeling of strongly variable velocities, offering high computational accuracy and low numerical dispersion. The symplectic FFD method overcomes the residual qSV wave of seismic modeling in anisotropic media and maintains the stability of the wavefield propagation for large time steps.
NASA Astrophysics Data System (ADS)
Xue, Bo; Mao, Bingjing; Chen, Xiaomei; Ni, Guoqiang
2010-11-01
This paper renders a configurable distributed high performance computing(HPC) framework for TDI-CCD imaging simulation. It uses strategy pattern to adapt multi-algorithms. Thus, this framework help to decrease the simulation time with low expense. Imaging simulation for TDI-CCD mounted on satellite contains four processes: 1) atmosphere leads degradation, 2) optical system leads degradation, 3) electronic system of TDI-CCD leads degradation and re-sampling process, 4) data integration. Process 1) to 3) utilize diversity data-intensity algorithms such as FFT, convolution and LaGrange Interpol etc., which requires powerful CPU. Even uses Intel Xeon X5550 processor, regular series process method takes more than 30 hours for a simulation whose result image size is 1500 * 1462. With literature study, there isn't any mature distributing HPC framework in this field. Here we developed a distribute computing framework for TDI-CCD imaging simulation, which is based on WCF[1], uses Client/Server (C/S) layer and invokes the free CPU resources in LAN. The server pushes the process 1) to 3) tasks to those free computing capacity. Ultimately we rendered the HPC in low cost. In the computing experiment with 4 symmetric nodes and 1 server , this framework reduced about 74% simulation time. Adding more asymmetric nodes to the computing network, the time decreased namely. In conclusion, this framework could provide unlimited computation capacity in condition that the network and task management server are affordable. And this is the brand new HPC solution for TDI-CCD imaging simulation and similar applications.
Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca
2016-10-15
The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GP, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
[Economic efficiency of computer monitoring of health].
Il'icheva, N P; Stazhadze, L L
2001-01-01
Presents the method of computer monitoring of health, based on utilization of modern information technologies in public health. The method helps organize preventive activities of an outpatient clinic at a high level and essentially decrease the time and money loss. Efficiency of such preventive measures, increased number of computer and Internet users suggests that such methods are promising and further studies in this field are needed.
Comparison of RCS prediction techniques, computations and measurements
NASA Astrophysics Data System (ADS)
Brand, M. G. E.; Vanewijk, L. J.; Klinker, F.; Schippers, H.
1992-07-01
Three calculation methods to predict radar cross sections (RCS) of three dimensional objects are evaluated by computing the radar cross sections of a generic wing inlet configuration. The following methods are applied: a three dimensional high frequency method, a three dimensional boundary element method, and a two dimensional finite difference time domain method. The results of the computations are compared with the data of measurements.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Alpha Control - A new Concept in SPM Control
NASA Astrophysics Data System (ADS)
Spizig, P.; Sanchen, D.; Volswinkler, G.; Ibach, W.; Koenen, J.
2006-03-01
Controlling modern Scanning Probe Microscopes demands highly sophisticated electronics. While flexibility and powerful computing power is of great importance in facilitating the variety of measurement modes, extremely low noise is also a necessity. Accordingly, modern SPM Controller designs are based on digital electronics to overcome the drawbacks of analog designs. While todays SPM controllers are based on DSPs or Microprocessors and often still incorporate analog parts, we are now introducing a completely new approach: Using a Field Programmable Gate Array (FPGA) to implement the digital control tasks allows unrivalled data processing speed by computing all tasks in parallel within a single chip. Time consuming task switching between data acquisition, digital filtering, scanning and the computing of feedback signals can be completely avoided. Together with a star topology to avoid any bus limitations in accessing the variety of ADCs and DACs, this design guarantees for the first time an entirely deterministic timing capability in the nanosecond regime for all tasks. This becomes especially useful for any external experiments which must be synchronized with the scan or for high speed scans that require not only closed loop control of the scanner, but also dynamic correction of the scan movement. Delicate samples additionally benefit from extremely high sample rates, allowing highly resolved signals and low noise levels.
Graphical processors for HEP trigger systems
NASA Astrophysics Data System (ADS)
Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.
2017-02-01
General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We will discuss the use of online parallel computing on GPUs for synchronous low level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. Latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
A high-order spatial filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-04-01
A high-order spatial filter is developed for the spectral-element-method dynamical core on the cubed-sphere grid which employs the Gauss-Lobatto Lagrange interpolating polynomials (GLLIP) as orthogonal basis functions. The filter equation is the high-order Helmholtz equation which corresponds to the implicit time-differencing of a diffusion equation employing the high-order Laplacian. The Laplacian operator is discretized within a cell which is a building block of the cubed sphere grid and consists of the Gauss-Lobatto grid. When discretizing a high-order Laplacian, due to the requirement of C0 continuity along the cell boundaries the grid-points in neighboring cells should be used for the target cell: The number of neighboring cells is nearly quadratically proportional to the filter order. Discrete Helmholtz equation yields a huge-sized and highly sparse matrix equation whose size is N*N with N the number of total grid points on the globe. The number of nonzero entries is also almost in quadratic proportion to the filter order. Filtering is accomplished by solving the huge-matrix equation. While requiring a significant computing time, the solution of global matrix provides the filtered field free of discontinuity along the cell boundaries. To achieve the computational efficiency and the accuracy at the same time, the solution of the matrix equation was obtained by only accounting for the finite number of adjacent cells. This is called as a local-domain filter. It was shown that to remove the numerical noise near the grid-scale, inclusion of 5*5 cells for the local-domain filter was found sufficient, giving the same accuracy as that obtained by global domain solution while reducing the computing time to a considerably lower level. The high-order filter was evaluated using the standard test cases including the baroclinic instability of the zonal flow. Results indicated that the filter performs better on the removal of grid-scale numerical noises than the explicit high-order viscosity. It was also presented that the filter can be easily implemented on the distributed-memory parallel computers with a desirable scalability.
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
Neural Network Training by Integration of Adjoint Systems of Equations Forward in Time
NASA Technical Reports Server (NTRS)
Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)
1999-01-01
A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically. it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved. but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. Tbc trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
Neural network training by integration of adjoint systems of equations forward in time
NASA Technical Reports Server (NTRS)
Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)
1992-01-01
A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically, it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved, but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. The trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
A lightweight distributed framework for computational offloading in mobile cloud computing.
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC.
A Lightweight Distributed Framework for Computational Offloading in Mobile Cloud Computing
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC. PMID:25127245
Biocellion: accelerating computer simulation of multicellular biological system models
Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya
2014-01-01
Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
Real-time polarization imaging algorithm for camera-based polarization navigation sensors.
Lu, Hao; Zhao, Kaichun; You, Zheng; Huang, Kaoli
2017-04-10
Biologically inspired polarization navigation is a promising approach due to its autonomous nature, high precision, and robustness. Many researchers have built point source-based and camera-based polarization navigation prototypes in recent years. Camera-based prototypes can benefit from their high spatial resolution but incur a heavy computation load. The pattern recognition algorithm in most polarization imaging algorithms involves several nonlinear calculations that impose a significant computation burden. In this paper, the polarization imaging and pattern recognition algorithms are optimized through reduction to several linear calculations by exploiting the orthogonality of the Stokes parameters without affecting precision according to the features of the solar meridian and the patterns of the polarized skylight. The algorithm contains a pattern recognition algorithm with a Hough transform as well as orientation measurement algorithms. The algorithm was loaded and run on a digital signal processing system to test its computational complexity. The test showed that the running time decreased to several tens of milliseconds from several thousand milliseconds. Through simulations and experiments, it was found that the algorithm can measure orientation without reducing precision. It can hence satisfy the practical demands of low computational load and high precision for use in embedded systems.
Large Eddy Simulation of "turbulent-like" flow in intracranial aneurysms
NASA Astrophysics Data System (ADS)
Khan, Muhammad Owais; Chnafa, Christophe; Steinman, David A.; Mendez, Simon; Nicoud, Franck
2016-11-01
Hemodynamic forces are thought to contribute to pathogenesis and rupture of intracranial aneurysms (IA). Recent high-resolution patient-specific computational fluid dynamics (CFD) simulations have highlighted the presence of "turbulent-like" flow features, characterized by transient high-frequency flow instabilities. In-vitro studies have shown that such "turbulent-like" flows can lead to lack of endothelial cell orientation and cell depletion, and thus, may also have relevance to IA rupture risk assessment. From a modelling perspective, previous studies have relied on DNS to resolve the small-scale structures in these flows. While accurate, DNS is clinically infeasible due to high computational cost and long simulation times. In this study, we present the applicability of LES for IAs using a LES/blood flow dedicated solver (YALES2BIO) and compare against respective DNS. As a qualitative analysis, we compute time-averaged WSS and OSI maps, as well as, novel frequency-based WSS indices. As a quantitative analysis, we show the differences in POD eigenspectra for LES vs. DNS and wavelet analysis of intra-saccular velocity traces. Differences in two SGS models (i.e. Dynamic Smagorinsky vs. Sigma) are also compared against DNS, and computational gains of LES are discussed.
Liu, Mali; Lu, Chihao; Li, Haifeng; Liu, Xu
2018-02-19
We propose a bifocal computational near eye light field display (bifocal computational display) and structure parameters determination scheme (SPDS) for bifocal computational display that achieves greater depth of field (DOF), high resolution, accommodation and compact form factor. Using a liquid varifocal lens, two single-focal computational light fields are superimposed to reconstruct a virtual object's light field by time multiplex and avoid the limitation on high refresh rate. By minimizing the deviation between reconstructed light field and original light field, we propose a determination framework to determine the structure parameters of bifocal computational light field display. When applied to different objective to SPDS, it can achieve high average resolution or uniform resolution display over scene depth range. To analyze the advantages and limitation of our proposed method, we have conducted simulations and constructed a simple prototype which comprises a liquid varifocal lens, dual-layer LCDs and a uniform backlight. The results of simulation and experiments with our method show that the proposed system can achieve expected performance well. Owing to the excellent performance of our system, we motivate bifocal computational display and SPDS to contribute to a daily-use and commercial virtual reality display.
FPGA-based real-time phase measuring profilometry algorithm design and implementation
NASA Astrophysics Data System (ADS)
Zhan, Guomin; Tang, Hongwei; Zhong, Kai; Li, Zhongwei; Shi, Yusheng
2016-11-01
Phase measuring profilometry (PMP) has been widely used in many fields, like Computer Aided Verification (CAV), Flexible Manufacturing System (FMS) et al. High frame-rate (HFR) real-time vision-based feedback control will be a common demands in near future. However, the instruction time delay in the computer caused by numerous repetitive operations greatly limit the efficiency of data processing. FPGA has the advantages of pipeline architecture and parallel execution, and it fit for handling PMP algorithm. In this paper, we design a fully pipelined hardware architecture for PMP. The functions of hardware architecture includes rectification, phase calculation, phase shifting, and stereo matching. The experiment verified the performance of this method, and the factors that may influence the computation accuracy was analyzed.
Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke
2015-01-01
Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842
Fog computing job scheduling optimization based on bees swarm
NASA Astrophysics Data System (ADS)
Bitam, Salim; Zeadally, Sherali; Mellouk, Abdelhamid
2018-04-01
Fog computing is a new computing architecture, composed of a set of near-user edge devices called fog nodes, which collaborate together in order to perform computational services such as running applications, storing an important amount of data, and transmitting messages. Fog computing extends cloud computing by deploying digital resources at the premise of mobile users. In this new paradigm, management and operating functions, such as job scheduling aim at providing high-performance, cost-effective services requested by mobile users and executed by fog nodes. We propose a new bio-inspired optimization approach called Bees Life Algorithm (BLA) aimed at addressing the job scheduling problem in the fog computing environment. Our proposed approach is based on the optimized distribution of a set of tasks among all the fog computing nodes. The objective is to find an optimal tradeoff between CPU execution time and allocated memory required by fog computing services established by mobile users. Our empirical performance evaluation results demonstrate that the proposal outperforms the traditional particle swarm optimization and genetic algorithm in terms of CPU execution time and allocated memory.
NASA Technical Reports Server (NTRS)
1994-01-01
In the mid-1980s, Kinetic Systems and Langley Research Center determined that high speed CAMAC (Computer Automated Measurement and Control) data acquisition systems could significantly improve Langley's ARTS (Advanced Real Time Simulation) system. The ARTS system supports flight simulation R&D, and the CAMAC equipment allowed 32 high performance simulators to be controlled by centrally located host computers. This technology broadened Kinetic Systems' capabilities and led to several commercial applications. One of them is General Atomics' fusion research program. Kinetic Systems equipment allows tokamak data to be acquired four to 15 times more rapidly. Ford Motor company uses the same technology to control and monitor transmission testing facilities.
High-power graphic computers for visual simulation: a real-time--rendering revolution
NASA Technical Reports Server (NTRS)
Kaiser, M. K.
1996-01-01
Advances in high-end graphics computers in the past decade have made it possible to render visual scenes of incredible complexity and realism in real time. These new capabilities make it possible to manipulate and investigate the interactions of observers with their visual world in ways once only dreamed of. This paper reviews how these developments have affected two preexisting domains of behavioral research (flight simulation and motion perception) and have created a new domain (virtual environment research) which provides tools and challenges for the perceptual psychologist. Finally, the current limitations of these technologies are considered, with an eye toward how perceptual psychologist might shape future developments.
Computation of high Reynolds number internal/external flows
NASA Technical Reports Server (NTRS)
Cline, M. C.; Wilmoth, R. G.
1981-01-01
A general, user oriented computer program, called VNAP2, has been developed to calculate high Reynolds number, internal/external flows. VNAP2 solves the two-dimensional, time-dependent Navier-Stokes equations. The turbulence is modeled with either a mixing-length, a one transport equation, or a two transport equation model. Interior grid points are computed using the explicit MacCormack scheme with special procedures to speed up the calculation in the fine grid. All boundary conditions are calculated using a reference plane characteristic scheme with the viscous terms treated as source terms. Several internal, and internal/external flow calculations are presented.
Computation of high Reynolds number internal/external flows
NASA Technical Reports Server (NTRS)
Cline, M. C.; Wilmoth, R. G.
1981-01-01
A general, user oriented computer program, called VNAP2, was developed to calculate high Reynolds number, internal/ external flows. The VNAP2 program solves the two dimensional, time dependent Navier-Stokes equations. The turbulence is modeled with either a mixing-length, a one transport equation, or a two transport equation model. Interior grid points are computed using the explicit MacCormack Scheme with special procedures to speed up the calculation in the fine grid. All boundary conditions are calculated using a reference plane characteristic scheme with the viscous terms treated as source terms. Several internal, external, and internal/external flow calculations are presented.
Computation of high Reynolds number internal/external flows
NASA Technical Reports Server (NTRS)
Cline, M. C.; Wilmoth, R. G.
1981-01-01
A general, user oriented computer program, called VNAF2, developed to calculate high Reynolds number internal/external flows is described. The program solves the two dimensional, time dependent Navier-Stokes equations. Turbulence is modeled with either a mixing length, a one transport equation, or a two transport equation model. Interior grid points are computed using the explicit MacCormack scheme with special procedures to speed up the calculation in the fine grid. All boundary conditions are calculated using a reference plane characteristic scheme with the viscous terms treated as source terms. Several internal, external, and internal/external flow calculations are presented.
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
NASA Astrophysics Data System (ADS)
Yang, Yiqun; Urban, Matthew W.; McGough, Robert J.
2018-05-01
Shear wave calculations induced by an acoustic radiation force are very time-consuming on desktop computers, and high-performance graphics processing units (GPUs) achieve dramatic reductions in the computation time for these simulations. The acoustic radiation force is calculated using the fast near field method and the angular spectrum approach, and then the shear waves are calculated in parallel with Green’s functions on a GPU. This combination enables rapid evaluation of shear waves for push beams with different spatial samplings and for apertures with different f/#. Relative to shear wave simulations that evaluate the same algorithm on an Intel i7 desktop computer, a high performance nVidia GPU reduces the time required for these calculations by a factor of 45 and 700 when applied to elastic and viscoelastic shear wave simulation models, respectively. These GPU-accelerated simulations also compared to measurements in different viscoelastic phantoms, and the results are similar. For parametric evaluations and for comparisons with measured shear wave data, shear wave simulations with the Green’s function approach are ideally suited for high-performance GPUs.
Real-time Adaptive EEG Source Separation using Online Recursive Independent Component Analysis
Hsu, Sheng-Hsiou; Mullen, Tim; Jung, Tzyy-Ping; Cauwenberghs, Gert
2016-01-01
Independent Component Analysis (ICA) has been widely applied to electroencephalographic (EEG) biosignal processing and brain-computer interfaces. The practical use of ICA, however, is limited by its computational complexity, data requirements for convergence, and assumption of data stationarity, especially for high-density data. Here we study and validate an optimized online recursive ICA algorithm (ORICA) with online recursive least squares (RLS) whitening for blind source separation of high-density EEG data, which offers instantaneous incremental convergence upon presentation of new data. Empirical results of this study demonstrate the algorithm's: (a) suitability for accurate and efficient source identification in high-density (64-channel) realistically-simulated EEG data; (b) capability to detect and adapt to non-stationarity in 64-ch simulated EEG data; and (c) utility for rapidly extracting principal brain and artifact sources in real 61-channel EEG data recorded by a dry and wearable EEG system in a cognitive experiment. ORICA was implemented as functions in BCILAB and EEGLAB and was integrated in an open-source Real-time EEG Source-mapping Toolbox (REST), supporting applications in ICA-based online artifact rejection, feature extraction for real-time biosignal monitoring in clinical environments, and adaptable classifications in brain-computer interfaces. PMID:26685257
Job Management and Task Bundling
NASA Astrophysics Data System (ADS)
Berkowitz, Evan; Jansen, Gustav R.; McElvain, Kenneth; Walker-Loud, André
2018-03-01
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
Multiscaling properties of coastal waters particle size distribution from LISST in situ measurements
NASA Astrophysics Data System (ADS)
Pannimpullath Remanan, R.; Schmitt, F. G.; Loisel, H.; Mériaux, X.
2013-12-01
An eulerian high frequency sampling of particle size distribution (PSD) is performed during 5 tidal cycles (65 hours) in a coastal environment of the eastern English Channel at 1 Hz. The particle data are recorded using a LISST-100x type C (Laser In Situ Scattering and Transmissometry, Sequoia Scientific), recording volume concentrations of particles having diameters ranging from 2.5 to 500 mu in 32 size classes in logarithmic scale. This enables the estimation at each time step (every second) of the probability density function of particle sizes. At every time step, the pdf of PSD is hyperbolic. We can thus estimate PSD slope time series. Power spectral analysis shows that the mean diameter of the suspended particles is scaling at high frequencies (from 1s to 1000s). The scaling properties of particle sizes is studied by computing the moment function, from the pdf of the size distribution. Moment functions at many different time scales (from 1s to 1000 s) are computed and their scaling properties considered. The Shannon entropy at each time scale is also estimated and is related to other parameters. The multiscaling properties of the turbidity (coefficient cp computed from the LISST) are also consider on the same time scales, using Empirical Mode Decomposition.
High accurate time system of the Low Latitude Meridian Circle.
NASA Astrophysics Data System (ADS)
Yang, Jing; Wang, Feng; Li, Zhiming
In order to obtain the high accurate time signal for the Low Latitude Meridian Circle (LLMC), a new GPS accurate time system is developed which include GPS, 1 MC frequency source and self-made clock system. The second signal of GPS is synchronously used in the clock system and information can be collected by a computer automatically. The difficulty of the cancellation of the time keeper can be overcomed by using this system.
NASA Astrophysics Data System (ADS)
Nelson, Mathew
In today's age of exponential change and technological advancement, awareness of any gender gap in technology and computer science-related fields is crucial, but further research must be done in an effort to better understand the complex interacting factors contributing to the gender gap. This study utilized a survey to investigate specific gender differences relating to computing self-efficacy, computer usage, and environmental factors of exposure, personal interests, and parental influence that impact gender differences of high school students within a one-to-one computing environment in South Dakota. The population who completed the One-to-One High School Computing Survey for this study consisted of South Dakota high school seniors who had been involved in a one-to-one computing environment for two or more years. The data from the survey were analyzed using descriptive and inferential statistics for the determined variables. From the review of literature and data analysis several conclusions were drawn from the findings. Among them are that overall, there was very little difference in perceived computing self-efficacy and computing anxiety between male and female students within the one-to-one computing initiative. The study supported the current research that males and females utilized computers similarly, but males spent more time using their computers to play online games. Early exposure to computers, or the age at which the student was first exposed to a computer, and the number of computers present in the home (computer ownership) impacted computing self-efficacy. The results also indicated parental encouragement to work with computers also contributed positively to both male and female students' computing self-efficacy. Finally the study also found that both mothers and fathers encouraged their male children more than their female children to work with computing and pursue careers in computing science fields.
NASA Astrophysics Data System (ADS)
Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.
2003-08-01
In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
NASA Astrophysics Data System (ADS)
Gizon, Laurent; Barucq, Hélène; Duruflé, Marc; Hanson, Chris S.; Leguèbe, Michael; Birch, Aaron C.; Chabassier, Juliette; Fournier, Damien; Hohage, Thorsten; Papini, Emanuele
2017-04-01
Context. Local helioseismology has so far relied on semi-analytical methods to compute the spatial sensitivity of wave travel times to perturbations in the solar interior. These methods are cumbersome and lack flexibility. Aims: Here we propose a convenient framework for numerically solving the forward problem of time-distance helioseismology in the frequency domain. The fundamental quantity to be computed is the cross-covariance of the seismic wavefield. Methods: We choose sources of wave excitation that enable us to relate the cross-covariance of the oscillations to the Green's function in a straightforward manner. We illustrate the method by considering the 3D acoustic wave equation in an axisymmetric reference solar model, ignoring the effects of gravity on the waves. The symmetry of the background model around the rotation axis implies that the Green's function can be written as a sum of longitudinal Fourier modes, leading to a set of independent 2D problems. We use a high-order finite-element method to solve the 2D wave equation in frequency space. The computation is embarrassingly parallel, with each frequency and each azimuthal order solved independently on a computer cluster. Results: We compute travel-time sensitivity kernels in spherical geometry for flows, sound speed, and density perturbations under the first Born approximation. Convergence tests show that travel times can be computed with a numerical precision better than one millisecond, as required by the most precise travel-time measurements. Conclusions: The method presented here is computationally efficient and will be used to interpret travel-time measurements in order to infer, e.g., the large-scale meridional flow in the solar convection zone. It allows the implementation of (full-waveform) iterative inversions, whereby the axisymmetric background model is updated at each iteration.
On a fast calculation of structure factors at a subatomic resolution.
Afonine, P V; Urzhumtsev, A
2004-01-01
In the last decade, the progress of protein crystallography allowed several protein structures to be solved at a resolution higher than 0.9 A. Such studies provide researchers with important new information reflecting very fine structural details. The signal from these details is very weak with respect to that corresponding to the whole structure. Its analysis requires high-quality data, which previously were available only for crystals of small molecules, and a high accuracy of calculations. The calculation of structure factors using direct formulae, traditional for 'small-molecule' crystallography, allows a relatively simple accuracy control. For macromolecular crystals, diffraction data sets at a subatomic resolution contain hundreds of thousands of reflections, and the number of parameters used to describe the corresponding models may reach the same order. Therefore, the direct way of calculating structure factors becomes very time expensive when applied to large molecules. These problems of high accuracy and computational efficiency require a re-examination of computer tools and algorithms. The calculation of model structure factors through an intermediate generation of an electron density [Sayre (1951). Acta Cryst. 4, 362-367; Ten Eyck (1977). Acta Cryst. A33, 486-492] may be much more computationally efficient, but contains some parameters (grid step, 'effective' atom radii etc.) whose influence on the accuracy of the calculation is not straightforward. At the same time, the choice of parameters within safety margins that largely ensure a sufficient accuracy may result in a significant loss of the CPU time, making it close to the time for the direct-formulae calculations. The impact of the different parameters on the computer efficiency of structure-factor calculation is studied. It is shown that an appropriate choice of these parameters allows the structure factors to be obtained with a high accuracy and in a significantly shorter time than that required when using the direct formulae. Practical algorithms for the optimal choice of the parameters are suggested.
NASA Astrophysics Data System (ADS)
Cieszewski, Radoslaw; Linczuk, Maciej
2016-09-01
The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interprets an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents a RPython based High-Level synthesis (HLS) compiler. The compiler get the configuration parameters and map RPython program to VHDL. Then, VHDL code can be used to program FPGA chips. In comparison of other technologies usage, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of General Purpose Processors (GPUs), and introduce more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with High-Level Synthesis compiler. This article describes design methodologies and tools, implementation and first results of created VHDL backend for RPython compiler.
Sisson, Susan B; Broyles, Stephanie T; Baker, Birgitta L; Katzmarzyk, Peter T
2011-09-01
The purposes were 1) to determine if different leisure-time sedentary behaviors (LTSB), such as TV/video/video game viewing/playing (TV), reading for pleasure (reading), and nonschool computer usage, were associated with childhood overweight status, and 2) to assess the social-ecological correlates of LTSB. The analytic sample was 33,117 (16,952 boys and 16,165 girls) participants from the 2003 National Survey of Children's Health. The cut-point for excessive TV and nonschool computer usage was ≥ 2 hr/day. High quantities of daily reading for pleasure were classified as ≥ 31 min/day. Weighted descriptive characteristics were calculated on the sample (means ± SE or frequency). Logistic regression models were used to determine if the LTSB were associated with overweight status and to examine social-ecological correlates. Over 35% of the sample was overweight. Odds of being overweight were higher in the 2 to 3 hr/day (OR: 1.48, 95% CI: 1.24, 1.76) and ≥ 4 hr/day (OR: 1.52, 95% CI: 1.22, 1.91) daily TV groups compared with none. Reading and nonschool computer usage was not associated with being overweight. TV was associated with overweight classification; however, nonschool computer usage and reading were not. Several individual, family, and community correlates were associated with high volumes of daily TV viewing.
Computer versus paper--does it make any difference in test performance?
Karay, Yassin; Schauber, Stefan K; Stosch, Christoph; Schüttpelz-Brauns, Katrin
2015-01-01
CONSTRUCT: In this study, we examine the differences in test performance between the paper-based and the computer-based version of the Berlin formative Progress Test. In this context it is the first study that allows controlling for students' prior performance. Computer-based tests make possible a more efficient examination procedure for test administration and review. Although university staff will benefit largely from computer-based tests, the question arises if computer-based tests influence students' test performance. A total of 266 German students from the 9th and 10th semester of medicine (comparable with the 4th-year North American medical school schedule) participated in the study (paper = 132, computer = 134). The allocation of the test format was conducted as a randomized matched-pair design in which students were first sorted according to their prior test results. The organizational procedure, the examination conditions, the room, and seating arrangements, as well as the order of questions and answers, were identical in both groups. The sociodemographic variables and pretest scores of both groups were comparable. The test results from the paper and computer versions did not differ. The groups remained within the allotted time, but students using the computer version (particularly the high performers) needed significantly less time to complete the test. In addition, we found significant differences in guessing behavior. Low performers using the computer version guess significantly more than low-performing students in the paper-pencil version. Participants in computer-based tests are not at a disadvantage in terms of their test results. The computer-based test required less processing time. The reason for the longer processing time when using the paper-pencil version might be due to the time needed to write the answer down, controlling for transferring the answer correctly. It is still not known why students using the computer version (particularly low-performing students) guess at a higher rate. Further studies are necessary to understand this finding.
NASA Astrophysics Data System (ADS)
Min, M.
2017-10-01
Context. Opacities of molecules in exoplanet atmospheres rely on increasingly detailed line-lists for these molecules. The line lists available today contain for many species up to several billions of lines. Computation of the spectral line profile created by pressure and temperature broadening, the Voigt profile, of all of these lines is becoming a computational challenge. Aims: We aim to create a method to compute the Voigt profile in a way that automatically focusses the computation time into the strongest lines, while still maintaining the continuum contribution of the high number of weaker lines. Methods: Here, we outline a statistical line sampling technique that samples the Voigt profile quickly and with high accuracy. The number of samples is adjusted to the strength of the line and the local spectral line density. This automatically provides high accuracy line shapes for strong lines or lines that are spectrally isolated. The line sampling technique automatically preserves the integrated line opacity for all lines, thereby also providing the continuum opacity created by the large number of weak lines at very low computational cost. Results: The line sampling technique is tested for accuracy when computing line spectra and correlated-k tables. Extremely fast computations ( 3.5 × 105 lines per second per core on a standard current day desktop computer) with high accuracy (≤1% almost everywhere) are obtained. A detailed recipe on how to perform the computations is given.
Decoherence in adiabatic quantum computation
NASA Astrophysics Data System (ADS)
Albash, Tameem; Lidar, Daniel A.
2015-06-01
Recent experiments with increasingly larger numbers of qubits have sparked renewed interest in adiabatic quantum computation, and in particular quantum annealing. A central question that is repeatedly asked is whether quantum features of the evolution can survive over the long time scales used for quantum annealing relative to standard measures of the decoherence time. We reconsider the role of decoherence in adiabatic quantum computation and quantum annealing using the adiabatic quantum master-equation formalism. We restrict ourselves to the weak-coupling and singular-coupling limits, which correspond to decoherence in the energy eigenbasis and in the computational basis, respectively. We demonstrate that decoherence in the instantaneous energy eigenbasis does not necessarily detrimentally affect adiabatic quantum computation, and in particular that a short single-qubit T2 time need not imply adverse consequences for the success of the quantum adiabatic algorithm. We further demonstrate that boundary cancellation methods, designed to improve the fidelity of adiabatic quantum computing in the closed-system setting, remain beneficial in the open-system setting. To address the high computational cost of master-equation simulations, we also demonstrate that a quantum Monte Carlo algorithm that explicitly accounts for a thermal bosonic bath can be used to interpolate between classical and quantum annealing. Our study highlights and clarifies the significantly different role played by decoherence in the adiabatic and circuit models of quantum computing.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
Designing for Learner Engagement with Computer Based Testing
ERIC Educational Resources Information Center
Walker, Richard; Handley, Zoe
2016-01-01
The issues influencing student engagement with high-stakes computer-based exams were investigated, drawing on feedback from two cohorts of international MA Education students encountering this assessment method for the first time. Qualitative data from surveys and focus groups on the students' examination experience were analysed, leading to the…
Microfocus computed tomography in medicine
NASA Astrophysics Data System (ADS)
Obodovskiy, A. V.
2018-02-01
Recent advances in the field of high-frequency power schemes for X-ray devices allow the creation of high-resolution instruments. At the department of electronic devices and Equipment of the St. Petersburg State Electrotechnical University, a model of a microfocus computer tomograph was developed. Used equipment allows to receive projection data with an increase up to 100 times. A distinctive feature of the device is the possibility of implementing various schemes for obtaining projection data.
Ting, Samuel T; Ahmad, Rizwan; Jin, Ning; Craft, Jason; Serafim da Silveira, Juliana; Xue, Hui; Simonetti, Orlando P
2017-04-01
Sparsity-promoting regularizers can enable stable recovery of highly undersampled magnetic resonance imaging (MRI), promising to improve the clinical utility of challenging applications. However, lengthy computation time limits the clinical use of these methods, especially for dynamic MRI with its large corpus of spatiotemporal data. Here, we present a holistic framework that utilizes the balanced sparse model for compressive sensing and parallel computing to reduce the computation time of cardiac MRI recovery methods. We propose a fast, iterative soft-thresholding method to solve the resulting ℓ1-regularized least squares problem. In addition, our approach utilizes a parallel computing environment that is fully integrated with the MRI acquisition software. The methodology is applied to two formulations of the multichannel MRI problem: image-based recovery and k-space-based recovery. Using measured MRI data, we show that, for a 224 × 144 image series with 48 frames, the proposed k-space-based approach achieves a mean reconstruction time of 2.35 min, a 24-fold improvement compared a reconstruction time of 55.5 min for the nonlinear conjugate gradient method, and the proposed image-based approach achieves a mean reconstruction time of 13.8 s. Our approach can be utilized to achieve fast reconstruction of large MRI datasets, thereby increasing the clinical utility of reconstruction techniques based on compressed sensing. Magn Reson Med 77:1505-1515, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
High Performance Computing Modeling Advances Accelerator Science for High-Energy Physics
Amundson, James; Macridin, Alexandru; Spentzouris, Panagiotis
2014-07-28
The development and optimization of particle accelerators are essential for advancing our understanding of the properties of matter, energy, space, and time. Particle accelerators are complex devices whose behavior involves many physical effects on multiple scales. Therefore, advanced computational tools utilizing high-performance computing are essential for accurately modeling them. In the past decade, the US Department of Energy's SciDAC program has produced accelerator-modeling tools that have been employed to tackle some of the most difficult accelerator science problems. The authors discuss the Synergia framework and its applications to high-intensity particle accelerator physics. Synergia is an accelerator simulation package capable ofmore » handling the entire spectrum of beam dynamics simulations. Our authors present Synergia's design principles and its performance on HPC platforms.« less
NASA Astrophysics Data System (ADS)
DiSalvo, Elizabeth Betsy
The implementation of a learning environment for young African American males, called the Glitch Game Testers, was launched in 2009. The development of this program was based on formative work that looked at the contrasting use of digital games between young African American males and individuals who chose to become computer science majors. Through analysis of cultural values and digital game play practices, the program was designed to intertwine authentic game development practices and computer science learning. The resulting program employed 25 African American male high school students to test pre-release digital games full-time in the summer and part-time in the school year, with an hour of each day dedicated to learning introductory computer science. Outcomes for persisting in computer science education are remarkable; of the 16 participants who had graduated from high school as of 2012, 12 have gone on to school in computing-related majors. These outcomes, and the participants' enthusiasm for engaging in computing, are in sharp contrast to the crisis in African American male education and learning motivation. The research presented in this dissertation discusses the formative research that shaped the design of Glitch, the evaluation of the implementation of Glitch, and a theoretical investigation of the way in which participants navigated conflicting motivations in learning environments.
Design and Implementation of High-Performance GIS Dynamic Objects Rendering Engine
NASA Astrophysics Data System (ADS)
Zhong, Y.; Wang, S.; Li, R.; Yun, W.; Song, G.
2017-12-01
Spatio-temporal dynamic visualization is more vivid than static visualization. It important to use dynamic visualization techniques to reveal the variation process and trend vividly and comprehensively for the geographical phenomenon. To deal with challenges caused by dynamic visualization of both 2D and 3D spatial dynamic targets, especially for different spatial data types require high-performance GIS dynamic objects rendering engine. The main approach for improving the rendering engine with vast dynamic targets relies on key technologies of high-performance GIS, including memory computing, parallel computing, GPU computing and high-performance algorisms. In this study, high-performance GIS dynamic objects rendering engine is designed and implemented for solving the problem based on hybrid accelerative techniques. The high-performance GIS rendering engine contains GPU computing, OpenGL technology, and high-performance algorism with the advantage of 64-bit memory computing. It processes 2D, 3D dynamic target data efficiently and runs smoothly with vast dynamic target data. The prototype system of high-performance GIS dynamic objects rendering engine is developed based SuperMap GIS iObjects. The experiments are designed for large-scale spatial data visualization, the results showed that the high-performance GIS dynamic objects rendering engine have the advantage of high performance. Rendering two-dimensional and three-dimensional dynamic objects achieve 20 times faster on GPU than on CPU.
2015-06-30
401) 832-8689 Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std. Z39-18 i TABLE OF CONTENTS Section Page LIST OF ILLUSTRATIONS...calculated with a high degree of accuracy—leading to intensive computational calculations and long computational times when dealing with range-depth fields...be obtained using similitude analysis; it allows the comparison of differing explosive weights and provides the means to scale the pressure, energy
Stability-Constrained Aerodynamic Shape Optimization with Applications to Flying Wings
NASA Astrophysics Data System (ADS)
Mader, Charles Alexander
A set of techniques is developed that allows the incorporation of flight dynamics metrics as an additional discipline in a high-fidelity aerodynamic optimization. Specifically, techniques for including static stability constraints and handling qualities constraints in a high-fidelity aerodynamic optimization are demonstrated. These constraints are developed from stability derivative information calculated using high-fidelity computational fluid dynamics (CFD). Two techniques are explored for computing the stability derivatives from CFD. One technique uses an automatic differentiation adjoint technique (ADjoint) to efficiently and accurately compute a full set of static and dynamic stability derivatives from a single steady solution. The other technique uses a linear regression method to compute the stability derivatives from a quasi-unsteady time-spectral CFD solution, allowing for the computation of static, dynamic and transient stability derivatives. Based on the characteristics of the two methods, the time-spectral technique is selected for further development, incorporated into an optimization framework, and used to conduct stability-constrained aerodynamic optimization. This stability-constrained optimization framework is then used to conduct an optimization study of a flying wing configuration. This study shows that stability constraints have a significant impact on the optimal design of flying wings and that, while static stability constraints can often be satisfied by modifying the airfoil profiles of the wing, dynamic stability constraints can require a significant change in the planform of the aircraft in order for the constraints to be satisfied.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Ripamonti, Giancarlo; Abba, Andrea; Geraci, Angelo
2010-05-01
A method for measuring time intervals accurate to the picosecond range is based on phase measurements of oscillating waveforms synchronous with their beginning and/or end. The oscillation is generated by triggering an LC resonant circuit, whose capacitance is precharged. By using high Q resonators and a final active quenching of the oscillation, it is possible to conjugate high time resolution and a small measurement time, which allows a high measurement rate. Methods for fast analysis of the data are considered and discussed with reference to computing resource requirements, speed, and accuracy. Experimental tests show the feasibility of the method and a time accuracy better than 4 ps rms. Methods aimed at further reducing hardware resources are finally discussed.
Rapid Calculation of Spacecraft Trajectories Using Efficient Taylor Series Integration
NASA Technical Reports Server (NTRS)
Scott, James R.; Martini, Michael C.
2011-01-01
A variable-order, variable-step Taylor series integration algorithm was implemented in NASA Glenn's SNAP (Spacecraft N-body Analysis Program) code. SNAP is a high-fidelity trajectory propagation program that can propagate the trajectory of a spacecraft about virtually any body in the solar system. The Taylor series algorithm's very high order accuracy and excellent stability properties lead to large reductions in computer time relative to the code's existing 8th order Runge-Kutta scheme. Head-to-head comparison on near-Earth, lunar, Mars, and Europa missions showed that Taylor series integration is 15.8 times faster than Runge- Kutta on average, and is more accurate. These speedups were obtained for calculations involving central body, other body, thrust, and drag forces. Similar speedups have been obtained for calculations that include J2 spherical harmonic for central body gravitation. The algorithm includes a step size selection method that directly calculates the step size and never requires a repeat step. High-order Taylor series integration algorithms have been shown to provide major reductions in computer time over conventional integration methods in numerous scientific applications. The objective here was to directly implement Taylor series integration in an existing trajectory analysis code and demonstrate that large reductions in computer time (order of magnitude) could be achieved while simultaneously maintaining high accuracy. This software greatly accelerates the calculation of spacecraft trajectories. At each time level, the spacecraft position, velocity, and mass are expanded in a high-order Taylor series whose coefficients are obtained through efficient differentiation arithmetic. This makes it possible to take very large time steps at minimal cost, resulting in large savings in computer time. The Taylor series algorithm is implemented primarily through three subroutines: (1) a driver routine that automatically introduces auxiliary variables and sets up initial conditions and integrates; (2) a routine that calculates system reduced derivatives using recurrence relations for quotients and products; and (3) a routine that determines the step size and sums the series. The order of accuracy used in a trajectory calculation is arbitrary and can be set by the user. The algorithm directly calculates the motion of other planetary bodies and does not require ephemeris files (except to start the calculation). The code also runs with Taylor series and Runge-Kutta used interchangeably for different phases of a mission.
GPUs benchmarking in subpixel image registration algorithm
NASA Astrophysics Data System (ADS)
Sanz-Sabater, Martin; Picazo-Bueno, Jose Angel; Micó, Vicente; Ferrerira, Carlos; Granero, Luis; Garcia, Javier
2015-05-01
Image registration techniques are used among different scientific fields, like medical imaging or optical metrology. The straightest way to calculate shifting between two images is using the cross correlation, taking the highest value of this correlation image. Shifting resolution is given in whole pixels which cannot be enough for certain applications. Better results can be achieved interpolating both images, as much as the desired resolution we want to get, and applying the same technique described before, but the memory needed by the system is significantly higher. To avoid memory consuming we are implementing a subpixel shifting method based on FFT. With the original images, subpixel shifting can be achieved multiplying its discrete Fourier transform by a linear phase with different slopes. This method is high time consuming method because checking a concrete shifting means new calculations. The algorithm, highly parallelizable, is very suitable for high performance computing systems. GPU (Graphics Processing Unit) accelerated computing became very popular more than ten years ago because they have hundreds of computational cores in a reasonable cheap card. In our case, we are going to register the shifting between two images, doing the first approach by FFT based correlation, and later doing the subpixel approach using the technique described before. We consider it as `brute force' method. So we will present a benchmark of the algorithm consisting on a first approach (pixel resolution) and then do subpixel resolution approaching, decreasing the shifting step in every loop achieving a high resolution in few steps. This program will be executed in three different computers. At the end, we will present the results of the computation, with different kind of CPUs and GPUs, checking the accuracy of the method, and the time consumed in each computer, discussing the advantages, disadvantages of the use of GPUs.
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena
2010-09-30
Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains to be challenging due to the high computational cost involved. It is hoped that next generation high performance computing (HPC) resources lead to significant reductions in execution times to leverage a new class of in-silico applications. However, performance gains with these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range requiredmore » to achieve the desired performance boost.In this study, strong scalability of currently used techniques to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes even when systems are carefully load-balanced and advanced IO strategies are employed.« less
Precise and fast spatial-frequency analysis using the iterative local Fourier transform.
Lee, Sukmock; Choi, Heejoo; Kim, Dae Wook
2016-09-19
The use of the discrete Fourier transform has decreased since the introduction of the fast Fourier transform (fFT), which is a numerically efficient computing process. This paper presents the iterative local Fourier transform (ilFT), a set of new processing algorithms that iteratively apply the discrete Fourier transform within a local and optimal frequency domain. The new technique achieves 210 times higher frequency resolution than the fFT within a comparable computation time. The method's superb computing efficiency, high resolution, spectrum zoom-in capability, and overall performance are evaluated and compared to other advanced high-resolution Fourier transform techniques, such as the fFT combined with several fitting methods. The effectiveness of the ilFT is demonstrated through the data analysis of a set of Talbot self-images (1280 × 1024 pixels) obtained with an experimental setup using grating in a diverging beam produced by a coherent point source.
An Electronic Pressure Profile Display system for aeronautic test facilities
NASA Technical Reports Server (NTRS)
Woike, Mark R.
1990-01-01
The NASA Lewis Research Center has installed an Electronic Pressure Profile Display system. This system provides for the real-time display of pressure readings on high resolution graphics monitors. The Electronic Pressure Profile Display system will replace manometer banks currently used in aeronautic test facilities. The Electronic Pressure Profile Display system consists of an industrial type Digital Pressure Transmitter (DPI) unit which interfaces with a host computer. The host computer collects the pressure data from the DPI unit, converts it into engineering units, and displays the readings on a high resolution graphics monitor in bar graph format. Software was developed to accomplish the above tasks and also draw facility diagrams as background information on the displays. Data transfer between host computer and DPT unit is done with serial communications. Up to 64 channels are displayed with one second update time. This paper describes the system configuration, its features, and its advantages over existing systems.
An electronic pressure profile display system for aeronautic test facilities
NASA Technical Reports Server (NTRS)
Woike, Mark R.
1990-01-01
The NASA Lewis Research Center has installed an Electronic Pressure Profile Display system. This system provides for the real-time display of pressure readings on high resolution graphics monitors. The Electronic Pressure Profile Display system will replace manometer banks currently used in aeronautic test facilities. The Electronic Pressure Profile Display system consists of an industrial type Digital Pressure Transmitter (DPT) unit which interfaces with a host computer. The host computer collects the pressure data from the DPT unit, converts it into engineering units, and displays the readings on a high resolution graphics monitor in bar graph format. Software was developed to accomplish the above tasks and also draw facility diagrams as background information on the displays. Data transfer between host computer and DPT unit is done with serial communications. Up to 64 channels are displayed with one second update time. This paper describes the system configuration, its features, and its advantages over existing systems.
Achieving a high mode count in the exact electromagnetic simulation of diffractive optical elements.
Junker, André; Brenner, Karl-Heinz
2018-03-01
The application of rigorous optical simulation algorithms, both in the modal as well as in the time domain, is known to be limited to the nano-optical scale due to severe computing time and memory constraints. This is true even for today's high-performance computers. To address this problem, we develop the fast rigorous iterative method (FRIM), an algorithm based on an iterative approach, which, under certain conditions, allows solving also large-size problems approximation free. We achieve this in the case of a modal representation by avoiding the computationally complex eigenmode decomposition. Thereby, the numerical cost is reduced from O(N 3 ) to O(N log N), enabling a simulation of structures like certain diffractive optical elements with a significantly higher mode count than presently possible. Apart from speed, another major advantage of the iterative FRIM over standard modal methods is the possibility to trade runtime against accuracy.
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case
Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.
Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.
Current state and future direction of computer systems at NASA Langley Research Center
NASA Technical Reports Server (NTRS)
Rogers, James L. (Editor); Tucker, Jerry H. (Editor)
1992-01-01
Computer systems have advanced at a rate unmatched by any other area of technology. As performance has dramatically increased there has been an equally dramatic reduction in cost. This constant cost performance improvement has precipitated the pervasiveness of computer systems into virtually all areas of technology. This improvement is due primarily to advances in microelectronics. Most people are now convinced that the new generation of supercomputers will be built using a large number (possibly thousands) of high performance microprocessors. Although the spectacular improvements in computer systems have come about because of these hardware advances, there has also been a steady improvement in software techniques. In an effort to understand how these hardware and software advances will effect research at NASA LaRC, the Computer Systems Technical Committee drafted this white paper to examine the current state and possible future directions of computer systems at the Center. This paper discusses selected important areas of computer systems including real-time systems, embedded systems, high performance computing, distributed computing networks, data acquisition systems, artificial intelligence, and visualization.
Multicore Programming Challenges
NASA Astrophysics Data System (ADS)
Perrone, Michael
The computer industry is facing fundamental challenges that are driving a major change in the design of computer processors. Due to restrictions imposed by quantum physics, one historical path to higher computer processor performance - by increased clock frequency - has come to an end. Increasing clock frequency now leads to power consumption costs that are too high to justify. As a result, we have seen in recent years that the processor frequencies have peaked and are receding from their high point. At the same time, competitive market conditions are giving business advantage to those companies that can field new streaming applications, handle larger data sets, and update their models to market conditions faster. The desire for newer, faster and larger is driving continued demand for higher computer performance.
NASA Technical Reports Server (NTRS)
Edwards, John W.; Malone, John B.
1992-01-01
The current status of computational methods for unsteady aerodynamics and aeroelasticity is reviewed. The key features of challenging aeroelastic applications are discussed in terms of the flowfield state: low-angle high speed flows and high-angle vortex-dominated flows. The critical role played by viscous effects in determining aeroelastic stability for conditions of incipient flow separation is stressed. The need for a variety of flow modeling tools, from linear formulations to implementations of the Navier-Stokes equations, is emphasized. Estimates of computer run times for flutter calculations using several computational methods are given. Applications of these methods for unsteady aerodynamic and transonic flutter calculations for airfoils, wings, and configurations are summarized. Finally, recommendations are made concerning future research directions.
NASA Technical Reports Server (NTRS)
Tam, Christopher K. W.; Fang, Jun; Kurbatskii, Konstantin A.
1996-01-01
A set of nonhomogeneous radiation and outflow conditions which automatically generate prescribed incoming acoustic or vorticity waves and, at the same time, are transparent to outgoing sound waves produced internally in a finite computation domain is proposed. This type of boundary condition is needed for the numerical solution of many exterior aeroacoustics problems. In computational aeroacoustics, the computation scheme must be as nondispersive ans nondissipative as possible. It must also support waves with wave speeds which are nearly the same as those of the original linearized Euler equations. To meet these requirements, a high-order/large-stencil scheme is necessary The proposed nonhomogeneous radiation and outflow boundary conditions are designed primarily for use in conjunction with such high-order/large-stencil finite difference schemes.
Ultra-Scale Computing for Emergency Evacuation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhaduri, Budhendra L; Nutaro, James J; Liu, Cheng
2010-01-01
Emergency evacuations are carried out in anticipation of a disaster such as hurricane landfall or flooding, and in response to a disaster that strikes without a warning. Existing emergency evacuation modeling and simulation tools are primarily designed for evacuation planning and are of limited value in operational support for real time evacuation management. In order to align with desktop computing, these models reduce the data and computational complexities through simple approximations and representations of real network conditions and traffic behaviors, which rarely represent real-world scenarios. With the emergence of high resolution physiographic, demographic, and socioeconomic data and supercomputing platforms, itmore » is possible to develop micro-simulation based emergency evacuation models that can foster development of novel algorithms for human behavior and traffic assignments, and can simulate evacuation of millions of people over a large geographic area. However, such advances in evacuation modeling and simulations demand computational capacity beyond the desktop scales and can be supported by high performance computing platforms. This paper explores the motivation and feasibility of ultra-scale computing for increasing the speed of high resolution emergency evacuation simulations.« less
GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.
Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart
2011-06-01
The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework, producing substantially boosted computing efficiency (with a two-order speedup factor) at a PC-level cost. Specific details of implementing the FDTD method on a GPU architecture have been presented and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity, and 20% reduction in loop coupling compared with conventional array structures of the same size) for small animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied for both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
Real-Time MENTAT programming language and architecture
NASA Technical Reports Server (NTRS)
Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.
1989-01-01
Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and exciting real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
Design and analysis of sustainable computer mouse using design for disassembly methodology
NASA Astrophysics Data System (ADS)
Roni Sahroni, Taufik; Fitri Sukarman, Ahmad; Agung Mahardini, Karunia
2017-12-01
This paper presents the design and analysis of computer mouse using Design for Disassembly methodology. Basically, the existing computer mouse model consist a number of unnecessary part that cause the assembly and disassembly time in production. The objective of this project is to design a new computer mouse based on Design for Disassembly (DFD) methodology. The main methodology of this paper was proposed from sketch generation, concept selection, and concept scoring. Based on the design screening, design concept B was selected for further analysis. New design of computer mouse is proposed using fastening system. Furthermore, three materials of ABS, Polycarbonate, and PE high density were prepared to determine the environmental impact category. Sustainable analysis was conducted using software SolidWorks. As a result, PE High Density gives the lowers amount in the environmental category with great maximum stress value.
Center for Center for Technology for Advanced Scientific Component Software (TASCS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostadin, Damevski
A resounding success of the Scientific Discovery through Advanced Computing (SciDAC) program is that high-performance computational science is now universally recognized as a critical aspect of scientific discovery [71], complementing both theoretical and experimental research. As scientific communities prepare to exploit unprecedented computing capabilities of emerging leadership-class machines for multi-model simulations at the extreme scale [72], it is more important than ever to address the technical and social challenges of geographically distributed teams that combine expertise in domain science, applied mathematics, and computer science to build robust and flexible codes that can incorporate changes over time. The Center for Technologymore » for Advanced Scientific Component Software (TASCS)1 tackles these these issues by exploiting component-based software development to facilitate collaborative high-performance scientific computing.« less
Use of high performance networks and supercomputers for real-time flight simulation
NASA Technical Reports Server (NTRS)
Cleveland, Jeff I., II
1993-01-01
In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations must be consistent in processing time and be completed in as short a time as possible. These operations include simulation mathematical model computation and data input/output to the simulators. In 1986, in response to increased demands for flight simulation performance, NASA's Langley Research Center (LaRC), working with the contractor, developed extensions to the Computer Automated Measurement and Control (CAMAC) technology which resulted in a factor of ten increase in the effective bandwidth and reduced latency of modules necessary for simulator communication. This technology extension is being used by more than 80 leading technological developers in the United States, Canada, and Europe. Included among the commercial applications are nuclear process control, power grid analysis, process monitoring, real-time simulation, and radar data acquisition. Personnel at LaRC are completing the development of the use of supercomputers for mathematical model computation to support real-time flight simulation. This includes the development of a real-time operating system and development of specialized software and hardware for the simulator network. This paper describes the data acquisition technology and the development of supercomputing for flight simulation.
NASA Astrophysics Data System (ADS)
Zaripov, D. I.; Renfu, Li
2018-05-01
The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.
A vectorized Lanczos eigensolver for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1990-01-01
The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
Development of Parallel Code for the Alaska Tsunami Forecast Model
NASA Astrophysics Data System (ADS)
Bahng, B.; Knight, W. R.; Whitmore, P.
2014-12-01
The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.
Trapping of a micro-bubble by non-paraxial Gaussian beam: computation using the FDTD method.
Sung, Seung-Yong; Lee, Yong-Gu
2008-03-03
Optical forces on a micro-bubble were computed using the Finite Difference Time Domain method. Non-paraxial Gaussian beam equation was used to represent the incident laser with high numerical aperture, common in optical tweezers. The electromagnetic field distribution around a micro-bubble was computed using FDTD method and the electromagnetic stress tensor on the surface of a micro-bubble was used to compute the optical forces. By the analysis of the computational results, interesting relations between the radius of the circular trapping ring and the corresponding stability of the trap were found.
Photoionization and High Density Gas
NASA Technical Reports Server (NTRS)
Kallman, T.; Bautista, M.; White, Nicholas E. (Technical Monitor)
2002-01-01
We present results of calculations using the XSTAR version 2 computer code. This code is loosely based on the XSTAR v.1 code which has been available for public use for some time. However it represents an improvement and update in several major respects, including atomic data, code structure, user interface, and improved physical description of ionization/excitation. In particular, it now is applicable to high density situations in which significant excited atomic level populations are likely to occur. We describe the computational techniques and assumptions, and present sample runs with particular emphasis on high density situations.
New broadband square-law detector
NASA Technical Reports Server (NTRS)
Reid, M. S.; Gardner, R. A.; Stelzried, C. T.
1975-01-01
Compact device has wide dynamic range, accurate square-law response, good thermal stability, high-level dc output with immunity to ground-loop problems, ability to insert known time constants for radiometric applications, and fast response times compatible with computer systems.
NASA Astrophysics Data System (ADS)
Jang, W.; Engda, T. A.; Neff, J. C.; Herrick, J.
2017-12-01
Many crop models are increasingly used to evaluate crop yields at regional and global scales. However, implementation of these models across large areas using fine-scale grids is limited by computational time requirements. In order to facilitate global gridded crop modeling with various scenarios (i.e., different crop, management schedule, fertilizer, and irrigation) using the Environmental Policy Integrated Climate (EPIC) model, we developed a distributed parallel computing framework in Python. Our local desktop with 14 cores (28 threads) was used to test the distributed parallel computing framework in Iringa, Tanzania which has 406,839 grid cells. High-resolution soil data, SoilGrids (250 x 250 m), and climate data, AgMERRA (0.25 x 0.25 deg) were also used as input data for the gridded EPIC model. The framework includes a master file for parallel computing, input database, input data formatters, EPIC model execution, and output analyzers. Through the master file for parallel computing, the user-defined number of threads of CPU divides the EPIC simulation into jobs. Then, Using EPIC input data formatters, the raw database is formatted for EPIC input data and the formatted data moves into EPIC simulation jobs. Then, 28 EPIC jobs run simultaneously and only interesting results files are parsed and moved into output analyzers. We applied various scenarios with seven different slopes and twenty-four fertilizer ranges. Parallelized input generators create different scenarios as a list for distributed parallel computing. After all simulations are completed, parallelized output analyzers are used to analyze all outputs according to the different scenarios. This saves significant computing time and resources, making it possible to conduct gridded modeling at regional to global scales with high-resolution data. For example, serial processing for the Iringa test case would require 113 hours, while using the framework developed in this study requires only approximately 6 hours, a nearly 95% reduction in computing time.
NASA Technical Reports Server (NTRS)
Janus, J. Mark; Whitfield, David L.
1990-01-01
Improvements are presented of a computer algorithm developed for the time-accurate flow analysis of rotating machines. The flow model is a finite volume method utilizing a high-resolution approximate Riemann solver for interface flux definitions. The numerical scheme is a block LU implicit iterative-refinement method which possesses apparent unconditional stability. Multiblock composite gridding is used to orderly partition the field into a specified arrangement of blocks exhibiting varying degrees of similarity. Block-block relative motion is achieved using local grid distortion to reduce grid skewness and accommodate arbitrary time step selection. A general high-order numerical scheme is applied to satisfy the geometric conservation law. An even-blade-count counterrotating unducted fan configuration is chosen for a computational study comparing solutions resulting from altering parameters such as time step size and iteration count. The solutions are compared with measured data.
NASA Technical Reports Server (NTRS)
Meyer, Donald; Uchenik, Igor
2007-01-01
The PPC750 Performance Monitor (Perfmon) is a computer program that helps the user to assess the performance characteristics of application programs running under the Wind River VxWorks real-time operating system on a PPC750 computer. Perfmon generates a user-friendly interface and collects performance data by use of performance registers provided by the PPC750 architecture. It processes and presents run-time statistics on a per-task basis over a repeating time interval (typically, several seconds or minutes) specified by the user. When the Perfmon software module is loaded with the user s software modules, it is available for use through Perfmon commands, without any modification of the user s code and at negligible performance penalty. Per-task run-time performance data made available by Perfmon include percentage time, number of instructions executed per unit time, dispatch ratio, stack high water mark, and level-1 instruction and data cache miss rates. The performance data are written to a file specified by the user or to the serial port of the computer
PREFACE: 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013)
NASA Astrophysics Data System (ADS)
Groep, D. L.; Bonacorsi, D.
2014-06-01
In this age and time, capturing 'state of the art' of computing in a conference proceedings gets to be increasingly hard. It is quite common too for the submitted abstracts to refer to studies yet to be done - and the time span between abstract submission and the actual conference is often less than six months. By the time the proceedings appear in journal form, a similar period after its closing session, some of the work is over a year old, by which time new ideas will have been formed and the deployment of current ones progressed - at times beyond recognition. The preface is continued in the pdf.
The constant displacement scheme for tracking particles in heterogeneous aquifers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wen, X.H.; Gomez-Hernandez, J.J.
1996-01-01
Simulation of mass transport by particle tracking or random walk in highly heterogeneous media may be inefficient from a computational point of view if the traditional constant time step scheme is used. A new scheme which adjusts automatically the time step for each particle according to the local pore velocity, so that each particle always travels a constant distance, is shown to be computationally faster for the same degree of accuracy than the constant time step method. Using the constant displacement scheme, transport calculations in a 2-D aquifer model, with nature log-transmissivity variance of 4, can be 8.6 times fastermore » than using the constant time step scheme.« less
Usability of a Low-Cost Head Tracking Computer Access Method following Stroke.
Mah, Jasmine; Jutai, Jeffrey W; Finestone, Hillel; Mckee, Hilary; Carter, Melanie
2015-01-01
Assistive technology devices for computer access can facilitate social reintegration and promote independence for people who have had a stroke. This work describes the exploration of the usefulness and acceptability of a new computer access device called the Nouse™ (Nose-as-mouse). The device uses standard webcam and video recognition algorithms to map the movement of the user's nose to a computer cursor, thereby allowing hands-free computer operation. Ten participants receiving in- or outpatient stroke rehabilitation completed a series of standardized and everyday computer tasks using the Nouse™ and then completed a device usability questionnaire. Task completion rates were high (90%) for computer activities only in the absence of time constraints. Most of the participants were satisfied with ease of use (70%) and liked using the Nouse™ (60%), indicating they could resume most of their usual computer activities apart from word-processing using the device. The findings suggest that hands-free computer access devices like the Nouse™ may be an option for people who experience upper motor impairment caused by stroke and are highly motivated to resume personal computing. More research is necessary to further evaluate the effectiveness of this technology, especially in relation to other computer access assistive technology devices.
Playback system designed for X-Band SAR
NASA Astrophysics Data System (ADS)
Yuquan, Liu; Changyong, Dou
2014-03-01
SAR(Synthetic Aperture Radar) has extensive application because it is daylight and weather independent. In particular, X-Band SAR strip map, designed by Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, provides high ground resolution images, at the same time it has a large spatial coverage and a short acquisition time, so it is promising in multi-applications. When sudden disaster comes, the emergency situation acquires radar signal data and image as soon as possible, in order to take action to reduce loss and save lives in the first time. This paper summarizes a type of X-Band SAR playback processing system designed for disaster response and scientific needs. It describes SAR data workflow includes the payload data transmission and reception process. Playback processing system completes signal analysis on the original data, providing SAR level 0 products and quick image. Gigabit network promises radar signal transmission efficiency from recorder to calculation unit. Multi-thread parallel computing and ping pong operation can ensure computation speed. Through gigabit network, multi-thread parallel computing and ping pong operation, high speed data transmission and processing meet the SAR radar data playback real time requirement.
An SDR-Based Real-Time Testbed for GNSS Adaptive Array Anti-Jamming Algorithms Accelerated by GPU
Xu, Hailong; Cui, Xiaowei; Lu, Mingquan
2016-01-01
Nowadays, software-defined radio (SDR) has become a common approach to evaluate new algorithms. However, in the field of Global Navigation Satellite System (GNSS) adaptive array anti-jamming, previous work has been limited due to the high computational power demanded by adaptive algorithms, and often lack flexibility and configurability. In this paper, the design and implementation of an SDR-based real-time testbed for GNSS adaptive array anti-jamming accelerated by a Graphics Processing Unit (GPU) are documented. This testbed highlights itself as a feature-rich and extendible platform with great flexibility and configurability, as well as high computational performance. Both Space-Time Adaptive Processing (STAP) and Space-Frequency Adaptive Processing (SFAP) are implemented with a wide range of parameters. Raw data from as many as eight antenna elements can be processed in real-time in either an adaptive nulling or beamforming mode. To fully take advantage of the parallelism resource provided by the GPU, a batched method in programming is proposed. Tests and experiments are conducted to evaluate both the computational and anti-jamming performance. This platform can be used for research and prototyping, as well as a real product in certain applications. PMID:26978363
An SDR-Based Real-Time Testbed for GNSS Adaptive Array Anti-Jamming Algorithms Accelerated by GPU.
Xu, Hailong; Cui, Xiaowei; Lu, Mingquan
2016-03-11
Nowadays, software-defined radio (SDR) has become a common approach to evaluate new algorithms. However, in the field of Global Navigation Satellite System (GNSS) adaptive array anti-jamming, previous work has been limited due to the high computational power demanded by adaptive algorithms, and often lack flexibility and configurability. In this paper, the design and implementation of an SDR-based real-time testbed for GNSS adaptive array anti-jamming accelerated by a Graphics Processing Unit (GPU) are documented. This testbed highlights itself as a feature-rich and extendible platform with great flexibility and configurability, as well as high computational performance. Both Space-Time Adaptive Processing (STAP) and Space-Frequency Adaptive Processing (SFAP) are implemented with a wide range of parameters. Raw data from as many as eight antenna elements can be processed in real-time in either an adaptive nulling or beamforming mode. To fully take advantage of the parallelism resource provided by the GPU, a batched method in programming is proposed. Tests and experiments are conducted to evaluate both the computational and anti-jamming performance. This platform can be used for research and prototyping, as well as a real product in certain applications.
Monitoring SLAC High Performance UNIX Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lettsome, Annette K.; /Bethune-Cookman Coll. /SLAC
2005-12-15
Knowledge of the effectiveness and efficiency of computers is important when working with high performance systems. The monitoring of such systems is advantageous in order to foresee possible misfortunes or system failures. Ganglia is a software system designed for high performance computing systems to retrieve specific monitoring information. An alternative storage facility for Ganglia's collected data is needed since its default storage system, the round-robin database (RRD), struggles with data integrity. The creation of a script-driven MySQL database solves this dilemma. This paper describes the process took in the creation and implementation of the MySQL database for use by Ganglia.more » Comparisons between data storage by both databases are made using gnuplot and Ganglia's real-time graphical user interface.« less
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion
NASA Technical Reports Server (NTRS)
Bailey, David H.; Ferguson, Helaman R. P.
1988-01-01
Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
Arnold, Jeffrey
2018-05-14
Floating-point computations are at the heart of much of the computing done in high energy physics. The correctness, speed and accuracy of these computations are of paramount importance. The lack of any of these characteristics can mean the difference between new, exciting physics and an embarrassing correction. This talk will examine practical aspects of IEEE 754-2008 floating-point arithmetic as encountered in HEP applications. After describing the basic features of IEEE floating-point arithmetic, the presentation will cover: common hardware implementations (SSE, x87) techniques for improving the accuracy of summation, multiplication and data interchange compiler options for gcc and icc affecting floating-point operations hazards to be avoided. About the speaker: Jeffrey M Arnold is a Senior Software Engineer in the Intel Compiler and Languages group at Intel Corporation. He has been part of the Digital->Compaq->Intel compiler organization for nearly 20 years; part of that time, he worked on both low- and high-level math libraries. Prior to that, he was in the VMS Engineering organization at Digital Equipment Corporation. In the late 1980s, Jeff spent 2½ years at CERN as part of the CERN/Digital Joint Project. In 2008, he returned to CERN to spent 10 weeks working with CERN/openlab. Since that time, he has returned to CERN multiple times to teach at openlab workshops and consult with various LHC experiments. Jeff received his Ph.D. in physics from Case Western Reserve University.
HPC enabled real-time remote processing of laparoscopic surgery
NASA Astrophysics Data System (ADS)
Ronaghi, Zahra; Sapra, Karan; Izard, Ryan; Duffy, Edward; Smith, Melissa C.; Wang, Kuang-Ching; Kwartowitz, David M.
2016-03-01
Laparoscopic surgery is a minimally invasive surgical technique. The benefit of small incisions has a disadvantage of limited visualization of subsurface tissues. Image-guided surgery (IGS) uses pre-operative and intra-operative images to map subsurface structures. One particular laparoscopic system is the daVinci-si robotic surgical system. The video streams generate approximately 360 megabytes of data per second. Real-time processing this large stream of data on a bedside PC, single or dual node setup, has become challenging and a high-performance computing (HPC) environment may not always be available at the point of care. To process this data on remote HPC clusters at the typical 30 frames per second rate, it is required that each 11.9 MB video frame be processed by a server and returned within 1/30th of a second. We have implement and compared performance of compression, segmentation and registration algorithms on Clemson's Palmetto supercomputer using dual NVIDIA K40 GPUs per node. Our computing framework will also enable reliability using replication of computation. We will securely transfer the files to remote HPC clusters utilizing an OpenFlow-based network service, Steroid OpenFlow Service (SOS) that can increase performance of large data transfers over long-distance and high bandwidth networks. As a result, utilizing high-speed OpenFlow- based network to access computing clusters with GPUs will improve surgical procedures by providing real-time medical image processing and laparoscopic data.
NASA Technical Reports Server (NTRS)
Dunn, M. H.; Tarkenton, G. M.
1992-01-01
This document describes the computational aspects of propeller noise prediction in the time domain and the use of high speed propeller noise prediction program ASSPIN (Advanced Subsonic and Supersonic Propeller Induced Noise). These formulations are valid in both the near and far fields. Two formulations are utilized by ASSPIN: (1) one is used for subsonic portions of the propeller blade; and (2) the second is used for transonic and supersonic regions on the blade. Switching between the two formulations is done automatically. ASSPIN incorporates advanced blade geometry and surface pressure modelling, adaptive observer time grid strategies, and contains enhanced numerical algorithms that result in reduced computational time. In addition, the ability to treat the nonaxial inflow case has been included.
Large Eddy Simulation in the Computation of Jet Noise
NASA Technical Reports Server (NTRS)
Mankbadi, R. R.; Goldstein, M. E.; Povinelli, L. A.; Hayder, M. E.; Turkel, E.
1999-01-01
Noise can be predicted by solving Full (time-dependent) Compressible Navier-Stokes Equation (FCNSE) with computational domain. The fluctuating near field of the jet produces propagating pressure waves that produce far-field sound. The fluctuating flow field as a function of time is needed in order to calculate sound from first principles. Noise can be predicted by solving the full, time-dependent, compressible Navier-Stokes equations with the computational domain extended to far field - but this is not feasible as indicated above. At high Reynolds number of technological interest turbulence has large range of scales. Direct numerical simulations (DNS) can not capture the small scales of turbulence. The large scales are more efficient than the small scales in radiating sound. The emphasize is thus on calculating sound radiated by large scales.
Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid
2014-12-30
Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in the biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative; thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination between wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC obviously demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden and thus their combination can be applied into real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
Recent Developments in the Application of Biologically Inspired Computation to Chemical Sensing
NASA Astrophysics Data System (ADS)
Marco, S.; Gutierrez-Gálvez, A.
2009-05-01
Biological olfaction outperforms chemical instrumentation in specificity, response time, detection limit, coding capacity, time stability, robustness, size, power consumption, and portability. This biological function provides outstanding performance due, to a large extent, to the unique architecture of the olfactory pathway, which combines a high degree of redundancy, an efficient combinatorial coding along with unmatched chemical information processing mechanisms. The last decade has witnessed important advances in the understanding of the computational primitives underlying the functioning of the olfactory system. In this work, the state of the art concerning biologically inspired computation for chemical sensing will be reviewed. Instead of reviewing the whole body of computational neuroscience of olfaction, we restrict this review to the application of models to the processing of real chemical sensor data.
Multiple shooting shadowing for sensitivity analysis of chaotic dynamical systems
NASA Astrophysics Data System (ADS)
Blonigan, Patrick J.; Wang, Qiqi
2018-02-01
Sensitivity analysis methods are important tools for research and design with simulations. Many important simulations exhibit chaotic dynamics, including scale-resolving turbulent fluid flow simulations. Unfortunately, conventional sensitivity analysis methods are unable to compute useful gradient information for long-time-averaged quantities in chaotic dynamical systems. Sensitivity analysis with least squares shadowing (LSS) can compute useful gradient information for a number of chaotic systems, including simulations of chaotic vortex shedding and homogeneous isotropic turbulence. However, this gradient information comes at a very high computational cost. This paper presents multiple shooting shadowing (MSS), a more computationally efficient shadowing approach than the original LSS approach. Through an analysis of the convergence rate of MSS, it is shown that MSS can have lower memory usage and run time than LSS.
Biocellion: accelerating computer simulation of multicellular biological system models.
Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya
2014-11-01
Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams.
Chen, Qiuwen; Luley, Ryan; Wu, Qing; Bishop, Morgan; Linderman, Richard W; Qiu, Qinru
2018-05-01
The evolution of high performance computing technologies has enabled the large-scale implementation of neuromorphic models and pushed the research in computational intelligence into a new era. Among the machine learning applications, unsupervised detection of anomalous streams is especially challenging due to the requirements of detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of the multicore systems while maintaining high sensitivity and specificity to the anomalies is an urgent research topic. In this paper, we propose anomaly recognition and detection (AnRAD), a bioinspired detection framework that performs probabilistic inferences. We analyze the feature dependency and develop a self-structuring method that learns an efficient confabulation network using unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base using streaming data. Compared with several existing anomaly detection approaches, our method provides competitive detection quality. Furthermore, we exploit the massive parallel structure of the AnRAD framework. Our implementations of the detection algorithm on the graphic processing unit and the Xeon Phi coprocessor both obtain substantial speedups over the sequential implementation on general-purpose microprocessor. The framework provides real-time service to concurrent data streams within diversified knowledge contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle behavior detection, the framework is able to monitor up to 16000 vehicles (data streams) and their interactions in real time with a single commodity coprocessor, and uses less than 0.2 ms for one testing subject. Finally, the detection network is ported to our spiking neural network simulator to show the potential of adapting to the emerging neuromorphic architectures.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Adaptive-optics optical coherence tomography processing using a graphics processing unit.
Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T
2014-01-01
Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities, and moderate price compared to super computers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving the super high resolution technology closer to clinical viability.
Multiscale Space-Time Computational Methods for Fluid-Structure Interactions
2015-09-13
prescribed fully or partially, is from an actual locust, extracted from high-speed, multi-camera video recordings of the locust in a wind tunnel . We use...With creative methods for coupling the fluid and structure, we can increase the scope and efficiency of the FSI modeling . Multiscale methods, which now...play an important role in computational mathematics, can also increase the accuracy and efficiency of the computer modeling techniques. The main
Gebremariam, Mekdes K; Totland, Torunn H; Andersen, Lene F; Bergh, Ingunn H; Bjelland, Mona; Grydeland, May; Ommundsen, Yngvar; Lien, Nanna
2012-02-06
In order to inform interventions to prevent sedentariness, more longitudinal studies are needed focusing on stability and change over time in multiple sedentary behaviours. This paper investigates patterns of stability and change in TV/DVD use, computer/electronic game use and total screen time (TST) and factors associated with these patterns among Norwegian children in the transition between childhood and adolescence. The baseline of this longitudinal study took place in September 2007 and included 975 students from 25 control schools of an intervention study, the HEalth In Adolescents (HEIA) study. The first follow-up took place in May 2008 and the second follow-up in May 2009, with 885 students participating at all time points (average age at baseline = 11.2, standard deviation ± 0.3). Time used for/spent on TV/DVD and computer/electronic games was self-reported, and a TST variable (hours/week) was computed. Tracking analyses based on absolute and rank measures, as well as regression analyses to assess factors associated with change in TST and with tracking high TST were conducted. Time spent on all sedentary behaviours investigated increased in both genders. Findings based on absolute and rank measures revealed a fair to moderate level of tracking over the 2 year period. High parental education was inversely related to an increase in TST among females. In males, self-efficacy related to barriers to physical activity and living with married or cohabitating parents were inversely related to an increase in TST. Factors associated with tracking high vs. low TST in the multinomial regression analyses were low self-efficacy and being of an ethnic minority background among females, and low self-efficacy, being overweight/obese and not living with married or cohabitating parents among males. Use of TV/DVD and computer/electronic games increased with age and tracked over time in this group of 11-13 year old Norwegian children. Interventions targeting these sedentary behaviours should thus be introduced early. The identified modifiable and non-modifiable factors associated with change in TST and tracking of high TST should be taken into consideration when planning such interventions.
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second. PMID:24891848
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System.
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second.
Real-time inversions for finite fault slip models and rupture geometry based on high-rate GPS data
Minson, Sarah E.; Murray, Jessica R.; Langbein, John O.; Gomberg, Joan S.
2015-01-01
We present an inversion strategy capable of using real-time high-rate GPS data to simultaneously solve for a distributed slip model and fault geometry in real time as a rupture unfolds. We employ Bayesian inference to find the optimal fault geometry and the distribution of possible slip models for that geometry using a simple analytical solution. By adopting an analytical Bayesian approach, we can solve this complex inversion problem (including calculating the uncertainties on our results) in real time. Furthermore, since the joint inversion for distributed slip and fault geometry can be computed in real time, the time required to obtain a source model of the earthquake does not depend on the computational cost. Instead, the time required is controlled by the duration of the rupture and the time required for information to propagate from the source to the receivers. We apply our modeling approach, called Bayesian Evidence-based Fault Orientation and Real-time Earthquake Slip, to the 2011 Tohoku-oki earthquake, 2003 Tokachi-oki earthquake, and a simulated Hayward fault earthquake. In all three cases, the inversion recovers the magnitude, spatial distribution of slip, and fault geometry in real time. Since our inversion relies on static offsets estimated from real-time high-rate GPS data, we also present performance tests of various approaches to estimating quasi-static offsets in real time. We find that the raw high-rate time series are the best data to use for determining the moment magnitude of the event, but slightly smoothing the raw time series helps stabilize the inversion for fault geometry.
NASA Technical Reports Server (NTRS)
Byrne, F.
1981-01-01
Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Near Real-Time Image Reconstruction
NASA Astrophysics Data System (ADS)
Denker, C.; Yang, G.; Wang, H.
2001-08-01
In recent years, post-facto image-processing algorithms have been developed to achieve diffraction-limited observations of the solar surface. We present a combination of frame selection, speckle-masking imaging, and parallel computing which provides real-time, diffraction-limited, 256×256 pixel images at a 1-minute cadence. Our approach to achieve diffraction limited observations is complementary to adaptive optics (AO). At the moment, AO is limited by the fact that it corrects wavefront abberations only for a field of view comparable to the isoplanatic patch. This limitation does not apply to speckle-masking imaging. However, speckle-masking imaging relies on short-exposure images which limits its spectroscopic applications. The parallel processing of the data is performed on a Beowulf-class computer which utilizes off-the-shelf, mass-market technologies to provide high computational performance for scientific calculations and applications at low cost. Beowulf computers have a great potential, not only for image reconstruction, but for any kind of complex data reduction. Immediate access to high-level data products and direct visualization of dynamic processes on the Sun are two of the advantages to be gained.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.
Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun
2013-01-01
As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
Spectral decontamination of a real-time helicopter simulation
NASA Technical Reports Server (NTRS)
Mcfarland, R. E.
1983-01-01
Nonlinear mathematical models of a rotor system, referred to as rotating blade-element models, produce steady-state, high-frequency harmonics of significant magnitude. In a discrete simulation model, certain of these harmonics may be incompatible with realistic real-time computational constraints because of their aliasing into the operational low-pass region. However, the energy is an aliased harmonic may be suppressed by increasing the computation rate of an isolated, causal nonlinearity and using an appropriate filter. This decontamination technique is applied to Sikorsky's real-time model of the Black Hawk helicopter, as supplied to NASA for handling-qualities investigations.
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
An image compression algorithm for a high-resolution digital still camera
NASA Technical Reports Server (NTRS)
Nerheim, Rosalee
1989-01-01
The Electronic Still Camera (ESC) project will provide for the capture and transmission of high-quality images without the use of film. The image quality will be superior to video and will approach the quality of 35mm film. The camera, which will have the same general shape and handling as a 35mm camera, will be able to send images to earth in near real-time. Images will be stored in computer memory (RAM) in removable cartridges readable by a computer. To save storage space, the image will be compressed and reconstructed at the time of viewing. Both lossless and loss-y image compression algorithms are studied, described, and compared.
NASA Technical Reports Server (NTRS)
Buckner, J. D.; Council, H. W.; Edwards, T. R.
1974-01-01
Description of the hardware and software implementing the system of time-lapse reproduction of images through interactive graphics (TRIIG). The system produces a quality hard copy of processed images in a fast and inexpensive manner. This capability allows for optimal development of processing software through the rapid viewing of many image frames in an interactive mode. Three critical optical devices are used to reproduce an image: an Optronics photo reader/writer, the Adage Graphics Terminal, and Polaroid Type 57 high speed film. Typical sources of digitized images are observation satellites, such as ERTS or Mariner, computer coupled electron microscopes for high-magnification studies, or computer coupled X-ray devices for medical research.
Time-Accurate Simulations and Acoustic Analysis of Slat Free-Shear Layer
NASA Technical Reports Server (NTRS)
Khorrami, Mehdi R.; Singer, Bart A.; Berkman, Mert E.
2001-01-01
A detailed computational aeroacoustic analysis of a high-lift flow field is performed. Time-accurate Reynolds Averaged Navier-Stokes (RANS) computations simulate the free shear layer that originates from the slat cusp. Both unforced and forced cases are studied. Preliminary results show that the shear layer is a good amplifier of disturbances in the low to mid-frequency range. The Ffowcs-Williams and Hawkings equation is solved to determine the acoustic field using the unsteady flow data from the RANS calculations. The noise radiated from the excited shear layer has a spectral shape qualitatively similar to that obtained from measurements in a corresponding experimental study of the high-lift system.
Comptational Design Of Functional CA-S-H and Oxide Doped Alloy Systems
NASA Astrophysics Data System (ADS)
Yang, Shizhong; Chilla, Lokeshwar; Yang, Yan; Li, Kuo; Wicker, Scott; Zhao, Guang-Lin; Khosravi, Ebrahim; Bai, Shuju; Zhang, Boliang; Guo, Shengmin
Computer aided functional materials design accelerates the discovery of novel materials. This presentation will cover our recent research advance on the Ca-S-H system properties prediction and oxide doped high entropy alloy property simulation and experiment validation. Several recent developed computational materials design methods were utilized to the two systems physical and chemical properties prediction. A comparison of simulation results to the corresponding experiment data will be introduced. This research is partially supported by NSF CIMM project (OIA-15410795 and the Louisiana BoR), NSF HBCU Supplement climate change and ecosystem sustainability subproject 3, and LONI high performance computing time allocation loni mat bio7.
Parallel-vector unsymmetric Eigen-Solver on high performance computers
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Jiangning, Qin
1993-01-01
The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components in the QR algorithm, it was concluded from this study, that the reduction of an unsymmetric matrix to a Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to a Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods is increased as the problem size increased.
NASA Astrophysics Data System (ADS)
Chen, Zuojing; Polizzi, Eric
2010-11-01
Effective modeling and numerical spectral-based propagation schemes are proposed for addressing the challenges in time-dependent quantum simulations of systems ranging from atoms, molecules, and nanostructures to emerging nanoelectronic devices. While time-dependent Hamiltonian problems can be formally solved by propagating the solutions along tiny simulation time steps, a direct numerical treatment is often considered too computationally demanding. In this paper, however, we propose to go beyond these limitations by introducing high-performance numerical propagation schemes to compute the solution of the time-ordered evolution operator. In addition to the direct Hamiltonian diagonalizations that can be efficiently performed using the new eigenvalue solver FEAST, we have designed a Gaussian propagation scheme and a basis-transformed propagation scheme (BTPS) which allow to reduce considerably the simulation times needed by time intervals. It is outlined that BTPS offers the best computational efficiency allowing new perspectives in time-dependent simulations. Finally, these numerical schemes are applied to study the ac response of a (5,5) carbon nanotube within a three-dimensional real-space mesh framework.
Design considerations for computationally constrained two-way real-time video communication
NASA Astrophysics Data System (ADS)
Bivolarski, Lazar M.; Saunders, Steven E.; Ralston, John D.
2009-08-01
Today's video codecs have evolved primarily to meet the requirements of the motion picture and broadcast industries, where high-complexity studio encoding can be utilized to create highly-compressed master copies that are then broadcast one-way for playback using less-expensive, lower-complexity consumer devices for decoding and playback. Related standards activities have largely ignored the computational complexity and bandwidth constraints of wireless or Internet based real-time video communications using devices such as cell phones or webcams. Telecommunications industry efforts to develop and standardize video codecs for applications such as video telephony and video conferencing have not yielded image size, quality, and frame-rate performance that match today's consumer expectations and market requirements for Internet and mobile video services. This paper reviews the constraints and the corresponding video codec requirements imposed by real-time, 2-way mobile video applications. Several promising elements of a new mobile video codec architecture are identified, and more comprehensive computational complexity metrics and video quality metrics are proposed in order to support the design, testing, and standardization of these new mobile video codecs.
Efficient Computation of Closed-loop Frequency Response for Large Order Flexible Systems
NASA Technical Reports Server (NTRS)
Maghami, Peiman G.; Giesy, Daniel P.
1997-01-01
An efficient and robust computational scheme is given for the calculation of the frequency response function of a large order, flexible system implemented with a linear, time invariant control system. Advantage is taken of the highly structured sparsity of the system matrix of the plant based on a model of the structure using normal mode coordinates. The computational time per frequency point of the new computational scheme is a linear function of system size, a significant improvement over traditional, full-matrix techniques whose computational times per frequency point range from quadratic to cubic functions of system size. This permits the practical frequency domain analysis of systems of much larger order than by traditional, full-matrix techniques. Formulations are given for both open and closed loop loop systems. Numerical examples are presented showing the advantages of the present formulation over traditional approaches, both in speed and in accuracy. Using a model with 703 structural modes, a speed-up of almost two orders of magnitude was observed while accuracy improved by up to 5 decimal places.
Challenges in scaling NLO generators to leadership computers
NASA Astrophysics Data System (ADS)
Benjamin, D.; Childers, JT; Hoeche, S.; LeCompte, T.; Uram, T.
2017-10-01
Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was successfully scaled to millions of threads to achieve this level of usage on Mira. Sherpa and MadGraph are next-to-leading order generators used heavily by LHC experiments for simulation. Integration times for high-multiplicity or rare processes can take a week or more on standard Grid machines, even using all 16-cores. We will describe our ongoing work to scale the Sherpa generator to thousands of threads on leadership-class machines and reduce run-times to less than a day. This work allows the experiments to leverage large-scale parallel supercomputers for event generation today, freeing tens of millions of grid hours for other work, and paving the way for future applications (simulation, reconstruction) on these and future supercomputers.
Squid - a simple bioinformatics grid.
Carvalho, Paulo C; Glória, Rafael V; de Miranda, Antonio B; Degrave, Wim M
2005-08-03
BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" instalation containing a pre-configured example.
Dynamic VM Provisioning for TORQUE in a Cloud Environment
NASA Astrophysics Data System (ADS)
Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.
2014-06-01
Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
Development of a small-scale computer cluster
NASA Astrophysics Data System (ADS)
Wilhelm, Jay; Smith, Justin T.; Smith, James E.
2008-04-01
An increase in demand for computing power in academia has necessitated the need for high performance machines. Computing power of a single processor has been steadily increasing, but lags behind the demand for fast simulations. Since a single processor has hard limits to its performance, a cluster of computers can have the ability to multiply the performance of a single computer with the proper software. Cluster computing has therefore become a much sought after technology. Typical desktop computers could be used for cluster computing, but are not intended for constant full speed operation and take up more space than rack mount servers. Specialty computers that are designed to be used in clusters meet high availability and space requirements, but can be costly. A market segment exists where custom built desktop computers can be arranged in a rack mount situation, gaining the space saving of traditional rack mount computers while remaining cost effective. To explore these possibilities, an experiment was performed to develop a computing cluster using desktop components for the purpose of decreasing computation time of advanced simulations. This study indicates that small-scale cluster can be built from off-the-shelf components which multiplies the performance of a single desktop machine, while minimizing occupied space and still remaining cost effective.
Finite-frequency structural sensitivities of short-period compressional body waves
NASA Astrophysics Data System (ADS)
Fuji, Nobuaki; Chevrot, Sébastien; Zhao, Li; Geller, Robert J.; Kawai, Kenji
2012-07-01
We present an extension of the method recently introduced by Zhao & Chevrot for calculating Fréchet kernels from a precomputed database of strain Green's tensors by normal mode summation. The extension involves two aspects: (1) we compute the strain Green's tensors using the Direct Solution Method, which allows us to go up to frequencies as high as 1 Hz; and (2) we develop a spatial interpolation scheme so that the Green's tensors can be computed with a relatively coarse grid, thus improving the efficiency in the computation of the sensitivity kernels. The only requirement is that the Green's tensors be computed with a fine enough spatial sampling rate to avoid spatial aliasing. The Green's tensors can then be interpolated to any location inside the Earth, avoiding the need to store and retrieve strain Green's tensors for a fine sampling grid. The interpolation scheme not only significantly reduces the CPU time required to calculate the Green's tensor database and the disk space to store it, but also enhances the efficiency in computing the kernels by reducing the number of I/O operations needed to retrieve the Green's tensors. Our new implementation allows us to calculate sensitivity kernels for high-frequency teleseismic body waves with very modest computational resources such as a laptop. We illustrate the potential of our approach for seismic tomography by computing traveltime and amplitude sensitivity kernels for high frequency P, PKP and Pdiff phases. A comparison of our PKP kernels with those computed by asymptotic ray theory clearly shows the limits of the latter. With ray theory, it is not possible to model waves diffracted by internal discontinuities such as the core-mantle boundary, and it is also difficult to compute amplitudes for paths close to the B-caustic of the PKP phase. We also compute waveform partial derivatives for different parts of the seismic wavefield, a key ingredient for high resolution imaging by waveform inversion. Our computations of partial derivatives in the time window where PcP precursors are commonly observed show that the distribution of sensitivity is complex and counter-intuitive, with a large contribution from the mid-mantle region. This clearly emphasizes the need to use accurate and complete partial derivatives in waveform inversion.
High-speed pulse-shape generator, pulse multiplexer
Burkhart, Scott C.
2002-01-01
The invention combines arbitrary amplitude high-speed pulses for precision pulse shaping for the National Ignition Facility (NIF). The circuitry combines arbitrary height pulses which are generated by replicating scaled versions of a trigger pulse and summing them delayed in time on a pulse line. The combined electrical pulses are connected to an electro-optic modulator which modulates a laser beam. The circuit can also be adapted to combine multiple channels of high speed data into a single train of electrical pulses which generates the optical pulses for very high speed optical communication. The invention has application in laser pulse shaping for inertial confinement fusion, in optical data links for computers, telecommunications, and in laser pulse shaping for atomic excitation studies. The invention can be used to effect at least a 10.times. increase in all fiber communication lines. It allows a greatly increased data transfer rate between high-performance computers. The invention is inexpensive enough to bring high-speed video and data services to homes through a super modem.
NASA Astrophysics Data System (ADS)
Jizhi, Liu; Xingbi, Chen
2009-12-01
A new quasi-three-dimensional (quasi-3D) numeric simulation method for a high-voltage level-shifting circuit structure is proposed. The performances of the 3D structure are analyzed by combining some 2D device structures; the 2D devices are in two planes perpendicular to each other and to the surface of the semiconductor. In comparison with Davinci, the full 3D device simulation tool, the quasi-3D simulation method can give results for the potential and current distribution of the 3D high-voltage level-shifting circuit structure with appropriate accuracy and the total CPU time for simulation is significantly reduced. The quasi-3D simulation technique can be used in many cases with advantages such as saving computing time, making no demands on the high-end computer terminals, and being easy to operate.
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Radenski, Atanas; Follen, Gregory J. (Technical Monitor)
2001-01-01
The rapid growth of internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of new, internet-oriented software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high -performance computing applications community. The general goal of this research project is to contribute to better understanding of the transition to internet-based high -performance computing and to develop solutions for some of the difficulties of this transition. More specifically, our goal is to design an architecture for generic divide and conquer internet-based computing, to develop a portable implementation of this architecture, to create an example library of high-performance divide-and-conquer computing agents that run on top of this architecture, and to evaluate the performance of these agents. We have been designing an architecture that incorporates a master task-pool server and utilizes satellite computational servers that operate on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. Our designed architecture is intended to be complementary to and accessible from computational grids such as Globus, Legion, and Condor. Grids provide remote access to existing high-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end internet nodes. Our project is focused on a generic divide-and-conquer paradigm and its applications that operate on a loose and ever changing pool of lower-end internet nodes.
GATE Monte Carlo simulation in a cloud computing environment
NASA Astrophysics Data System (ADS)
Rowedder, Blake Austin
The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment of a local high performance cluster. This study investigated a reliable and efficient execution of GATE MC simulations using a commercial cloud computing services. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size and thus that integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53 minute long simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high power computing continuing to lower in price and accessibility, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
Li, Nailu; Mu, Anle; Yang, Xiyun; Magar, Kaman T; Liu, Chao
2018-05-01
The optimal tuning of adaptive flap controller can improve adaptive flap control performance on uncertain operating environments, but the optimization process is usually time-consuming and it is difficult to design proper optimal tuning strategy for the flap control system (FCS). To solve this problem, a novel adaptive flap controller is designed based on a high-efficient differential evolution (DE) identification technique and composite adaptive internal model control (CAIMC) strategy. The optimal tuning can be easily obtained by DE identified inverse of the FCS via CAIMC structure. To achieve fast tuning, a high-efficient modified adaptive DE algorithm is proposed with new mutant operator and varying range adaptive mechanism for the FCS identification. A tradeoff between optimized adaptive flap control and low computation cost is successfully achieved by proposed controller. Simulation results show the robustness of proposed method and its superiority to conventional adaptive IMC (AIMC) flap controller and the CAIMC flap controllers using other DE algorithms on various uncertain operating conditions. The high computation efficiency of proposed controller is also verified based on the computation time on those operating cases. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
Dynamic response analysis of structure under time-variant interval process model
NASA Astrophysics Data System (ADS)
Xia, Baizhan; Qin, Yuan; Yu, Dejie; Jiang, Chao
2016-10-01
Due to the aggressiveness of the environmental factor, the variation of the dynamic load, the degeneration of the material property and the wear of the machine surface, parameters related with the structure are distinctly time-variant. Typical model for time-variant uncertainties is the random process model which is constructed on the basis of a large number of samples. In this work, we propose a time-variant interval process model which can be effectively used to deal with time-variant uncertainties with limit information. And then two methods are presented for the dynamic response analysis of the structure under the time-variant interval process model. The first one is the direct Monte Carlo method (DMCM) whose computational burden is relative high. The second one is the Monte Carlo method based on the Chebyshev polynomial expansion (MCM-CPE) whose computational efficiency is high. In MCM-CPE, the dynamic response of the structure is approximated by the Chebyshev polynomials which can be efficiently calculated, and then the variational range of the dynamic response is estimated according to the samples yielded by the Monte Carlo method. To solve the dependency phenomenon of the interval operation, the affine arithmetic is integrated into the Chebyshev polynomial expansion. The computational effectiveness and efficiency of MCM-CPE is verified by two numerical examples, including a spring-mass-damper system and a shell structure.
Bedez, Mathieu; Belhachmi, Zakaria; Haeberlé, Olivier; Greget, Renaud; Moussaoui, Saliha; Bouteiller, Jean-Marie; Bischoff, Serge
2016-01-15
The resolution of a model describing the electrical activity of neural tissue and its propagation within this tissue is highly consuming in term of computing time and requires strong computing power to achieve good results. In this study, we present a method to solve a model describing the electrical propagation in neuronal tissue, using parareal algorithm, coupling with parallelization space using CUDA in graphical processing unit (GPU). We applied the method of resolution to different dimensions of the geometry of our model (1-D, 2-D and 3-D). The GPU results are compared with simulations from a multi-core processor cluster, using message-passing interface (MPI), where the spatial scale was parallelized in order to reach a comparable calculation time than that of the presented method using GPU. A gain of a factor 100 in term of computational time between sequential results and those obtained using the GPU has been obtained, in the case of 3-D geometry. Given the structure of the GPU, this factor increases according to the fineness of the geometry used in the computation. To the best of our knowledge, it is the first time such a method is used, even in the case of neuroscience. Parallelization time coupled with GPU parallelization space allows for drastically reducing computational time with a fine resolution of the model describing the propagation of the electrical signal in a neuronal tissue. Copyright © 2015 Elsevier B.V. All rights reserved.
Computational imaging of light in flight
NASA Astrophysics Data System (ADS)
Hullin, Matthias B.
2014-10-01
Many computer vision tasks are hindered by image formation itself, a process that is governed by the so-called plenoptic integral. By averaging light falling into the lens over space, angle, wavelength and time, a great deal of information is irreversibly lost. The emerging idea of transient imaging operates on a time resolution fast enough to resolve non-stationary light distributions in real-world scenes. It enables the discrimination of light contributions by the optical path length from light source to receiver, a dimension unavailable in mainstream imaging to date. Until recently, such measurements used to require high-end optical equipment and could only be acquired under extremely restricted lab conditions. To address this challenge, we introduced a family of computational imaging techniques operating on standard time-of-flight image sensors, for the first time allowing the user to "film" light in flight in an affordable, practical and portable way. Just as impulse responses have proven a valuable tool in almost every branch of science and engineering, we expect light-in-flight analysis to impact a wide variety of applications in computer vision and beyond.
A GPU-Based Architecture for Real-Time Data Assessment at Synchrotron Experiments
NASA Astrophysics Data System (ADS)
Chilingaryan, Suren; Mirone, Alessandro; Hammersley, Andrew; Ferrero, Claudio; Helfen, Lukas; Kopmann, Andreas; Rolo, Tomy dos Santos; Vagovic, Patrik
2011-08-01
Advances in digital detector technology leads presently to rapidly increasing data rates in imaging experiments. Using fast two-dimensional detectors in computed tomography, the data acquisition can be much faster than the reconstruction if no adequate measures are taken, especially when a high photon flux at synchrotron sources is used. We have optimized the reconstruction software employed at the micro-tomography beamlines of our synchrotron facilities to use the computational power of modern graphic cards. The main paradigm of our approach is the full utilization of all system resources. We use a pipelined architecture, where the GPUs are used as compute coprocessors to reconstruct slices, while the CPUs are preparing the next ones. Special attention is devoted to minimize data transfers between the host and GPU memory and to execute memory transfers in parallel with the computations. We were able to reduce the reconstruction time by a factor 30 and process a typical data set of 20 GB in 40 seconds. The time needed for the first evaluation of the reconstructed sample is reduced significantly and quasi real-time visualization is now possible.
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Virtualizing Super-Computation On-Board Uas
NASA Astrophysics Data System (ADS)
Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.
2015-04-01
Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real-time. However, depending on the volume of data and the cost of the communications, this later option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed-up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes the benchmarking for hardware selection, the software architecture and the communications aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real-time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and consumed watts.
Building high-performance system for processing a daily large volume of Chinese satellites imagery
NASA Astrophysics Data System (ADS)
Deng, Huawu; Huang, Shicun; Wang, Qi; Pan, Zhiqiang; Xin, Yubin
2014-10-01
The number of Earth observation satellites from China increases dramatically recently and those satellites are acquiring a large volume of imagery daily. As the main portal of image processing and distribution from those Chinese satellites, the China Centre for Resources Satellite Data and Application (CRESDA) has been working with PCI Geomatics during the last three years to solve two issues in this regard: processing the large volume of data (about 1,500 scenes or 1 TB per day) in a timely manner and generating geometrically accurate orthorectified products. After three-year research and development, a high performance system has been built and successfully delivered. The high performance system has a service oriented architecture and can be deployed to a cluster of computers that may be configured with high end computing power. The high performance is gained through, first, making image processing algorithms into parallel computing by using high performance graphic processing unit (GPU) cards and multiple cores from multiple CPUs, and, second, distributing processing tasks to a cluster of computing nodes. While achieving up to thirty (and even more) times faster in performance compared with the traditional practice, a particular methodology was developed to improve the geometric accuracy of images acquired from Chinese satellites (including HJ-1 A/B, ZY-1-02C, ZY-3, GF-1, etc.). The methodology consists of fully automatic collection of dense ground control points (GCP) from various resources and then application of those points to improve the photogrammetric model of the images. The delivered system is up running at CRESDA for pre-operational production and has been and is generating good return on investment by eliminating a great amount of manual labor and increasing more than ten times of data throughput daily with fewer operators. Future work, such as development of more performance-optimized algorithms, robust image matching methods and application workflows, is identified to improve the system in the coming years.
Musrfit-Real Time Parameter Fitting Using GPUs
NASA Astrophysics Data System (ADS)
Locans, Uldis; Suter, Andreas
High transverse field μSR (HTF-μSR) experiments typically lead to a rather large data sets, since it is necessary to follow the high frequencies present in the positron decay histograms. The analysis of these data sets can be very time consuming, usually due to the limited computational power of the hardware. To overcome the limited computing resources rotating reference frame transformation (RRF) is often used to reduce the data sets that need to be handled. This comes at a price typically the μSR community is not aware of: (i) due to the RRF transformation the fitting parameter estimate is of poorer precision, i.e., more extended expensive beamtime is needed. (ii) RRF introduces systematic errors which hampers the statistical interpretation of χ2 or the maximum log-likelihood. We will briefly discuss these issues in a non-exhaustive practical way. The only and single purpose of the RRF transformation is the sluggish computer power. Therefore during this work GPU (Graphical Processing Units) based fitting was developed which allows to perform real-time full data analysis without RRF. GPUs have become increasingly popular in scientific computing in recent years. Due to their highly parallel architecture they provide the opportunity to accelerate many applications with considerably less costs than upgrading the CPU computational power. With the emergence of frameworks such as CUDA and OpenCL these devices have become more easily programmable. During this work GPU support was added to Musrfit- a data analysis framework for μSR experiments. The new fitting algorithm uses CUDA or OpenCL to offload the most time consuming parts of the calculations to Nvidia or AMD GPUs. Using the current CPU implementation in Musrfit parameter fitting can take hours for certain data sets while the GPU version can allow to perform real-time data analysis on the same data sets. This work describes the challenges that arise in adding the GPU support to t as well as results obtained using the GPU version. The speedups using the GPU were measured comparing to the CPU implementation. Two different GPUs were used for the comparison — high end Nvidia Tesla K40c GPU designed for HPC applications and AMD Radeon R9 390× GPU designed for gaming industry.
High-performance dual-speed CCD camera system for scientific imaging
NASA Astrophysics Data System (ADS)
Simpson, Raymond W.
1996-03-01
Traditionally, scientific camera systems were partitioned with a `camera head' containing the CCD and its support circuitry and a camera controller, which provided analog to digital conversion, timing, control, computer interfacing, and power. A new, unitized high performance scientific CCD camera with dual speed readout at 1 X 106 or 5 X 106 pixels per second, 12 bit digital gray scale, high performance thermoelectric cooling, and built in composite video output is described. This camera provides all digital, analog, and cooling functions in a single compact unit. The new system incorporates the A/C converter, timing, control and computer interfacing in the camera, with the power supply remaining a separate remote unit. A 100 Mbyte/second serial link transfers data over copper or fiber media to a variety of host computers, including Sun, SGI, SCSI, PCI, EISA, and Apple Macintosh. Having all the digital and analog functions in the camera made it possible to modify this system for the Woods Hole Oceanographic Institution for use on a remote controlled submersible vehicle. The oceanographic version achieves 16 bit dynamic range at 1.5 X 105 pixels/second, can be operated at depths of 3 kilometers, and transfers data to the surface via a real time fiber optic link.
Conversational high resolution mass spectrographic data reduction
NASA Technical Reports Server (NTRS)
Romiez, M. P.
1973-01-01
A FORTRAN 4 program is described which reduces the data obtained from a high resolution mass spectrograph. The program (1) calculates an accurate mass for each line on the photoplate, and (2) assigns elemental compositions to each accurate mass. The program is intended for use in a time-shared computing environment and makes use of the conversational aspects of time-sharing operating systems.
Comparison of the different approaches to generate holograms from data acquired with a Kinect sensor
NASA Astrophysics Data System (ADS)
Kang, Ji-Hoon; Leportier, Thibault; Ju, Byeong-Kwon; Song, Jin Dong; Lee, Kwang-Hoon; Park, Min-Chul
2017-05-01
Data of real scenes acquired in real-time with a Kinect sensor can be processed with different approaches to generate a hologram. 3D models can be generated from a point cloud or a mesh representation. The advantage of the point cloud approach is that computation process is well established since it involves only diffraction and propagation of point sources between parallel planes. On the other hand, the mesh representation enables to reduce the number of elements necessary to represent the object. Then, even though the computation time for the contribution of a single element increases compared to a simple point, the total computation time can be reduced significantly. However, the algorithm is more complex since propagation of elemental polygons between non-parallel planes should be implemented. Finally, since a depth map of the scene is acquired at the same time than the intensity image, a depth layer approach can also be adopted. This technique is appropriate for a fast computation since propagation of an optical wavefront from one plane to another can be handled efficiently with the fast Fourier transform. Fast computation with depth layer approach is convenient for real time applications, but point cloud method is more appropriate when high resolution is needed. In this study, since Kinect can be used to obtain both point cloud and depth map, we examine the different approaches that can be adopted for hologram computation and compare their performance.
Rowland, Mark S.; Howard, Douglas E.; Wong, James L.; Jessup, James L.; Bianchini, Greg M.; Miller, Wayne O.
2007-10-23
A real-time method and computer system for identifying radioactive materials which collects gamma count rates from a HPGe gamma-radiation detector to produce a high-resolution gamma-ray energy spectrum. A library of nuclear material definitions ("library definitions") is provided, with each uniquely associated with a nuclide or isotope material and each comprising at least one logic condition associated with a spectral parameter of a gamma-ray energy spectrum. The method determines whether the spectral parameters of said high-resolution gamma-ray energy spectrum satisfy all the logic conditions of any one of the library definitions, and subsequently uniquely identifies the material type as that nuclide or isotope material associated with the satisfied library definition. The method is iteratively repeated to update the spectrum and identification in real time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lala, J.H.; Nagle, G.A.; Harper, R.E.
1993-05-01
The Maglev control computer system should be designed to verifiably possess high reliability and safety as well as high availability to make Maglev a dependable and attractive transportation alternative to the public. A Maglev control computer system has been designed using a design-for-validation methodology developed earlier under NASA and SDIO sponsorship for real-time aerospace applications. The present study starts by defining the maglev mission scenario and ends with the definition of a maglev control computer architecture. Key intermediate steps included definitions of functional and dependability requirements, synthesis of two candidate architectures, development of qualitative and quantitative evaluation criteria, and analyticalmore » modeling of the dependability characteristics of the two architectures. Finally, the applicability of the design-for-validation methodology was also illustrated by applying it to the German Transrapid TR07 maglev control system.« less
High performance flight computer developed for deep space applications
NASA Technical Reports Server (NTRS)
Bunker, Robert L.
1993-01-01
The development of an advanced space flight computer for real time embedded deep space applications which embodies the lessons learned on Galileo and modern computer technology is described. The requirements are listed and the design implementation that meets those requirements is described. The development of SPACE-16 (Spaceborne Advanced Computing Engine) (where 16 designates the databus width) was initiated to support the MM2 (Marine Mark 2) project. The computer is based on a radiation hardened emulation of a modern 32 bit microprocessor and its family of support devices including a high performance floating point accelerator. Additional custom devices which include a coprocessor to improve input/output capabilities, a memory interface chip, and an additional support chip that provide management of all fault tolerant features, are described. Detailed supporting analyses and rationale which justifies specific design and architectural decisions are provided. The six chip types were designed and fabricated. Testing and evaluation of a brass/board was initiated.
Williams, Eric
2004-11-15
The total energy and fossil fuels used in producing a desktop computer with 17-in. CRT monitor are estimated at 6400 megajoules (MJ) and 260 kg, respectively. This indicates that computer manufacturing is energy intensive: the ratio of fossil fuel use to product weight is 11, an order of magnitude larger than the factor of 1-2 for many other manufactured goods. This high energy intensity of manufacturing, combined with rapid turnover in computers, results in an annual life cycle energy burden that is surprisingly high: about 2600 MJ per year, 1.3 times that of a refrigerator. In contrast with many home appliances, life cycle energy use of a computer is dominated by production (81%) as opposed to operation (19%). Extension of usable lifespan (e.g. by reselling or upgrading) is thus a promising approach to mitigating energy impacts as well as other environmental burdens associated with manufacturing and disposal.
Computational Aerodynamic Modeling of Small Quadcopter Vehicles
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Ventura Diaz, Patricia; Boyd, D. Douglas; Chan, William M.; Theodore, Colin R.
2017-01-01
High-fidelity computational simulations have been performed which focus on rotor-fuselage and rotor-rotor aerodynamic interactions of small quad-rotor vehicle systems. The three-dimensional unsteady Navier-Stokes equations are solved on overset grids using high-order accurate schemes, dual-time stepping, low Mach number preconditioning, and hybrid turbulence modeling. Computational results for isolated rotors are shown to compare well with available experimental data. Computational results in hover reveal the differences between a conventional configuration where the rotors are mounted above the fuselage and an unconventional configuration where the rotors are mounted below the fuselage. Complex flow physics in forward flight is investigated. The goal of this work is to demonstrate that understanding of interactional aerodynamics can be an important factor in design decisions regarding rotor and fuselage placement for next-generation multi-rotor drones.
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
Real-time dynamic display of registered 4D cardiac MR and ultrasound images using a GPU
NASA Astrophysics Data System (ADS)
Zhang, Q.; Huang, X.; Eagleson, R.; Guiraudon, G.; Peters, T. M.
2007-03-01
In minimally invasive image-guided surgical interventions, different imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), and real-time three-dimensional (3D) ultrasound (US), can provide complementary, multi-spectral image information. Multimodality dynamic image registration is a well-established approach that permits real-time diagnostic information to be enhanced by placing lower-quality real-time images within a high quality anatomical context. For the guidance of cardiac procedures, it would be valuable to register dynamic MRI or CT with intraoperative US. However, in practice, either the high computational cost prohibits such real-time visualization of volumetric multimodal images in a real-world medical environment, or else the resulting image quality is not satisfactory for accurate guidance during the intervention. Modern graphics processing units (GPUs) provide the programmability, parallelism and increased computational precision to begin to address this problem. In this work, we first outline our research on dynamic 3D cardiac MR and US image acquisition, real-time dual-modality registration and US tracking. Then we describe image processing and optimization techniques for 4D (3D + time) cardiac image real-time rendering. We also present our multimodality 4D medical image visualization engine, which directly runs on a GPU in real-time by exploiting the advantages of the graphics hardware. In addition, techniques such as multiple transfer functions for different imaging modalities, dynamic texture binding, advanced texture sampling and multimodality image compositing are employed to facilitate the real-time display and manipulation of the registered dual-modality dynamic 3D MR and US cardiac datasets.
Kuntzelman, Karl; Jack Rhodes, L; Harrington, Lillian N; Miskovic, Vladimir
2018-06-01
There is a broad family of statistical methods for capturing time series regularity, with increasingly widespread adoption by the neuroscientific community. A common feature of these methods is that they permit investigators to quantify the entropy of brain signals - an index of unpredictability/complexity. Despite the proliferation of algorithms for computing entropy from neural time series data there is scant evidence concerning their relative stability and efficiency. Here we evaluated several different algorithmic implementations (sample, fuzzy, dispersion and permutation) of multiscale entropy in terms of their stability across sessions, internal consistency and computational speed, accuracy and precision using a combination of electroencephalogram (EEG) and synthetic 1/ƒ noise signals. Overall, we report fair to excellent internal consistency and longitudinal stability over a one-week period for the majority of entropy estimates, with several caveats. Computational timing estimates suggest distinct advantages for dispersion and permutation entropy over other entropy estimates. Considered alongside the psychometric evidence, we suggest several ways in which researchers can maximize computational resources (without sacrificing reliability), especially when working with high-density M/EEG data or multivoxel BOLD time series signals. Copyright © 2018 Elsevier Inc. All rights reserved.
GPU-computing in econophysics and statistical physics
NASA Astrophysics Data System (ADS)
Preis, T.
2011-03-01
A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
Asadollahi, Aziz; Khazanovich, Lev
2018-04-11
The emergence of ultrasonic dry point contact (DPC) transducers that emit horizontal shear waves has enabled efficient collection of high-quality data in the context of a nondestructive evaluation of concrete structures. This offers an opportunity to improve the quality of evaluation by adapting advanced imaging techniques. Reverse time migration (RTM) is a simulation-based reconstruction technique that offers advantages over conventional methods, such as the synthetic aperture focusing technique. RTM is capable of imaging boundaries and interfaces with steep slopes and the bottom boundaries of inclusions and defects. However, this imaging technique requires a massive amount of memory and its computation cost is high. In this study, both bottlenecks of the RTM are resolved when shear transducers are used for data acquisition. An analytical approach was developed to obtain the source and receiver wavefields needed for imaging using reverse time migration. It is shown that the proposed analytical approach not only eliminates the high memory demand, but also drastically reduces the computation time from days to minutes. Copyright © 2018 Elsevier B.V. All rights reserved.
Spitzer, James D; Hupert, Nathaniel; Duckart, Jonathan; Xiong, Wei
2007-01-01
Community-based mass prophylaxis is a core public health operational competency, but staffing needs may overwhelm the local trained health workforce. Just-in-time (JIT) training of emergency staff and computer modeling of workforce requirements represent two complementary approaches to address this logistical problem. Multnomah County, Oregon, conducted a high-throughput point of dispensing (POD) exercise to test JIT training and computer modeling to validate POD staffing estimates. The POD had 84% non-health-care worker staff and processed 500 patients per hour. Post-exercise modeling replicated observed staff utilization levels and queue formation, including development and amelioration of a large medical evaluation queue caused by lengthy processing times and understaffing in the first half-hour of the exercise. The exercise confirmed the feasibility of using JIT training for high-throughput antibiotic dispensing clinics staffed largely by nonmedical professionals. Patient processing times varied over the course of the exercise, with important implications for both staff reallocation and future POD modeling efforts. Overall underutilization of staff revealed the opportunity for greater efficiencies and even higher future throughputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zachary M. Prince; Jean C. Ragusa; Yaqi Wang
Because of the recent interest in reactor transient modeling and the restart of the Transient Reactor (TREAT) Facility, there has been a need for more efficient, robust methods in computation frameworks. This is the impetus of implementing the Improved Quasi-Static method (IQS) in the RATTLESNAKE/MOOSE framework. IQS has implemented with CFEM diffusion by factorizing flux into time-dependent amplitude and spacial- and weakly time-dependent shape. The shape evaluation is very similar to a flux diffusion solve and is computed at large (macro) time steps. While the amplitude evaluation is a PRKE solve where the parameters are dependent on the shape andmore » is computed at small (micro) time steps. IQS has been tested with a custom one-dimensional example and the TWIGL ramp benchmark. These examples prove it to be a viable and effective method for highly transient cases. More complex cases are intended to be applied to further test the method and its implementation.« less
Schick-Makaroff, Kara; Molzahn, Anita
2014-01-01
Electronic capture of patients' reports of their health is significant in clinical nephrology research because health-related quality of life (HRQOL) for patients with end-stage renal disease is compromised and assessment by patients of their HRQOL in practice is relatively uncommon. The purpose of this study was to evaluate patient satisfaction with and time involved in administering HRQOL and symptom assessment measures using tablet computers in two outpatient home dialysis clinics. A cross-sectional observational study design was employed. The study was conducted in two home dialysis clinics. Fifty-six patients participated in the study; 35 males (63%) and 21 females (37%) with a mean age of 66 ± 12 (36-90 years old) were included. Forty-nine participants were on peritoneal dialysis (87%), 6 on home hemodialysis (11%), and 1 on nocturnal home hemodialysis (2%). Measures included the Kidney Disease Quality of Life-36 (KDQOL-36), the Edmonton Symptom Assessment Scale (ESAS) and Participant's Level of Satisfaction in Using a Tablet Computer. Using a tablet computer, participants completed the three measures. Descriptive statistics and bivariate correlations were calculated. Participants' satisfaction with use of the tablet computer was high; 66% were "very satisfied", 7% "satisfied", 2% "slightly satisfied", and 18% "neutral". On the 7-point Likert-type scale, the mean satisfaction score was 5.11 (SD = 1.6). Mean time to complete the measures was: Level of Satisfaction 1.15 minutes (SD = 0.41), ESAS 2.55 minutes (SD = 1.04), and KDQOL 9.56 minutes (SD = 2.03); the mean time to complete all three instruments was 13.19 minutes (SD = 2.42). There were no significant correlations between level of satisfaction and age, gender, HRQOL, time taken to complete surveys, computer experience, or comfort with technology. Comfort with technology and computer experience were highly correlated, r = .7, p (one-tailed) < 0.01. Limitations include lack of generalizability because of a small self-selected sample of relatively healthy patients and a lack of psychometric testing on the measure of satisfaction. Participants were satisfied with the platform and the time involved for completion of instruments was modest. Routine use of HRQOL measures for clinical purposes may be facilitated through use of tablet computers.
Nam, Jaewook
2011-01-01
We present a method to solve a convection-reaction system based on a least-squares finite element method (LSFEM). For steady-state computations, issues related to recirculation flow are stated and demonstrated with a simple example. The method can compute concentration profiles in open flow even when the generation term is small. This is the case for estimating hemolysis in blood. Time-dependent flows are computed with the space-time LSFEM discretization. We observe that the computed hemoglobin concentration can become negative in certain regions of the flow; it is a physically unacceptable result. To prevent this, we propose a quadratic transformation of variables. The transformed governing equation can be solved in a straightforward way by LSFEM with no sign of unphysical behavior. The effect of localized high shear on blood damage is shown in a circular Couette-flow-with-blade configuration, and a physiological condition is tested in an arterial graft flow. PMID:21709752
Automated System Tests High-Power MOSFET's
NASA Technical Reports Server (NTRS)
Huston, Steven W.; Wendt, Isabel O.
1994-01-01
Computer-controlled system tests metal-oxide/semiconductor field-effect transistors (MOSFET's) at high voltages and currents. Measures seven parameters characterizing performance of MOSFET, with view toward obtaining early indication MOSFET defective. Use of test system prior to installation of power MOSFET in high-power circuit saves time and money.
A History of High-Performance Computing
NASA Technical Reports Server (NTRS)
2006-01-01
Faster than most speedy computers. More powerful than its NASA data-processing predecessors. Able to leap large, mission-related computational problems in a single bound. Clearly, it s neither a bird nor a plane, nor does it need to don a red cape, because it s super in its own way. It's Columbia, NASA s newest supercomputer and one of the world s most powerful production/processing units. Named Columbia to honor the STS-107 Space Shuttle Columbia crewmembers, the new supercomputer is making it possible for NASA to achieve breakthroughs in science and engineering, fulfilling the Agency s missions, and, ultimately, the Vision for Space Exploration. Shortly after being built in 2004, Columbia achieved a benchmark rating of 51.9 teraflop/s on 10,240 processors, making it the world s fastest operational computer at the time of completion. Putting this speed into perspective, 20 years ago, the most powerful computer at NASA s Ames Research Center, home of the NASA Advanced Supercomputing Division (NAS), ran at a speed of about 1 gigaflop (one billion calculations per second). The Columbia supercomputer is 50,000 times faster than this computer and offers a tenfold increase in capacity over the prior system housed at Ames. What s more, Columbia is considered the world s largest Linux-based, shared-memory system. The system is offering immeasurable benefits to society and is the zenith of years of NASA/private industry collaboration that has spawned new generations of commercial, high-speed computing systems.
Numerical Simulation of the Oscillations in a Mixer: An Internal Aeroacoustic Feedback System
NASA Technical Reports Server (NTRS)
Jorgenson, Philip C. E.; Loh, Ching Y.
2004-01-01
The space-time conservation element and solution element method is employed to numerically study the acoustic feedback system in a high temperature, high speed wind tunnel mixer. The computation captures the self-sustained feedback loop between reflecting Mach waves and the shear layer. This feedback loop results in violent instabilities that are suspected of causing damage to some tunnel components. The computed frequency is in good agreement with the available experimental data. The physical phenomena are explained based on the numerical results.
NASA Astrophysics Data System (ADS)
Spiliotopoulos, I.; Mirmont, M.; Kruijff, M.
2008-08-01
This paper highlights the flight preparation and mission performance of a PC104-based On-Board Computer for ESA's second Young Engineer's Satellite (YES2), with additional attention to the flight software design and experience of QNX as multi-process real-time operating system. This combination of Commercial-Of-The-Shelf (COTS) technologies is an accessible option for small satellites with high computational demands.
Quantum computation in the analysis of hyperspectral data
NASA Astrophysics Data System (ADS)
Gomez, Richard B.; Ghoshal, Debabrata; Jayanna, Anil
2004-08-01
Recent research on the topic of quantum computation provides us with some quantum algorithms with higher efficiency and speedup compared to their classical counterparts. In this paper, it is our intent to provide the results of our investigation of several applications of such quantum algorithms - especially the Grover's Search algorithm - in the analysis of Hyperspectral Data. We found many parallels with Grover's method in existing data processing work that make use of classical spectral matching algorithms. Our efforts also included the study of several methods dealing with hyperspectral image analysis work where classical computation methods involving large data sets could be replaced with quantum computation methods. The crux of the problem in computation involving a hyperspectral image data cube is to convert the large amount of data in high dimensional space to real information. Currently, using the classical model, different time consuming methods and steps are necessary to analyze these data including: Animation, Minimum Noise Fraction Transform, Pixel Purity Index algorithm, N-dimensional scatter plot, Identification of Endmember spectra - are such steps. If a quantum model of computation involving hyperspectral image data can be developed and formalized - it is highly likely that information retrieval from hyperspectral image data cubes would be a much easier process and the final information content would be much more meaningful and timely. In this case, dimensionality would not be a curse, but a blessing.
Three-dimensional spatiotemporal features for fast content-based retrieval of focal liver lesions.
Roy, Sharmili; Chi, Yanling; Liu, Jimin; Venkatesh, Sudhakar K; Brown, Michael S
2014-11-01
Content-based image retrieval systems for 3-D medical datasets still largely rely on 2-D image-based features extracted from a few representative slices of the image stack. Most 2 -D features that are currently used in the literature not only model a 3-D tumor incompletely but are also highly expensive in terms of computation time, especially for high-resolution datasets. Radiologist-specified semantic labels are sometimes used along with image-based 2-D features to improve the retrieval performance. Since radiological labels show large interuser variability, are often unstructured, and require user interaction, their use as lesion characterizing features is highly subjective, tedious, and slow. In this paper, we propose a 3-D image-based spatiotemporal feature extraction framework for fast content-based retrieval of focal liver lesions. All the features are computer generated and are extracted from four-phase abdominal CT images. Retrieval performance and query processing times for the proposed framework is evaluated on a database of 44 hepatic lesions comprising of five pathological types. Bull's eye percentage score above 85% is achieved for three out of the five lesion pathologies and for 98% of query lesions, at least one same type of lesion is ranked among the top two retrieved results. Experiments show that the proposed system's query processing is more than 20 times faster than other already published systems that use 2-D features. With fast computation time and high retrieval accuracy, the proposed system has the potential to be used as an assistant to radiologists for routine hepatic tumor diagnosis.
NASA Astrophysics Data System (ADS)
Jiang, Yang; Gong, Yuanzheng; Wang, Thomas D.; Seibel, Eric J.
2017-02-01
Multimodal endoscopy, with fluorescence-labeled probes binding to overexpressed molecular targets, is a promising technology to visualize early-stage cancer. T/B ratio is the quantitative analysis used to correlate fluorescence regions to cancer. Currently, T/B ratio calculation is post-processing and does not provide real-time feedback to the endoscopist. To achieve real-time computer assisted diagnosis (CAD), we establish image processing protocols for calculating T/B ratio and locating high-risk fluorescence regions for guiding biopsy and therapy in Barrett's esophagus (BE) patients. Methods: Chan-Vese algorithm, an active contour model, is used to segment high-risk regions in fluorescence videos. A semi-implicit gradient descent method was applied to minimize the energy function of this algorithm and evolve the segmentation. The surrounding background was then identified using morphology operation. The average T/B ratio was computed and regions of interest were highlighted based on user-selected thresholding. Evaluation was conducted on 50 fluorescence videos acquired from clinical video recordings using a custom multimodal endoscope. Results: With a processing speed of 2 fps on a laptop computer, we obtained accurate segmentation of high-risk regions examined by experts. For each case, the clinical user could optimize target boundary by changing the penalty on area inside the contour. Conclusion: Automatic and real-time procedure of calculating T/B ratio and identifying high-risk regions of early esophageal cancer was developed. Future work will increase processing speed to <5 fps, refine the clinical interface, and apply to additional GI cancers and fluorescence peptides.
Real-Time Embedded High Performance Computing: Communications Scheduling.
1995-06-01
real - time operating system must explicitly limit the degradation of the timing performance of all processes as the number of processes...adequately supported by a real - time operating system , could compound the development problems encountered in the past. Many experts feel that the... real - time operating system support for an MPP, although they all provide some support for distributed real-time applications. A distributed real
Computers for real time flight simulation: A market survey
NASA Technical Reports Server (NTRS)
Bekey, G. A.; Karplus, W. J.
1977-01-01
An extensive computer market survey was made to determine those available systems suitable for current and future flight simulation studies at Ames Research Center. The primary requirement is for the computation of relatively high frequency content (5 Hz) math models representing powered lift flight vehicles. The Rotor Systems Research Aircraft (RSRA) was used as a benchmark vehicle for computation comparison studies. The general nature of helicopter simulations and a description of the benchmark model are presented, and some of the sources of simulation difficulties are examined. A description of various applicable computer architectures is presented, along with detailed discussions of leading candidate systems and comparisons between them.
Belle II grid computing: An overview of the distributed data management system.
NASA Astrophysics Data System (ADS)
Bansal, Vikas; Schram, Malachi; Belle Collaboration, II
2017-01-01
The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50/ab of e +e- collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe and will take advantage of upgrades to the high-speed global network. We present the architecture of data flow and data handling as a part of the Belle II computing infrastructure.
Li, Longxiang; Xue, Donglin; Deng, Weijie; Wang, Xu; Bai, Yang; Zhang, Feng; Zhang, Xuejun
2017-11-10
In deterministic computer-controlled optical surfacing, accurate dwell time execution by computer numeric control machines is crucial in guaranteeing a high-convergence ratio for the optical surface error. It is necessary to consider the machine dynamics limitations in the numerical dwell time algorithms. In this paper, these constraints on dwell time distribution are analyzed, and a model of the equal extra material removal is established. A positive dwell time algorithm with minimum equal extra material removal is developed. Results of simulations based on deterministic magnetorheological finishing demonstrate the necessity of considering machine dynamics performance and illustrate the validity of the proposed algorithm. Indeed, the algorithm effectively facilitates the determinacy of sub-aperture optical surfacing processes.
Enabling a high throughput real time data pipeline for a large radio telescope array with GPUs
NASA Astrophysics Data System (ADS)
Edgar, R. G.; Clark, M. A.; Dale, K.; Mitchell, D. A.; Ord, S. M.; Wayth, R. B.; Pfister, H.; Greenhill, L. J.
2010-10-01
The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. Raw data will be generated continuously at 5 GiB s-1, grouped into 8 s cadences. This high throughput motivates the development of on-site, real time processing and reduction in preference to archiving, transport and off-line processing. Each batch of 8 s data must be completely reduced before the next batch arrives. Maintaining real time operation will require a sustained performance of around 2.5 TFLOP s-1 (including convolutions, FFTs, interpolations and matrix multiplications). We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-Watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized between the nodes hosting the GPUs. The MWA will be a frontier scientific instrument and a pathfinder for planned peta- and exa-scale facilities.
Development of embedded real-time and high-speed vision platform
NASA Astrophysics Data System (ADS)
Ouyang, Zhenxing; Dong, Yimin; Yang, Hua
2015-12-01
Currently, high-speed vision platforms are widely used in many applications, such as robotics and automation industry. However, a personal computer (PC) whose over-large size is not suitable and applicable in compact systems is an indispensable component for human-computer interaction in traditional high-speed vision platforms. Therefore, this paper develops an embedded real-time and high-speed vision platform, ER-HVP Vision which is able to work completely out of PC. In this new platform, an embedded CPU-based board is designed as substitution for PC and a DSP and FPGA board is developed for implementing image parallel algorithms in FPGA and image sequential algorithms in DSP. Hence, the capability of ER-HVP Vision with size of 320mm x 250mm x 87mm can be presented in more compact condition. Experimental results are also given to indicate that the real-time detection and counting of the moving target at a frame rate of 200 fps at 512 x 512 pixels under the operation of this newly developed vision platform are feasible.
NASA Astrophysics Data System (ADS)
Tong, Kai; Fan, Shiming; Gong, Derong; Lu, Zuming; Liu, Jian
The synchronizer/data buffer (SDB) in the command and data acquisition station for China's future Geostationary Meteorological Satellite is described. Several computers and special microprocessors are used in tandem with minimized hardware to fulfill all of the functions. The high-accuracy digital phase locked loop is operated by computer and by controlling the count value of the 20-MHz clock to acquire and track such signals as sun pulse, scan synchronization detection pulse, and earth pulse. Sun pulse and VISSR data are recorded precisely and economically by digitizing the time relation. The VISSR scan timing and equiangular control timing, and equal time sampling on satellite are also discussed.
NASA Technical Reports Server (NTRS)
Rediess, Herman A.; Ramnath, Rudrapatna V.; Vrable, Daniel L.; Hirvo, David H.; Mcmillen, Lowell D.; Osofsky, Irving B.
1991-01-01
The results are presented of a study to identify potential real time remote computational applications to support monitoring HRV flight test experiments along with definitions of preliminary requirements. A major expansion of the support capability available at Ames-Dryden was considered. The focus is on the use of extensive computation and data bases together with real time flight data to generate and present high level information to those monitoring the flight. Six examples were considered: (1) boundary layer transition location; (2) shock wave position estimation; (3) performance estimation; (4) surface temperature estimation; (5) critical structural stress estimation; and (6) stability estimation.
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Addressing the challenges of standalone multi-core simulations in molecular dynamics
NASA Astrophysics Data System (ADS)
Ocaya, R. O.; Terblans, J. J.
2017-07-01
Computational modelling in material science involves mathematical abstractions of force fields between particles with the aim to postulate, develop and understand materials by simulation. The aggregated pairwise interactions of the material's particles lead to a deduction of its macroscopic behaviours. For practically meaningful macroscopic scales, a large amount of data are generated, leading to vast execution times. Simulation times of hours, days or weeks for moderately sized problems are not uncommon. The reduction of simulation times, improved result accuracy and the associated software and hardware engineering challenges are the main motivations for many of the ongoing researches in the computational sciences. This contribution is concerned mainly with simulations that can be done on a "standalone" computer based on Message Passing Interfaces (MPI), parallel code running on hardware platforms with wide specifications, such as single/multi- processor, multi-core machines with minimal reconfiguration for upward scaling of computational power. The widely available, documented and standardized MPI library provides this functionality through the MPI_Comm_size (), MPI_Comm_rank () and MPI_Reduce () functions. A survey of the literature shows that relatively little is written with respect to the efficient extraction of the inherent computational power in a cluster. In this work, we discuss the main avenues available to tap into this extra power without compromising computational accuracy. We also present methods to overcome the high inertia encountered in single-node-based computational molecular dynamics. We begin by surveying the current state of the art and discuss what it takes to achieve parallelism, efficiency and enhanced computational accuracy through program threads and message passing interfaces. Several code illustrations are given. The pros and cons of writing raw code as opposed to using heuristic, third-party code are also discussed. The growing trend towards graphical processor units and virtual computing clouds for high-performance computing is also discussed. Finally, we present the comparative results of vacancy formation energy calculations using our own parallelized standalone code called Verlet-Stormer velocity (VSV) operating on 30,000 copper atoms. The code is based on the Sutton-Chen implementation of the Finnis-Sinclair pairwise embedded atom potential. A link to the code is also given.
What is this thing called 'Internet'?
Coleman, N J
1996-04-01
According to a survey conducted by the Australian Journal of Nursing in 1994, 43% of nurses owned personal computers and another 28% were considering the purchase of one in the next year. Given such a high ownership rate, and the interest in the 'Net' itself, some understanding of the fundamentals of online computing and the Internet would seem timely.
Some Problems of Computer-Aided Testing and "Interview-Like Tests"
ERIC Educational Resources Information Center
Smoline, D.V.
2008-01-01
Computer-based testing--is an effective teacher's tool, intended to optimize course goals and assessment techniques in a comparatively short time. However, this is accomplished only if we deal with high-quality tests. It is strange, but despite the 100-year history of Testing Theory (see, Anastasi, A., Urbina, S. (1997). Psychological testing.…
A Survey of Navigational Computer Program in the Museum Setting.
ERIC Educational Resources Information Center
Sliman, Paula
Patron service is a high priority in the library setting and alleviating a large percentage of the directional questions will provide librarians with more time to help patrons more thoroughly than they are able to currently. Furthermore, in view of the current economic trend of downsizing, a navigational computer system program has the potential…
Real-time spectral analysis of HRV signals: an interactive and user-friendly PC system.
Basano, L; Canepa, F; Ottonello, P
1998-01-01
We present a real-time system, built around a PC and a low-cost data acquisition board, for the spectral analysis of the heart rate variability signal. The Windows-like operating environment on which it is based makes the computer program very user-friendly even for non-specialized personnel. The Power Spectral Density is computed through the use of a hybrid method, in which a classical FFT analysis follows an autoregressive finite-extension of data; the stationarity of the sequence is continuously checked. The use of this algorithm gives a high degree of robustness of the spectral estimation. Moreover, always in real time, the FFT of every data block is computed and displayed in order to corroborate the results as well as to allow the user to interactively choose a proper AR model order.
Banks, H T; Birch, Malcolm J; Brewin, Mark P; Greenwald, Stephen E; Hu, Shuhua; Kenz, Zackary R; Kruse, Carola; Maischak, Matthias; Shaw, Simon; Whiteman, John R
2014-04-13
We revisit a method originally introduced by Werder et al. (in Comput. Methods Appl. Mech. Engrg., 190:6685-6708, 2001) for temporally discontinuous Galerkin FEMs applied to a parabolic partial differential equation. In that approach, block systems arise because of the coupling of the spatial systems through inner products of the temporal basis functions. If the spatial finite element space is of dimension D and polynomials of degree r are used in time, the block system has dimension ( r + 1) D and is usually regarded as being too large when r > 1. Werder et al. found that the space-time coupling matrices are diagonalizable over [Formula: see text] for r ⩽ 100, and this means that the time-coupled computations within a time step can actually be decoupled. By using either continuous Galerkin or spectral element methods in space, we apply this DG-in-time methodology, for the first time, to second-order wave equations including elastodynamics with and without Kelvin-Voigt and Maxwell-Zener viscoelasticity. An example set of numerical results is given to demonstrate the favourable effect on error and computational work of the moderately high-order (up to degree 7) temporal and spatio-temporal approximations, and we also touch on an application of this method to an ambitious problem related to the diagnosis of coronary artery disease. Copyright © 2014 The Authors. International Journal for Numerical Methods in Engineering published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Jenkins, David R.; Basden, Alastair; Myers, Richard M.
2018-05-01
We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter and so the next generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low power x86 CPU cores and high bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0kHz with less than 20μs RMS jitter. We have also shown that with a wavefront sensor camera attached the KNL can process the real-time control loop at up to 966Hz, the maximum frame-rate of the camera, with jitter remaining below 20μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real time control.
Banks, H T; Birch, Malcolm J; Brewin, Mark P; Greenwald, Stephen E; Hu, Shuhua; Kenz, Zackary R; Kruse, Carola; Maischak, Matthias; Shaw, Simon; Whiteman, John R
2014-01-01
We revisit a method originally introduced by Werder et al. (in Comput. Methods Appl. Mech. Engrg., 190:6685–6708, 2001) for temporally discontinuous Galerkin FEMs applied to a parabolic partial differential equation. In that approach, block systems arise because of the coupling of the spatial systems through inner products of the temporal basis functions. If the spatial finite element space is of dimension D and polynomials of degree r are used in time, the block system has dimension (r + 1)D and is usually regarded as being too large when r > 1. Werder et al. found that the space-time coupling matrices are diagonalizable over for r ⩽100, and this means that the time-coupled computations within a time step can actually be decoupled. By using either continuous Galerkin or spectral element methods in space, we apply this DG-in-time methodology, for the first time, to second-order wave equations including elastodynamics with and without Kelvin–Voigt and Maxwell–Zener viscoelasticity. An example set of numerical results is given to demonstrate the favourable effect on error and computational work of the moderately high-order (up to degree 7) temporal and spatio-temporal approximations, and we also touch on an application of this method to an ambitious problem related to the diagnosis of coronary artery disease. Copyright © 2014 The Authors. International Journal for Numerical Methods in Engineering published by John Wiley & Sons Ltd. PMID:25834284
NASA Astrophysics Data System (ADS)
Ji, Xing; Zhao, Fengxiang; Shyy, Wei; Xu, Kun
2018-03-01
Most high order computational fluid dynamics (CFD) methods for compressible flows are based on Riemann solver for the flux evaluation and Runge-Kutta (RK) time stepping technique for temporal accuracy. The advantage of this kind of space-time separation approach is the easy implementation and stability enhancement by introducing more middle stages. However, the nth-order time accuracy needs no less than n stages for the RK method, which can be very time and memory consuming due to the reconstruction at each stage for a high order method. On the other hand, the multi-stage multi-derivative (MSMD) method can be used to achieve the same order of time accuracy using less middle stages with the use of the time derivatives of the flux function. For traditional Riemann solver based CFD methods, the lack of time derivatives in the flux function prevents its direct implementation of the MSMD method. However, the gas kinetic scheme (GKS) provides such a time accurate evolution model. By combining the second-order or third-order GKS flux functions with the MSMD technique, a family of high order gas kinetic methods can be constructed. As an extension of the previous 2-stage 4th-order GKS, the 5th-order schemes with 2 and 3 stages will be developed in this paper. Based on the same 5th-order WENO reconstruction, the performance of gas kinetic schemes from the 2nd- to the 5th-order time accurate methods will be evaluated. The results show that the 5th-order scheme can achieve the theoretical order of accuracy for the Euler equations, and present accurate Navier-Stokes solutions as well due to the coupling of inviscid and viscous terms in the GKS formulation. In comparison with Riemann solver based 5th-order RK method, the high order GKS has advantages in terms of efficiency, accuracy, and robustness, for all test cases. The 4th- and 5th-order GKS have the same robustness as the 2nd-order scheme for the capturing of discontinuous solutions. The current high order MSMD GKS is a multi-dimensional scheme with incorporation of both normal and tangential spatial derivatives of flow variables at a cell interface in the flux evaluation. The scheme can be extended straightforwardly to viscous flow computation in unstructured mesh. It provides a promising direction for the development of high-order CFD methods for the computation of complex flows, such as turbulence and acoustics with shock interactions.
Automated Approach to Very High-Order Aeroacoustic Computations. Revision
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Goodrich, John W.
2001-01-01
Computational aeroacoustics requires efficient, high-resolution simulation tools. For smooth problems, this is best accomplished with very high-order in space and time methods on small stencils. However, the complexity of highly accurate numerical methods can inhibit their practical application, especially in irregular geometries. This complexity is reduced by using a special form of Hermite divided-difference spatial interpolation on Cartesian grids, and a Cauchy-Kowalewski recursion procedure for time advancement. In addition, a stencil constraint tree reduces the complexity of interpolating grid points that am located near wall boundaries. These procedures are used to develop automatically and to implement very high-order methods (> 15) for solving the linearized Euler equations that can achieve less than one grid point per wavelength resolution away from boundaries by including spatial derivatives of the primitive variables at each grid point. The accuracy of stable surface treatments is currently limited to 11th order for grid aligned boundaries and to 2nd order for irregular boundaries.
An Automated Approach to Very High Order Aeroacoustic Computations in Complex Geometries
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Goodrich, John W.
2000-01-01
Computational aeroacoustics requires efficient, high-resolution simulation tools. And for smooth problems, this is best accomplished with very high order in space and time methods on small stencils. But the complexity of highly accurate numerical methods can inhibit their practical application, especially in irregular geometries. This complexity is reduced by using a special form of Hermite divided-difference spatial interpolation on Cartesian grids, and a Cauchy-Kowalewslci recursion procedure for time advancement. In addition, a stencil constraint tree reduces the complexity of interpolating grid points that are located near wall boundaries. These procedures are used to automatically develop and implement very high order methods (>15) for solving the linearized Euler equations that can achieve less than one grid point per wavelength resolution away from boundaries by including spatial derivatives of the primitive variables at each grid point. The accuracy of stable surface treatments is currently limited to 11th order for grid aligned boundaries and to 2nd order for irregular boundaries.
Analog Correlator Based on One Bit Digital Correlator
NASA Technical Reports Server (NTRS)
Prokop, Norman (Inventor); Krasowski, Michael (Inventor)
2017-01-01
A two input time domain correlator may perform analog correlation. In order to achieve high throughput rates with reduced or minimal computational overhead, the input data streams may be hard limited through adaptive thresholding to yield two binary bit streams. Correlation may be achieved through the use of a Hamming distance calculation, where the distance between the two bit streams approximates the time delay that separates them. The resulting Hamming distance approximates the correlation time delay with high accuracy.
Algorithms for Efficient Computation of Transfer Functions for Large Order Flexible Systems
NASA Technical Reports Server (NTRS)
Maghami, Peiman G.; Giesy, Daniel P.
1998-01-01
An efficient and robust computational scheme is given for the calculation of the frequency response function of a large order, flexible system implemented with a linear, time invariant control system. Advantage is taken of the highly structured sparsity of the system matrix of the plant based on a model of the structure using normal mode coordinates. The computational time per frequency point of the new computational scheme is a linear function of system size, a significant improvement over traditional, still-matrix techniques whose computational times per frequency point range from quadratic to cubic functions of system size. This permits the practical frequency domain analysis of systems of much larger order than by traditional, full-matrix techniques. Formulations are given for both open- and closed-loop systems. Numerical examples are presented showing the advantages of the present formulation over traditional approaches, both in speed and in accuracy. Using a model with 703 structural modes, the present method was up to two orders of magnitude faster than a traditional method. The present method generally showed good to excellent accuracy throughout the range of test frequencies, while traditional methods gave adequate accuracy for lower frequencies, but generally deteriorated in performance at higher frequencies with worst case errors being many orders of magnitude times the correct values.
Comparing High-latitude Ionospheric and Thermospheric Lagrangian Coherent Structures
NASA Astrophysics Data System (ADS)
Wang, N.; Ramirez, U.; Flores, F.; Okic, D.; Datta-Barua, S.
2015-12-01
Lagrangian Coherent Structures (LCSs) are invisible boundaries in time varying flow fields that may be subject to mixing and turbulence. The LCS is defined by the local maxima of the finite time Lyapunov exponent (FTLE), a scalar field quantifying the degree of stretching of fluid elements over the flow domain. Although the thermosphere is dominated by neutral wind processes and the ionosphere is governed by plasma electrodynamics, we can compare the LCS in the two modeled flow fields to yield insight into transport and interaction processes in the high-latitude IT system. For obtaining thermospheric LCS, we use the Horizontal Wind Model 2014 (HWM14) [1] at a single altitude to generate the two-dimensional velocity field. The FTLE computation is applied to study the flow field of the neutral wind, and to visualize the forward-time Lagrangian Coherent Structures in the flow domain. The time-varying structures indicate a possible thermospheric LCS ridge in the auroral oval area. The results of a two-day run during a geomagnetically quiet period show that the structures are diurnally quasi-periodic, thus that solar radiation influences the neutral wind flow field. To find the LCS in the high-latitude ionospheric drifts, the Weimer 2001 [2] polar electric potential model and the International Geomagnetic Reference Field 11 [3] are used to compute the ExB drift flow field in ionosphere. As with the neutral winds, the Lagrangian Coherent Structures are obtained by applying the FTLE computation. The relationship between the thermospheric and ionospheric LCS is analyzed by comparing overlapping FTLE maps. Both a publicly available FTLE solver [4] and a custom-built FTLE computation are used and compared for validation [5]. Comparing the modeled IT LCSs on a quiet day with the modeled IT LCSs on a storm day indicates important factors on the structure and time evolution of the LCS.
Multi-GPGPU Tsunami simulation at Toyama-bay
NASA Astrophysics Data System (ADS)
Furuyama, Shoichi; Ueda, Yuki
2017-07-01
Accelerated multi General Purpose Graphics Processing Unit (GPGPU) calculation for Tsunami run-up simulation was achieved at the wide area (whole Toyama-bay in Japan) by faster computation technique. Toyama-bay has active-faults at the sea-bed. It has a high possibility to occur earthquakes and Tsunami waves in the case of the huge earthquake, that's why to predict the area of Tsunami run-up is important for decreasing damages to residents by the disaster. However it is very hard task to achieve the simulation by the computer resources problem. A several meter's order of the high resolution calculation is required for the running-up Tsunami simulation because artificial structures on the ground such as roads, buildings, and houses are very small. On the other hand the huge area simulation is also required. In the Toyama-bay case the area is 42 [km] × 15 [km]. When 5 [m] × 5 [m] size computational cells are used for the simulation, over 26,000,000 computational cells are generated. To calculate the simulation, a normal CPU desktop computer took about 10 hours for the calculation. An improvement of calculation time is important problem for the immediate prediction system of Tsunami running-up, as a result it will contribute to protect a lot of residents around the coastal region. The study tried to decrease this calculation time by using multi GPGPU system which is equipped with six NVIDIA TESLA K20xs, InfiniBand network connection between computer nodes by MVAPICH library. As a result 5.16 times faster calculation was achieved on six GPUs than one GPU case and it was 86% parallel efficiency to the linear speed up.
Alternative majority-voting methods for real-time computing systems
NASA Technical Reports Server (NTRS)
Shin, Kang G.; Dolter, James W.
1989-01-01
Two techniques that provide a compromise between the high time overhead in maintaining synchronous voting and the difficulty of combining results in asynchronous voting are proposed. These techniques are specifically suited for real-time applications with a single-source/single-sink structure that need instantaneous error masking. They provide a compromise between a tightly synchronized system in which the synchronization overhead can be quite high, and an asynchronous system which lacks suitable algorithms for combining the output data. Both quorum-majority voting (QMV) and compare-majority voting (CMV) are most applicable to distributed real-time systems with single-source/single-sink tasks. All real-time systems eventually have to resolve their outputs into a single action at some stage. The development of the advanced information processing system (AIPS) and other similar systems serve to emphasize the importance of these techniques. Time bounds suggest that it is possible to reduce the overhead for quorum-majority voting to below that for synchronous voting. All the bounds assume that the computation phase is nonpreemptive and that there is no multitasking.
Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments
Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.
2012-01-01
Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Ubiquitous green computing techniques for high demand applications in Smart environments.
Zapater, Marina; Sanchez, Cesar; Ayala, Jose L; Moya, Jose M; Risco-Martín, José L
2012-01-01
Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.
LLSURE: local linear SURE-based edge-preserving image filtering.
Qiu, Tianshuang; Wang, Aiqi; Yu, Nannan; Song, Aimin
2013-01-01
In this paper, we propose a novel approach for performing high-quality edge-preserving image filtering. Based on a local linear model and using the principle of Stein's unbiased risk estimate as an estimator for the mean squared error from the noisy image only, we derive a simple explicit image filter which can filter out noise while preserving edges and fine-scale details. Moreover, this filter has a fast and exact linear-time algorithm whose computational complexity is independent of the filtering kernel size; thus, it can be applied to real time image processing tasks. The experimental results demonstrate the effectiveness of the new filter for various computer vision applications, including noise reduction, detail smoothing and enhancement, high dynamic range compression, and flash/no-flash denoising.
Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2009-12-01
Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real time image processing.
Development of High-speed Visualization System of Hypocenter Data Using CUDA-based GPU computing
NASA Astrophysics Data System (ADS)
Kumagai, T.; Okubo, K.; Uchida, N.; Matsuzawa, T.; Kawada, N.; Takeuchi, N.
2014-12-01
After the Great East Japan Earthquake on March 11, 2011, intelligent visualization of seismic information is becoming important to understand the earthquake phenomena. On the other hand, to date, the quantity of seismic data becomes enormous as a progress of high accuracy observation network; we need to treat many parameters (e.g., positional information, origin time, magnitude, etc.) to efficiently display the seismic information. Therefore, high-speed processing of data and image information is necessary to handle enormous amounts of seismic data. Recently, GPU (Graphic Processing Unit) is used as an acceleration tool for data processing and calculation in various study fields. This movement is called GPGPU (General Purpose computing on GPUs). In the last few years the performance of GPU keeps on improving rapidly. GPU computing gives us the high-performance computing environment at a lower cost than before. Moreover, use of GPU has an advantage of visualization of processed data, because GPU is originally architecture for graphics processing. In the GPU computing, the processed data is always stored in the video memory. Therefore, we can directly write drawing information to the VRAM on the video card by combining CUDA and the graphics API. In this study, we employ CUDA and OpenGL and/or DirectX to realize full-GPU implementation. This method makes it possible to write drawing information to the VRAM on the video card without PCIe bus data transfer: It enables the high-speed processing of seismic data. The present study examines the GPU computing-based high-speed visualization and the feasibility for high-speed visualization system of hypocenter data.
Real-time interactive 3D computer stereography for recreational applications
NASA Astrophysics Data System (ADS)
Miyazawa, Atsushi; Ishii, Motonaga; Okuzawa, Kazunori; Sakamoto, Ryuuichi
2008-02-01
With the increasing calculation costs of 3D computer stereography, low-cost, high-speed implementation of the latter requires effective distribution of computing resources. In this paper, we attempt to re-classify 3D display technologies on the basis of humans' 3D perception, in order to determine what level of presence or reality is required in recreational video game systems. We then discuss the design and implementation of stereography systems in two categories of the new classification.
An Object-Oriented Approach to Writing Computational Electromagnetics Codes
NASA Technical Reports Server (NTRS)
Zimmerman, Martin; Mallasch, Paul G.
1996-01-01
Presently, most computer software development in the Computational Electromagnetics (CEM) community employs the structured programming paradigm, particularly using the Fortran language. Other segments of the software community began switching to an Object-Oriented Programming (OOP) paradigm in recent years to help ease design and development of highly complex codes. This paper examines design of a time-domain numerical analysis CEM code using the OOP paradigm, comparing OOP code and structured programming code in terms of software maintenance, portability, flexibility, and speed.
Numerical simulation of supersonic inlets using a three-dimensional viscous flow analysis
NASA Technical Reports Server (NTRS)
Anderson, B. H.; Towne, C. E.
1980-01-01
A three dimensional fully viscous computer analysis was evaluated to determine its usefulness in the design of supersonic inlets. This procedure takes advantage of physical approximations to limit the high computer time and storage associated with complete Navier-Stokes solutions. Computed results are presented for a Mach 3.0 supersonic inlet with bleed and a Mach 7.4 hypersonic inlet. Good agreement was obtained between theory and data for both inlets. Results of a mesh sensitivity study are also shown.
The development of an interim generalized gate logic software simulator
NASA Technical Reports Server (NTRS)
Mcgough, J. G.; Nemeroff, S.
1985-01-01
A proof-of-concept computer program called IGGLOSS (Interim Generalized Gate Logic Software Simulator) was developed and is discussed. The simulator engine was designed to perform stochastic estimation of self test coverage (fault-detection latency times) of digital computers or systems. A major attribute of the IGGLOSS is its high-speed simulation: 9.5 x 1,000,000 gates/cpu sec for nonfaulted circuits and 4.4 x 1,000,000 gates/cpu sec for faulted circuits on a VAX 11/780 host computer.
Automation of Precise Time Reference Stations (PTRS)
NASA Astrophysics Data System (ADS)
Wheeler, P. J.
1985-04-01
The U.S. Naval Observatory is presently engaged in a program of automating precise time stations (PTS) and precise time reference stations (PTBS) by using a versatile mini-computer controlled data acquisition system (DAS). The data acquisition system is configured to monitor locally available PTTI signals such as LORAN-C, OMEGA, and/or the Global Positioning System. In addition, the DAS performs local standard intercomparison. Computer telephone communications provide automatic data transfer to the Naval Observatory. Subsequently, after analysis of the data, results and information can be sent back to the precise time reference station to provide automatic control of remote station timing. The DAS configuration is designed around state of the art standard industrial high reliability modules. The system integration and software are standardized but allow considerable flexibility to satisfy special local requirements such as stability measurements, performance evaluation and printing of messages and certificates. The DAS operates completely independently and may be queried or controlled at any time with a computer or terminal device (control is protected for use by authorized personnel only). Such DAS equipped PTS are operational in Hawaii, California, Texas and Florida.
Ren, Qiang; Nagar, Jogender; Kang, Lei; Bian, Yusheng; Werner, Ping; Werner, Douglas H
2017-05-18
A highly efficient numerical approach for simulating the wideband optical response of nano-architectures comprised of Drude-Critical Points (DCP) media (e.g., gold and silver) is proposed and validated through comparing with commercial computational software. The kernel of this algorithm is the subdomain level discontinuous Galerkin time domain (DGTD) method, which can be viewed as a hybrid of the spectral-element time-domain method (SETD) and the finite-element time-domain (FETD) method. An hp-refinement technique is applied to decrease the Degrees-of-Freedom (DoFs) and computational requirements. The collocated E-J scheme facilitates solving the auxiliary equations by converting the inversions of matrices to simpler vector manipulations. A new hybrid time stepping approach, which couples the Runge-Kutta and Newmark methods, is proposed to solve the temporal auxiliary differential equations (ADEs) with a high degree of efficiency. The advantages of this new approach, in terms of computational resource overhead and accuracy, are validated through comparison with well-known commercial software for three diverse cases, which cover both near-field and far-field properties with plane wave and lumped port sources. The presented work provides the missing link between DCP dispersive models and FETD and/or SETD based algorithms. It is a competitive candidate for numerically studying the wideband plasmonic properties of DCP media.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyna, David; Betty, Rita
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis,thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities. The purpose of this project was to computationally model the impact of neural population dynamics within the neurobiological memory system in order to examine how subareas in the brain enable pattern separation and completion of information in memory across time as associated experiences.