Failure detection in high-performance clusters and computers using chaotic map computations
Rao, Nageswara S.
2015-09-01
A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing 10^18 floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphics processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.
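The abstract above does not name a specific map or comparison rule; the CUDA sketch below assumes a logistic map iterated identically by every GPU thread from a shared seed, so that any end-state mismatch against a reference lane flags a suspect compute, memory or interconnect path. All parameters and names are illustrative.

```cuda
// Hypothetical sketch: each GPU thread iterates the same logistic map from the
// same seed; healthy hardware produces bit-identical trajectories, so any
// mismatch against a reference lane flags a suspect component.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void chaoticTrajectory(double *out, int steps, double x0, double r) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    double x = x0;                        // identical seed on every thread
    for (int i = 0; i < steps; ++i)
        x = r * x * (1.0 - x);            // logistic map x_{n+1} = r*x_n*(1 - x_n)
    out[tid] = x;                         // final state of this thread's trajectory
}

int main() {
    const int nThreads = 256, steps = 10000;
    const double x0 = 0.123456789, r = 3.99;   // chaotic regime of the logistic map
    double *d_out, h_out[nThreads];
    cudaMalloc(&d_out, nThreads * sizeof(double));
    chaoticTrajectory<<<1, nThreads>>>(d_out, steps, x0, r);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Compare every trajectory end point against lane 0; any deviation would
    // indicate a compute, memory, or interconnect fault on that lane.
    int faults = 0;
    for (int i = 1; i < nThreads; ++i)
        if (h_out[i] != h_out[0]) ++faults;
    printf("suspect lanes: %d\n", faults);
    cudaFree(d_out);
    return 0;
}
```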
Code of Federal Regulations, 2014 CFR
2014-07-01
... Secretary, has waived certain requirements of the Computer Matching and Privacy Protection Act of 1988, 5 U... process known as centralized salary offset computer matching, identify Federal employees who owe delinquent nontax debt to the United States. Centralized salary offset computer matching is the computerized...
ERIC Educational Resources Information Center
Stifle, Jack
The PLATO IV computer-based instructional system consists of a large scale centrally located CDC 6400 computer and a large number of remote student terminals. This is a brief and general description of the proposed input/output hardware necessary to interface the student terminals with the computer's central processing unit (CPU) using available…
Method and apparatus for fault tolerance
NASA Technical Reports Server (NTRS)
Masson, Gerald M. (Inventor); Sullivan, Gregory F. (Inventor)
1993-01-01
A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises first executing a first algorithm on the input in the first central processing unit, which produces a first output as well as a certification trail; next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail, which produces a second output; and then comparing the first and second outputs, such that an error result is produced if the two outputs are not the same. The second algorithm has a faster execution time than the first algorithm for a given input. The step of executing the first algorithm and the step of executing the second algorithm preferably take place over essentially the same time period.
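The abstract leaves the algorithm pair unspecified; a common illustration of the certification-trail idea is sorting, sketched below in host-only code: the first processor sorts and emits the resulting permutation as the trail, the second processor replays the trail in linear time, and the two outputs are compared. The choice of sorting and all identifiers are assumptions made for this sketch.

```cuda
// Host-only sketch (no device code). Primary CPU: sort and record the
// permutation as a certification trail. Secondary CPU: replay the trail in
// O(n), check it, and compare outputs.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// First algorithm: sort, recording where each output element came from.
static std::vector<int> sortWithTrail(const std::vector<double> &in,
                                      std::vector<double> &out) {
    std::vector<int> trail(in.size());
    std::iota(trail.begin(), trail.end(), 0);
    std::sort(trail.begin(), trail.end(),
              [&](int a, int b) { return in[a] < in[b]; });
    out.resize(in.size());
    for (size_t i = 0; i < in.size(); ++i) out[i] = in[trail[i]];
    return trail;                                  // the certification trail
}

// Second (faster) algorithm: rebuild the output from the trail and check it.
static bool replayTrail(const std::vector<double> &in,
                        const std::vector<int> &trail,
                        std::vector<double> &out) {
    out.resize(in.size());
    std::vector<char> used(in.size(), 0);
    for (size_t i = 0; i < trail.size(); ++i) {
        int j = trail[i];
        if (j < 0 || j >= (int)in.size() || used[j]) return false; // not a permutation
        used[j] = 1;
        out[i] = in[j];
        if (i > 0 && out[i] < out[i - 1]) return false;            // not sorted
    }
    return true;
}

int main() {
    std::vector<double> input = {3.2, 1.5, 4.8, 0.9, 2.7}, out1, out2;
    auto trail = sortWithTrail(input, out1);       // conceptually runs on CPU 1
    bool ok = replayTrail(input, trail, out2);     // conceptually runs on CPU 2
    ok = ok && (out1 == out2);                     // final comparison step
    printf("fault detected: %s\n", ok ? "no" : "yes");
    return 0;
}
```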
Everything You Always Wanted to Know about Computers but Were Afraid to Ask.
ERIC Educational Resources Information Center
DiSpezio, Michael A.
1989-01-01
An overview of the basics of computers is presented. Definitions and discussions of processing, programs, memory, DOS, anatomy and design, central processing unit (CPU), disk drives, floppy disks, and peripherals are included. This article was designed to help teachers to understand basic computer terminology. (CW)
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-07
....usda.gov . SUPPLEMENTARY INFORMATION: A. Background A proposed rule was published in the Federal.... Computers or other technical equipment means central processing units, laptops, desktops, computer mouses...
Fundamental Fortran for Social Scientists.
ERIC Educational Resources Information Center
Veldman, Donald J.
An introduction to Fortran programming specifically for social science statistical and routine data processing is provided. The first two sections of the manual describe the components of computer hardware and software. Topics include input, output, and mass storage devices; central memory; central processing unit; internal storage of data; and…
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Data processing for water monitoring system
NASA Technical Reports Server (NTRS)
Monford, L.; Linton, A. T.
1978-01-01
Water monitoring data acquisition system is structured about central computer that controls sampling and sensor operation, and analyzes and displays data in real time. Unit is essentially separated into two systems: computer system, and hard wire backup system which may function separately or with computer.
Faster, Better, Cheaper: A Decade of PC Progress.
ERIC Educational Resources Information Center
Crawford, Walt
1997-01-01
Reviews the development of personal computers and how computer components have changed in price and value. Highlights include disk drives; keyboards; displays; memory; color graphics; modems; CPU (central processing unit); storage; direct mail vendors; and future possibilities. (LRW)
Multiple-User, Multitasking, Virtual-Memory Computer System
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1993-01-01
Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
Jargon that Computes: Today's PC Terminology.
ERIC Educational Resources Information Center
Crawford, Walt
1997-01-01
Discusses PC (personal computer) and telecommunications terminology in context: Integrated Services Digital Network (ISDN); Asymmetric Digital Subscriber Line (ADSL); cable modems; satellite downloads; T1 and T3 lines; magnitudes ("giga-,""nano-"); Central Processing Unit (CPU); Random Access Memory (RAM); Universal Serial Bus…
ERIC Educational Resources Information Center
Anderson, Cheryl A.
Designed to answer basic questions educators have about microcomputer hardware and software and their applications in teaching, this paper describes the revolution in computer technology that has resulted from the development of the microchip processor and provides information on the major computer components, i.e., input, central processing unit,…
COMPUTER PROGRAM FOR CALCULATING THE COST OF DRINKING WATER TREATMENT SYSTEMS
This FORTRAN computer program calculates the construction and operation/maintenance costs for 45 centralized unit treatment processes for water supply. The calculated costs are based on various design parameters and raw water quality. These cost data are applicable to small size ...
Characterizing Crowd Participation and Productivity of Foldit Through Web Scraping
2016-03-01
Berkeley Open Infrastructure for Network Computing CDF Cumulative Distribution Function CPU Central Processing Unit CSSG Crowdsourced Serious Game...computers at once can create a similar capacity. According to Anderson [6], principal investigator for the Berkeley Open Infrastructure for Network...extraterrestrial life. From this project, a software-based distributed computing platform called the Berkeley Open Infrastructure for Network Computing
41 CFR 105-56.024 - Purpose and scope.
Code of Federal Regulations, 2010 CFR
2010-07-01
... offset computer matching, identify Federal employees who owe delinquent non-tax debt to the United States. Centralized salary offset computer matching is the computerized comparison of delinquent debt records with...) administrative offset program, to collect delinquent debts owed to the Federal Government. This process is known...
Micromagnetics on high-performance workstation and mobile computational platforms
NASA Astrophysics Data System (ADS)
Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.
2015-05-01
The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing unit, Nvidia desktop graphics processing units, and Nvidia Jetson TK1 Platform. FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.
Seeing the forest for the trees: Networked workstations as a parallel processing computer
NASA Technical Reports Server (NTRS)
Breen, J. O.; Meleedy, D. M.
1992-01-01
Unlike traditional 'serial' processing computers in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Code of Federal Regulations, 2010 CFR
2010-01-01
... SECURITY INFORMATION § 403.9 Fees. The following specific fees shall be applicable with respect to services... records, per hour or fraction thereof: (i) Professional $11.00 (ii) Clerical 6.00 (b) Computer service charges per second for actual use of computer central processing unit .25 (c) Copies made by photostat or...
Code of Federal Regulations, 2012 CFR
2012-01-01
... SECURITY INFORMATION § 403.9 Fees. The following specific fees shall be applicable with respect to services... records, per hour or fraction thereof: (i) Professional $11.00 (ii) Clerical 6.00 (b) Computer service charges per second for actual use of computer central processing unit .25 (c) Copies made by photostat or...
Code of Federal Regulations, 2011 CFR
2011-01-01
... SECURITY INFORMATION § 403.9 Fees. The following specific fees shall be applicable with respect to services... records, per hour or fraction thereof: (i) Professional $11.00 (ii) Clerical 6.00 (b) Computer service charges per second for actual use of computer central processing unit .25 (c) Copies made by photostat or...
Code of Federal Regulations, 2013 CFR
2013-01-01
... SECURITY INFORMATION § 403.9 Fees. The following specific fees shall be applicable with respect to services... records, per hour or fraction thereof: (i) Professional $11.00 (ii) Clerical 6.00 (b) Computer service charges per second for actual use of computer central processing unit .25 (c) Copies made by photostat or...
Code of Federal Regulations, 2014 CFR
2014-01-01
... SECURITY INFORMATION § 403.9 Fees. The following specific fees shall be applicable with respect to services... records, per hour or fraction thereof: (i) Professional $11.00 (ii) Clerical 6.00 (b) Computer service charges per second for actual use of computer central processing unit .25 (c) Copies made by photostat or...
Documentary table-top view of a comparison of the General Purpose Computers.
1988-09-13
S88-47513 (Aug 1988) --- The current and future versions of general purpose computers for Space Shuttle orbiters are represented in this frame. The two boxes on the left (AP101B) represent the current GPC configuration, with the input-output processor at far left and the central processing unit at its side. The upgraded version combines both elements in a single unit (far right, AP101S).
Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines
2014-11-01
architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core (MIC) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market. Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs, and CPUs, adds complexity and technical challenges
Quinzio, Lorenzo; Blazek, Michael; Hartmann, Bernd; Röhrig, Rainer; Wille, Burkhard; Junger, Axel; Hempelmann, Gunter
2005-01-01
Computers are becoming increasingly visible in operating rooms (OR) and intensive care units (ICU) for use in bedside documentation. Recently, they have been suspected as possibly acting as reservoirs for microorganisms and vehicles for the transfer of pathogens to patients, causing nosocomial infections. The purpose of this study was to examine the microbiological (bacteriological and mycological) contamination of the central unit of computers used in an OR, a surgical and a pediatric ICU of a tertiary teaching hospital. Sterile swab samples were taken from five sites in each of 13 computers stationed at the two ICUs and 12 computers at the OR. Sample sites within the chassis housing of the computer processing unit (CPU) included the CPU fan, ventilator, and metal casing. External sites were the ventilator and the bottom of the computer tower. Quantitative and qualitative microbiological analyses were performed according to commonly used methods. One hundred and ninety sites were cultured for bacteria and fungi. Analyses of swabs taken at five equivalent sites inside and outside the computer chassis did not find any significant number of potentially pathogenic bacteria or fungi. This can probably be attributed to either the absence or the low number of pathogens detected on the surfaces. Microbial contamination in the CPU of OR and ICU computers is too low for designating them as a reservoir for microorganisms.
Exploiting graphics processing units for computational biology and bioinformatics.
Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H
2010-09-01
Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
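As an illustration of the article's running example, a minimal CUDA sketch of an all-pairs Euclidean distance kernel is given below: one thread per (i, j) pair over row-major data, so that neighbouring threads read consecutive elements (coalesced access). The kernel and launch parameters are generic, not the article's own code.

```cuda
// One thread computes the distance between instances i and j; data is stored
// row-major as n instances of d features.
#include <cmath>
#include <cuda_runtime.h>

__global__ void allPairsDistance(const float *data, float *dist, int n, int d) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row instance
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column instance
    if (i >= n || j >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < d; ++k) {
        float diff = data[i * d + k] - data[j * d + k];
        acc += diff * diff;
    }
    dist[i * n + j] = sqrtf(acc);
}

// Host-side launch (error checking omitted for brevity).
void launchAllPairs(const float *d_data, float *d_dist, int n, int d) {
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    allPairsDistance<<<grid, block>>>(d_data, d_dist, n, d);
    cudaDeviceSynchronize();
}
```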
Progress in a novel architecture for high performance processing
NASA Astrophysics Data System (ADS)
Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin
2018-04-01
The high performance processing (HPP) is an innovative architecture that targets high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, which is the first generation of HPP cores, has been taped out successfully under the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvement in architecture. The chip with 32 HPP cores is being developed under the TSMC 16 nm field effect transistor (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-01-01
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684
Graphics processing units in bioinformatics, computational biology and systems biology.
Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela
2017-09-01
Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.
24 CFR 15.110 - What fees will HUD charge?
Code of Federal Regulations, 2013 CFR
2013-04-01
... duplicating machinery. The computer run time includes the cost of operating a central processing unit for that... Applies. (6) Computer run time (includes only mainframe search time not printing) The direct cost of... estimated fee is more than $250.00 or you have a history of failing to pay FOIA fees to HUD in a timely...
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
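The phase-computation stage mentioned above maps naturally onto one GPU thread per pixel. The hedged CUDA sketch below shows the standard four-step phase-shifting formula; the paper's actual kernels (lens-distortion rectification, correspondence, 3-D reconstruction, and the three-layer kernel model) are not reproduced, and all names are illustrative.

```cuda
// Wrapped-phase kernel for four-step phase shifting with shifts 0, pi/2, pi,
// 3*pi/2: I_k = A + B*cos(phi + k*pi/2), so phi = atan2(I3 - I1, I0 - I2).
__global__ void fourStepPhase(const float *I0, const float *I1,
                              const float *I2, const float *I3,
                              float *phase, int nPixels) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per pixel
    if (p >= nPixels) return;
    phase[p] = atan2f(I3[p] - I1[p], I0[p] - I2[p]); // wrapped phase in (-pi, pi]
}
```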
Hyper-Spectral Synthesis of Active OB Stars Using GLaDoS
NASA Astrophysics Data System (ADS)
Hill, N. R.; Townsend, R. H. D.
2016-11-01
In recent years there has been considerable interest in using graphics processing units (GPUs) to perform scientific computations that have traditionally been handled by central processing units (CPUs). However, there is one area where the scientific potential of GPUs has been overlooked: computer graphics, the task they were originally designed for. Here we introduce GLaDoS, a hyper-spectral code which leverages the graphics capabilities of GPUs to synthesize spatially and spectrally resolved images of complex stellar systems. We demonstrate how GLaDoS can be applied to calculate observables for various classes of stars, including systems with inhomogeneous surface temperatures and contact binaries.
Wilson, J Adam; Williams, Justin C
2009-01-01
The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
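Of the two GPU-accelerated stages described (spatial filtering and autoregressive spectral estimation), the first is a matrix-matrix product. A minimal CUDA sketch is shown below, with one thread per output element; the paper's actual implementation and the autoregressive step are not reproduced, and the matrix names are assumptions.

```cuda
// Spatial filter as Y = W * X, with W (filters x channels) and X (channels x
// samples), both row-major. One thread per output element; a production code
// would typically tile this or call a BLAS routine instead.
__global__ void spatialFilter(const float *W, const float *X, float *Y,
                              int filters, int channels, int samples) {
    int f = blockIdx.y * blockDim.y + threadIdx.y;   // output filter row
    int s = blockIdx.x * blockDim.x + threadIdx.x;   // output sample column
    if (f >= filters || s >= samples) return;
    float acc = 0.0f;
    for (int c = 0; c < channels; ++c)
        acc += W[f * channels + c] * X[c * samples + s];
    Y[f * samples + s] = acc;
}
```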
Accelerating sino-atrium computer simulations with graphic processing units.
Zhang, Hong; Xiao, Zheng; Lin, Shien-fong
2015-01-01
Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed in a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased 62% with respect to a serial program running on CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrated the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.
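The operator-splitting strategy mentioned above separates each time step into an independent per-cell reaction update and a neighbour-coupled diffusion update. The CUDA sketch below illustrates that split for a 1-D strand of cells; the ionic kinetics are a generic placeholder, not the SAN and atrial cell models used in the study.

```cuda
// Reaction step: independent per-cell update, one cell per thread.
// The kinetics below are a FitzHugh-Nagumo-like placeholder.
__global__ void reactionStep(float *v, float *gate, int nCells, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nCells) return;
    float dv = v[i] * (1.0f - v[i]) * (v[i] - 0.1f) - gate[i];
    float dw = 0.01f * (0.5f * v[i] - gate[i]);
    v[i]    += dt * dv;
    gate[i] += dt * dw;
}

// Diffusion step: neighbour coupling with no-flux boundaries.
__global__ void diffusionStep(const float *vIn, float *vOut,
                              int nCells, float dt, float D) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nCells) return;
    int left  = (i == 0) ? i : i - 1;
    int right = (i == nCells - 1) ? i : i + 1;
    vOut[i] = vIn[i] + dt * D * (vIn[left] - 2.0f * vIn[i] + vIn[right]);
}
```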
Shi, Yulin; Veidenbaum, Alexander V; Nicolau, Alex; Xu, Xiangmin
2015-01-15
Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post hoc processing and analysis. Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22× speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. Copyright © 2014 Elsevier B.V. All rights reserved.
Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin
2014-01-01
Background Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New Method Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with Existing Method(s) To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633
Proceedings of the First International Symposium on Explosive Detection Technology
1992-05-01
of these per unit volume is C, = 2.0 ct /cm 3/sec. For the 10 metals make the activity concentration significantly seconds that the bag is in the...Hopfield Network CI Chemical Ionization CL Chemiluminescence CLD Chemiluminescence Detector CPU Central Processing Unit CT Computed Tomography CW...NOTICE This document is disseminated under the sponsorship of the U.S. Department of Transportation in the Interest of Information exchange. The United
32 CFR 701.54 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2014 CFR
2014-07-01
...) Computer search is based on the total cost of the central processing unit, input-output devices, and memory... charge for office copy up to six images)—$3.50 Each additional image—$ .10 Each typewritten page—$3.50...
32 CFR 701.54 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2010 CFR
2010-07-01
...) Computer search is based on the total cost of the central processing unit, input-output devices, and memory... charge for office copy up to six images)—$3.50 Each additional image—$ .10 Each typewritten page—$3.50...
Space-Shuttle Emulator Software
NASA Technical Reports Server (NTRS)
Arnold, Scott; Askew, Bill; Barry, Matthew R.; Leigh, Agnes; Mermelstein, Scott; Owens, James; Payne, Dan; Pemble, Jim; Sollinger, John; Thompson, Hiram;
2007-01-01
A package of software has been developed to execute a raw binary image of the space shuttle flight software for simulation of the computational effects of operation of space shuttle avionics. This software can be run on inexpensive computer workstations. Heretofore, it was necessary to use real flight computers to perform such tests and simulations. The package includes a program that emulates the space shuttle orbiter general-purpose computer [consisting of a central processing unit (CPU), input/output processor (IOP), master sequence controller, and bus-control elements]; an emulator of the orbiter display electronics unit and models of the associated cathode-ray tubes, keyboards, and switch controls; computational models of the data-bus network; computational models of the multiplexer-demultiplexer components; an emulation of the pulse-code modulation master unit; an emulation of the payload data interleaver; a model of the master timing unit; a model of the mass memory unit; and a software component that ensures compatibility of telemetry and command services between the simulated space shuttle avionics and a mission control center. The software package is portable to several host platforms.
Grace: A cross-platform micromagnetic simulator on graphics processing units
NASA Astrophysics Data System (ADS)
Zhu, Ru
2015-12-01
A micromagnetic simulator running on graphics processing units (GPUs) is presented. Different from GPU implementations of other research groups, which are predominantly running on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from vendors including NVidia, AMD and Intel, and achieves a significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paved the way for running large size micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2013-05-23
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
A spacecraft computer repairable via command.
NASA Technical Reports Server (NTRS)
Fimmel, R. O.; Baker, T. E.
1971-01-01
The MULTIPAC is a central data system developed for deep-space probes with the distinctive feature that it may be repaired during flight via command and telemetry links by reprogramming around the failed unit. The computer organization uses pools of identical modules which the program organizes into one or more computers called processors. The interaction of these modules is dynamically controlled by the program rather than hardware. In the event of a failure, new programs are entered which reorganize the central data system with a somewhat reduced total processing capability aboard the spacecraft. Emphasis is placed on the evolution of the system architecture and the final overall system design rather than the specific logic design.
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as the number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
Shorebird Migration Patterns in Response to Climate Change: A Modeling Approach
NASA Technical Reports Server (NTRS)
Smith, James A.
2010-01-01
The availability of satellite remote sensing observations at multiple spatial and temporal scales, coupled with advances in climate modeling and information technologies, offers new opportunities for the application of mechanistic models to predict how continental scale bird migration patterns may change in response to environmental change. In earlier studies, we explored the phenotypic plasticity of a migratory population of Pectoral sandpipers by simulating the movement patterns of an ensemble of 10,000 individual birds in response to changes in stopover locations as an indicator of the impacts of wetland loss and inter-annual variability on the fitness of migratory shorebirds. We used an individual based, biophysical migration model, driven by remotely sensed land surface data, climate data, and biological field data. Mean stop-over durations and stop-over frequency with latitude predicted from our model for nominal cases were consistent with results reported in the literature and available field data. In this study, we take advantage of new computing capabilities enabled by recent GP-GPU paradigms (general-purpose computing on graphics processing units) and commodity hardware. Several aspects of our individual based (agent modeling) approach lend themselves well to GP-GPU computing. We have been able to allocate compute-intensive tasks to the graphics processing units, and now simulate ensembles of 400,000 birds at varying spatial resolutions along the central North American flyway. We are incorporating additional, species specific, mechanistic processes to better reflect the processes underlying bird phenotypic plasticity responses to different climate change scenarios in the central U.S.
Heterogeneous real-time computing in radio astronomy
NASA Astrophysics Data System (ADS)
Ford, John M.; Demorest, Paul; Ransom, Scott
2010-07-01
Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.
1994-04-01
TSW-7A, AIR TRAFFIC CONTROL CENTRAL (ATCC) 32- 8 AN/TTC-41(V), CENTRAL OFFICE, TELEPHONE, AUTOMATIC 32- 9 MISSILE COUNTERMEASURE DEVICE (MCD) .- 0 MK...a Handheld Terminal Unit (HTU), Portable Computer Unit (PCU), Transportable Computer Unit (TCU), and compatible NOI peripheral devices . All but the...CLASSIFICATION: ASARC-III, Jun 80, Standard. I I I AN/TIC-39 IS A MOBILE , AUTOMATIC , MODULAR ELECTRONIC CIRCUIT SWITCH UNDER PROCESSOR CONTROL WITH INTEGRAL
Badal, Andreu; Badano, Aldo
2009-11-01
It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
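A heavily reduced CUDA sketch of the one-photon-history-per-thread pattern is given below, using cuRAND for exponential free-flight sampling in a homogeneous slab. PENELOPE's voxelized geometry and detailed physics are not reproduced; all parameters are illustrative.

```cuda
// One photon history per thread: sample the distance to first interaction and
// count photons that cross a homogeneous slab without interacting.
#include <curand_kernel.h>

__global__ void photonHistories(unsigned long long seed, float mu,
                                float slabDepth, int nPhotons,
                                unsigned int *transmitted) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nPhotons) return;
    curandState rng;
    curand_init(seed, tid, 0, &rng);             // independent stream per thread
    // Free-flight distance sampled from an exponential distribution (mean 1/mu).
    float s = -logf(curand_uniform(&rng)) / mu;
    if (s > slabDepth)                           // photon crosses without interaction
        atomicAdd(transmitted, 1u);
}
```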
Optimized 4-bit Quantum Reversible Arithmetic Logic Unit
NASA Astrophysics Data System (ADS)
Ayyoub, Slimani; Achour, Benslama
2017-08-01
Reversible logic has received a great deal of attention in recent years due to its ability to reduce power dissipation. The main purposes of designing reversible logic are to decrease quantum cost, the depth of the circuits and the number of garbage outputs. The arithmetic logic unit (ALU) is an important part of the central processing unit (CPU) as the execution unit. This paper presents a complete design of a new reversible arithmetic logic unit (ALU) that can be part of a programmable reversible computing device such as a quantum computer. The proposed ALU is based on a reversible low-power control unit and a full adder with small performance parameters named the double Peres gate. The presented ALU can produce the largest number (28) of arithmetic and logic functions and has the smallest quantum cost and delay compared with existing designs.
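The double Peres gate construction itself is not reproduced here, but the basic Peres gate map and its defining property, reversibility, are easy to check exhaustively. The host-only sketch below (no device code) verifies that the map is a bijection on three bits; it is an illustration, not the paper's design.

```cuda
// Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C).
#include <cstdio>

static int peres(int x) {
    int a = (x >> 2) & 1, b = (x >> 1) & 1, c = x & 1;
    int p = a, q = a ^ b, r = (a & b) ^ c;
    return (p << 2) | (q << 1) | r;
}

int main() {
    int seen = 0;                            // bitmask of outputs already produced
    for (int x = 0; x < 8; ++x) {
        int y = peres(x);
        if (seen & (1 << y)) { printf("not reversible\n"); return 1; }
        seen |= 1 << y;
    }
    printf("Peres gate is a bijection on 3 bits (reversible)\n");
    return 0;
}
```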
GPU-accelerated computational tool for studying the effectiveness of asteroid disruption techniques
NASA Astrophysics Data System (ADS)
Zimmerman, Ben J.; Wie, Bong
2016-10-01
This paper presents the development of a new Graphics Processing Unit (GPU) accelerated computational tool for asteroid disruption techniques. Numerical simulations are completed using the high-order spectral difference (SD) method. Due to the compact nature of the SD method, it is well suited for implementation with the GPU architecture, hence solutions are generated at orders of magnitude faster than the Central Processing Unit (CPU) counterpart. A multiphase model integrated with the SD method is introduced, and several asteroid disruption simulations are conducted, including kinetic-energy impactors, multi-kinetic energy impactor systems, and nuclear options. Results illustrate the benefits of using multi-kinetic energy impactor systems when compared to a single impactor system. In addition, the effectiveness of nuclear options is observed.
The PLATO IV Communications System.
ERIC Educational Resources Information Center
Sherwood, Bruce Arne; Stifle, Jack
The PLATO IV computer-based educational system contains its own communications hardware and software for operating plasma-panel graphics terminals. Key echoing is performed by the central processing unit: every key pressed at a terminal passes through the entire system before anything appears on the terminal's screen. Each terminal is guaranteed…
12 CFR 1402.22 - Fees to be charged.
Code of Federal Regulations, 2012 CFR
2012-01-01
... Banks and Banking FARM CREDIT SYSTEM INSURANCE CORPORATION RELEASING INFORMATION Fees for Provision of...) (i.e., basic pay plus 16 percent of that rate) of the employee(s) making the search. (c) Computer... the cost of operating the central processing unit for that portion of operating time that is directly...
12 CFR 1402.22 - Fees to be charged.
Code of Federal Regulations, 2013 CFR
2013-01-01
... Banks and Banking FARM CREDIT SYSTEM INSURANCE CORPORATION RELEASING INFORMATION Fees for Provision of...) (i.e., basic pay plus 16 percent of that rate) of the employee(s) making the search. (c) Computer... the cost of operating the central processing unit for that portion of operating time that is directly...
12 CFR 1402.22 - Fees to be charged.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Banks and Banking FARM CREDIT SYSTEM INSURANCE CORPORATION RELEASING INFORMATION Fees for Provision of...) (i.e., basic pay plus 16 percent of that rate) of the employee(s) making the search. (c) Computer... the cost of operating the central processing unit for that portion of operating time that is directly...
12 CFR 1402.22 - Fees to be charged.
Code of Federal Regulations, 2014 CFR
2014-01-01
... Banks and Banking FARM CREDIT SYSTEM INSURANCE CORPORATION RELEASING INFORMATION Fees for Provision of...) (i.e., basic pay plus 16 percent of that rate) of the employee(s) making the search. (c) Computer... the cost of operating the central processing unit for that portion of operating time that is directly...
12 CFR 1402.22 - Fees to be charged.
Code of Federal Regulations, 2010 CFR
2010-01-01
... Banks and Banking FARM CREDIT SYSTEM INSURANCE CORPORATION RELEASING INFORMATION Fees for Provision of...) (i.e., basic pay plus 16 percent of that rate) of the employee(s) making the search. (c) Computer... the cost of operating the central processing unit for that portion of operating time that is directly...
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power exceeding a teraflop per second. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors, with very fast computing capability and much higher memory bandwidth than central processing units (CPUs). Data-parallel computation can be described briefly as mapping data elements to parallel processing threads. The rapid growth in the programmability and capability of GPUs has attracted researchers dealing with complex problems that require heavy computation, giving rise to the concepts of general-purpose computation on graphics processing units (GPGPU) and stream processing. Because graphics processors are powerful yet inexpensive and affordable hardware, they have become an alternative to conventional processors: graphics chips that were once fixed-function hardware have been transformed into modern, powerful, programmable processors. The main difficulty is that GPUs use programming models that differ from conventional methods, so efficient GPU programming requires re-coding the existing algorithm with the limitations and structure of the graphics hardware in mind; multi-core processors cannot be programmed with traditional, event-driven techniques. GPUs are especially effective when the same computing steps must be repeated over many data elements with high accuracy, producing solutions more quickly and accurately, whereas CPUs, which follow a flow of control and perform essentially one computation at a time, are slower for such workloads. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded with the GPGPU method in the CUDA (Compute Unified Device Architecture) programming language, and the results were compared with a traditional CPU implementation. In a second application, projective rectification was coded with the same GPGPU/CUDA approach and evaluated on sample images of various sizes, and the results were compared. The GPGPU method is particularly useful when the same computations are repeated over highly dense data, finding the solution quickly.
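The per-pixel parallelism exploited for projective rectification can be sketched as a homography-warp kernel, one thread per output pixel, shown below. The transform matrix, image layout and nearest-neighbour sampling are simplifying assumptions, not the study's implementation.

```cuda
// Each output pixel is mapped through a 3x3 homography H into the source image
// and sampled (nearest neighbour, for brevity). H and the images are assumed
// to be prepared on the host and copied to device memory beforehand.
__global__ void projectiveRectify(const unsigned char *src, unsigned char *dst,
                                  int srcW, int srcH, int dstW, int dstH,
                                  const float *H /* row-major 3x3 */) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstW || y >= dstH) return;
    float w = H[6] * x + H[7] * y + H[8];
    float u = (H[0] * x + H[1] * y + H[2]) / w;   // source column
    float v = (H[3] * x + H[4] * y + H[5]) / w;   // source row
    int ui = (int)(u + 0.5f), vi = (int)(v + 0.5f);
    dst[y * dstW + x] = (ui >= 0 && ui < srcW && vi >= 0 && vi < srcH)
                            ? src[vi * srcW + ui] : 0;
}
```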
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling
NASA Astrophysics Data System (ADS)
Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.
2017-06-01
Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. The speedup of the solution on GPUs with respect to the solution on a central processing unit (CPU) is compared. Performance measurements show that the numerical schemes developed achieve a 20-50× speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
Defining and Enforcing Hardware Security Requirements
2011-12-01
Computer-Aided Design CPU Central Processing Unit CTL Computation Tree Logic DARPA The Defense Advanced Projects Research Agency DFF D-type Flip-Flop DNF...They too have no global knowledge of what is going on, nor any meaning to attach to any bit, whether storage or gating . . . it is we who attach...This option is prohibitively expensive with the current trends in the global distribution of the steps in IC design and fabrication. The second option
Symplectic multi-particle tracking on GPUs
NASA Astrophysics Data System (ADS)
Liu, Zhicong; Qiang, Ji
2018-05-01
A symplectic multi-particle tracking model is implemented on the Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation helps save more than a factor of two total computing time in comparison to the CPU implementation.
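A minimal CUDA sketch of one symplectic (drift-kick) step, with one particle per thread, is given below for a simple 1-D linear focusing force. The actual lattice elements, space-charge solver and multi-GPU decomposition of the paper are omitted; all symbols are illustrative.

```cuda
// Semi-implicit (drift-kick) update, which is symplectic: the drift uses the
// old momentum and the kick uses the freshly updated position.
__global__ void driftKick(float *x, float *p, int nPart, float dt, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one particle per thread
    if (i >= nPart) return;
    x[i] += dt * p[i];            // drift: position advances with momentum
    p[i] -= dt * k * x[i];        // kick: momentum updated from the focusing field
}
```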
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.
Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J
2015-07-05
We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc.
A perioperative echocardiographic reporting and recording system.
Pybus, David A
2004-11-01
Advances in video capture, compression, and streaming technology, coupled with improvements in central processing unit design and the inclusion of a database engine in the Windows operating system, have simplified the task of implementing a digital echocardiographic recording system. I describe an application that uses these technologies and runs on a notebook computer.
Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer
NASA Technical Reports Server (NTRS)
Godoy, William F.; Liu, Xu
2011-01-01
General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension to radiative solvers (e.g., Monte Carlo, discrete ordinates) with a minimal learning curve.
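For readers unfamiliar with the baseline method named in the abstract, here is a minimal, generic Gauss-Seidel sweep for a linear system; it is a hedged sketch, not the radiative-transfer discretization itself, and the matrix shown is a made-up diagonally dominant example.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, iters=100):
    """Plain Gauss-Seidel sweeps for A x = b (A assumed diagonally dominant).
    Each unknown is updated in place using the most recent values; this sequential
    dependence is what a parallel GPU variant has to restructure or reorder."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(iters):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(gauss_seidel(A, b))
```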
21 CFR 1305.24 - Central processing of orders.
Code of Federal Regulations, 2013 CFR
2013-04-01
... or more registered locations and maintains a central processing computer system in which orders are... order with all linked records on the central computer system. (b) A company that has central processing... the company owns and operates. ...
21 CFR 1305.24 - Central processing of orders.
Code of Federal Regulations, 2012 CFR
2012-04-01
... or more registered locations and maintains a central processing computer system in which orders are... order with all linked records on the central computer system. (b) A company that has central processing... the company owns and operates. ...
21 CFR 1305.24 - Central processing of orders.
Code of Federal Regulations, 2011 CFR
2011-04-01
... or more registered locations and maintains a central processing computer system in which orders are... order with all linked records on the central computer system. (b) A company that has central processing... the company owns and operates. ...
21 CFR 1305.24 - Central processing of orders.
Code of Federal Regulations, 2010 CFR
2010-04-01
... or more registered locations and maintains a central processing computer system in which orders are... order with all linked records on the central computer system. (b) A company that has central processing... the company owns and operates. ...
21 CFR 1305.24 - Central processing of orders.
Code of Federal Regulations, 2014 CFR
2014-04-01
... or more registered locations and maintains a central processing computer system in which orders are... order with all linked records on the central computer system. (b) A company that has central processing... the company owns and operates. ...
Parallel-Processing Software for Correlating Stereo Images
NASA Technical Reports Server (NTRS)
Klimeck, Gerhard; Deen, Robert; Mcauley, Michael; DeJong, Eric
2007-01-01
A computer program implements parallel-processing algorithms for correlating images of terrain acquired by stereoscopic pairs of digital stereo cameras on an exploratory robotic vehicle (e.g., a Mars rover). Such correlations are used to create three-dimensional computational models of the terrain for navigation. In this program, the scene viewed by the cameras is segmented into subimages. Each subimage is assigned to one of a number of central processing units (CPUs) operating simultaneously.
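The JPL correlator itself is not available here; the following is a simplified sketch, under stated assumptions, of the same idea: block-matching stereo correlation with the image split into row strips, each strip handed to a separate CPU process. The function name, window size, and synthetic shifted images are all illustrative.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def disparity_strip(args):
    """Block-matching (sum of squared differences) disparity for one strip of rows."""
    left, right, row0, row1, win, max_d = args
    h, w = left.shape
    out = np.zeros((row1 - row0, w), dtype=np.int32)
    for r in range(max(row0, win), min(row1, h - win)):
        for c in range(win + max_d, w - win):
            patch = left[r - win:r + win + 1, c - win:c + win + 1]
            costs = [np.sum((patch - right[r - win:r + win + 1,
                                           c - d - win:c - d + win + 1]) ** 2)
                     for d in range(max_d)]
            out[r - row0, c] = int(np.argmin(costs))
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((64, 96)).astype(np.float32)
    right = np.roll(left, -3, axis=1)          # synthetic 3-pixel leftward shift
    strips = [(left, right, r, r + 16, 3, 8) for r in range(0, 64, 16)]
    with ProcessPoolExecutor() as pool:        # one strip per CPU, as in the abstract
        disparity = np.vstack(list(pool.map(disparity_strip, strips)))
```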
DOE Office of Scientific and Technical Information (OSTI.GOV)
Badal, Andreu; Badano, Aldo
Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
NASA Technical Reports Server (NTRS)
Chawner, David M.; Gomez, Ray J.
2010-01-01
In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations are run that face many challenges, two of which are the ability to customize software for specialized needs and the need to run simulations as fast as possible. There are many different tools used for running these simulations, and each one has its own pros and cons. Once these simulations are run, there needs to be software capable of visualizing the results in an appealing manner. Some of this software is open source, meaning that anyone can edit the source code, make modifications, and distribute it to other users in a future release. This is very useful, especially in this branch where many different tools are being used. File readers can be written to load any file format into a program, easing the bridging from one tool to another. Programming such a reader requires knowledge of the file format being read as well as the equations necessary to obtain the derived values after loading. When these CFD simulations are run, extremely large files are loaded and values are calculated from them. These simulations usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) are traditionally used to render graphics for computers; however, in recent years, GPUs have been used for more generic applications because of the speed of these processors. Applications run on GPUs have been known to run up to forty times faster than they would on conventional central processing units (CPUs). If these CFD programs were extended to run on GPUs, the time they require to complete would be much less. This would allow more simulations to be run in the same amount of time and possibly allow more complex computations to be performed.
Code of Federal Regulations, 2010 CFR
2010-07-01
... requirements of the Computer Matching and Privacy Protection Act of 1988, 5 U.S.C. 552a, as amended, for... known as centralized salary offset computer matching, identify Federal employees who owe delinquent nontax debt to the United States. Centralized salary offset computer matching is the computerized...
Code of Federal Regulations, 2011 CFR
2011-07-01
... requirements of the Computer Matching and Privacy Protection Act of 1988, 5 U.S.C. 552a, as amended, for... known as centralized salary offset computer matching, identify Federal employees who owe delinquent nontax debt to the United States. Centralized salary offset computer matching is the computerized...
Code of Federal Regulations, 2012 CFR
2012-07-01
... requirements of the Computer Matching and Privacy Protection Act of 1988, 5 U.S.C. 552a, as amended, for... known as centralized salary offset computer matching, identify Federal employees who owe delinquent nontax debt to the United States. Centralized salary offset computer matching is the computerized...
Code of Federal Regulations, 2013 CFR
2013-07-01
... requirements of the Computer Matching and Privacy Protection Act of 1988, 5 U.S.C. 552a, as amended, for... known as centralized salary offset computer matching, identify Federal employees who owe delinquent nontax debt to the United States. Centralized salary offset computer matching is the computerized...
Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units
NASA Astrophysics Data System (ADS)
Kemal, Jonathan Yashar
For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
Radio Frequency Propagation and Performance Assessment Suite (RFPPAS)
2016-11-15
Intelligence, Surveillance, and Reconnaissance; Clutter-to-Noise Ratio; Central Processing Unit; Evaporation Duct Climatology; Engineer's Refractive Effects...AREPS surface layer (evaporation duct) climatology regions...evaporation duct profiles computed from surface layer climatological statistics. The impetus for building such a database is to provide a means for instant
NASA Technical Reports Server (NTRS)
Hall, William A.
1990-01-01
Slave microprocessors in a multimicroprocessor computing system contain modified circuit cards programmed via a bus connecting the master processor with the slave microprocessors. Enables interactive, microprocessor-based, single-loop control. Confers ability to load and run programs from the master/slave bus, without need for a microprocessor development station. Tristate buffers latch all data and status information. The slave central processing unit is never connected directly to the bus.
32 CFR 286.30 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... hourly rates). (2) Computer search is based on the total cost of the central processing unit, input... made by Components at the following rates: (1) Minimum charge for office copy (up to six images) $3.50 (2) Each additional image .10 (3) Each typewritten page 3.50 (4) Certification and validation with...
32 CFR 286.30 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2010 CFR
2010-07-01
... hourly rates). (2) Computer search is based on the total cost of the central processing unit, input... made by Components at the following rates: (1) Minimum charge for office copy (up to six images) $3.50 (2) Each additional image .10 (3) Each typewritten page 3.50 (4) Certification and validation with...
32 CFR 286.30 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2013 CFR
2013-07-01
... hourly rates). (2) Computer search is based on the total cost of the central processing unit, input... made by Components at the following rates: (1) Minimum charge for office copy (up to six images) $3.50 (2) Each additional image .10 (3) Each typewritten page 3.50 (4) Certification and validation with...
32 CFR 286.30 - Collection of fees and fee rates for technical data.
Code of Federal Regulations, 2012 CFR
2012-07-01
... hourly rates). (2) Computer search is based on the total cost of the central processing unit, input... made by Components at the following rates: (1) Minimum charge for office copy (up to six images) $3.50 (2) Each additional image .10 (3) Each typewritten page 3.50 (4) Certification and validation with...
49 CFR 7.44 - Services performed without charge or at a reduced charge.
Code of Federal Regulations, 2013 CFR
2013-10-01
... charged to any requestor making a request under subpart C of this part for the first two hours of search... search is required two hours of search time will be considered spent when the hourly costs of operating the central processing unit used to perform the search added to the computer operator's salary cost...
49 CFR 7.44 - Services performed without charge or at a reduced charge.
Code of Federal Regulations, 2011 CFR
2011-10-01
... charged to any requestor making a request under subpart C of this part for the first two hours of search... search is required two hours of search time will be considered spent when the hourly costs of operating the central processing unit used to perform the search added to the computer operator's salary cost...
49 CFR 7.44 - Services performed without charge or at a reduced charge.
Code of Federal Regulations, 2012 CFR
2012-10-01
... charged to any requestor making a request under subpart C of this part for the first two hours of search... search is required two hours of search time will be considered spent when the hourly costs of operating the central processing unit used to perform the search added to the computer operator's salary cost...
NASA Technical Reports Server (NTRS)
Rabideau, Gregg; Chien, Steve; Knight, Russell; Schaffer, Steven; Tran, Daniel; Cichy, Benjamin; Sherwood, Robert
2006-01-01
The Automated Scheduling and Planning Environment (ASPEN) computer program has been updated to version 3.0. ASPEN is a modular, reconfigurable, application software framework for solving batch problems that involve reasoning about time, activities, states, and resources. Applications of ASPEN can include planning spacecraft missions, scheduling of personnel, and managing supply chains, inventories, and production lines. ASPEN 3.0 can be customized for a wide range of applications and for a variety of computing environments that include various central processing units and random access memories.
Memory management in genome-wide association studies
2009-01-01
Genome-wide association is a powerful tool for the identification of genes that underlie common diseases. Genome-wide association studies generate billions of genotypes and pose significant computational challenges for most users including limited computer memory. We applied a recently developed memory management tool to two analyses of North American Rheumatoid Arthritis Consortium studies and measured the performance in terms of central processing unit and memory usage. We conclude that our memory management approach is simple, efficient, and effective for genome-wide association studies. PMID:20018047
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
Exploiting current-generation graphics hardware for synthetic-scene generation
NASA Astrophysics Data System (ADS)
Tanner, Michael A.; Keen, Wayne A.
2010-04-01
Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores for current NVIDIA Corporation GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. Taking advantage of this potential requires algorithm implementation that is structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) will be presented. Included in this paper will be various language tradeoffs between conventional shader programming, Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), including performance trades and possible pathways for future tool development.
The Research and Implementation of MUSER CLEAN Algorithm Based on OpenCL
NASA Astrophysics Data System (ADS)
Feng, Y.; Chen, K.; Deng, H.; Wang, F.; Mei, Y.; Wei, S. L.; Dai, W.; Yang, Q. P.; Liu, Y. B.; Wu, J. P.
2017-03-01
High-performance data processing on a single machine is increasingly important in the development of astronomical software. However, due to differences in machine configuration, traditional programming techniques such as multi-threading and CUDA (Compute Unified Device Architecture)+GPU (Graphic Processing Unit) have obvious limitations in portability and seamlessness between different operating systems. The OpenCL (Open Computing Language) used in the development of the MUSER (MingantU SpEctral Radioheliograph) data processing system is introduced. The Högbom CLEAN algorithm is re-implemented as a parallel CLEAN algorithm using the Python language and the PyOpenCL extension package. The experimental results show that the CLEAN algorithm based on OpenCL has approximately equal operating efficiency compared with the former CLEAN algorithm based on CUDA. More importantly, data processing in a CPU-only (Central Processing Unit) environment can also achieve high performance, which solves the problem of the environmental dependence of CUDA+GPU. Overall, the research improves the adaptability of the system, with emphasis on the performance of MUSER image clean computing. In the meanwhile, the realization of OpenCL in MUSER proves its availability in scientific data processing. In view of the high-performance computing features of OpenCL in heterogeneous environments, it will probably become a preferred technology in future high-performance astronomical software development.
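As background for the algorithm named above, here is a minimal serial Högbom CLEAN sketch in plain NumPy: find the residual peak, subtract a gain-scaled, peak-centred copy of the dirty beam, and record the removed flux as a clean component. The MUSER OpenCL/PyOpenCL implementation is not reproduced; the function name and the delta-function test beam are illustrative assumptions.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, niter=200, threshold=0.0):
    """Serial Hogbom CLEAN loop: peak search, scaled PSF subtraction, component record."""
    residual = dirty.copy()
    components = np.zeros_like(dirty)
    ny, nx = dirty.shape
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2     # PSF centre
    for _ in range(niter):
        py, px = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        peak = residual[py, px]
        if abs(peak) <= threshold:
            break
        components[py, px] += gain * peak
        # Subtract the shifted, scaled PSF where it overlaps the image.
        y0, y1 = max(0, py - cy), min(ny, py - cy + psf.shape[0])
        x0, x1 = max(0, px - cx), min(nx, px - cx + psf.shape[1])
        residual[y0:y1, x0:x1] -= gain * peak * psf[y0 - (py - cy):y1 - (py - cy),
                                                    x0 - (px - cx):x1 - (px - cx)]
    return components, residual

# Tiny synthetic test: an idealized delta-function beam and two point sources.
psf = np.zeros((33, 33)); psf[16, 16] = 1.0
dirty = np.zeros((128, 128)); dirty[40, 50] = 2.0; dirty[80, 90] = 1.0
comps, res = hogbom_clean(dirty, psf)
```

The peak search and the PSF subtraction are the two steps that parallel implementations (CUDA or OpenCL) typically map onto reduction and element-wise kernels, respectively.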
PAGANI Toolkit: Parallel graph-theoretical analysis package for brain network big data.
Du, Haixiao; Xia, Mingrui; Zhao, Kang; Liao, Xuhong; Yang, Huazhong; Wang, Yu; He, Yong
2018-05-01
The recent collection of unprecedented quantities of neuroimaging data with high spatial resolution has led to brain network big data. However, a toolkit for fast and scalable computational solutions is still lacking. Here, we developed the PArallel Graph-theoretical ANalysIs (PAGANI) Toolkit based on a hybrid central processing unit-graphics processing unit (CPU-GPU) framework with a graphical user interface to facilitate the mapping and characterization of high-resolution brain networks. Specifically, the toolkit provides flexible parameters for users to customize computations of graph metrics in brain network analyses. As an empirical example, the PAGANI Toolkit was applied to individual voxel-based brain networks with ∼200,000 nodes that were derived from a resting-state fMRI dataset of 624 healthy young adults from the Human Connectome Project. Using a personal computer, this toolbox completed all computations in ∼27 h for one subject, which is markedly less than the 118 h required with a single-thread implementation. The voxel-based functional brain networks exhibited prominent small-world characteristics and densely connected hubs, which were mainly located in the medial and lateral fronto-parietal cortices. Moreover, the female group had significantly higher modularity and nodal betweenness centrality mainly in the medial/lateral fronto-parietal and occipital cortices than the male group. Significant correlations between the intelligence quotient and nodal metrics were also observed in several frontal regions. Collectively, the PAGANI Toolkit shows high computational performance and good scalability for analyzing connectome big data and provides a friendly interface without the complicated configuration of computing environments, thereby facilitating high-resolution connectomics research in health and disease. © 2018 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Hur, Min Young; Verboncoeur, John; Lee, Hae June
2014-10-01
Particle-in-cell (PIC) simulations offer higher fidelity than fluid simulations for plasma devices that require transient kinetic modeling. They make fewer approximations to the plasma kinetics but require many particles and grid cells to obtain meaningful results, so the simulation time grows in proportion to the number of particles. Therefore, PIC simulation needs high performance computing. In this research, a graphic processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs with hundreds of cores were used for the tests, showing cost-effective performance. The PIC code is divided into two modules: a field solver and a particle mover. The particle mover module is divided into four routines named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as electrostatic potential in two-dimensional geometry almost 30 times faster than a single CPU code. This work was supported by the Korea Institute of Science Technology Information.
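To make the "deposit" and "move/boundary" routines mentioned above concrete, the following is a hedged 1-D electrostatic sketch in NumPy (the 2-D GPU code, the field solver, and the MCC routine are not reproduced; all names and parameters are illustrative).

```python
import numpy as np

def deposit(x, grid_n, dx, q=1.0):
    """Cloud-in-cell charge deposit: each particle's charge is shared linearly
    between its two neighbouring grid points (periodic domain)."""
    g = x / dx
    i = np.floor(g).astype(int) % grid_n
    w = g - np.floor(g)
    rho = np.zeros(grid_n)
    np.add.at(rho, i, q * (1.0 - w))
    np.add.at(rho, (i + 1) % grid_n, q * w)
    return rho / dx

def move(x, v, E_grid, dx, dt, L, qm=-1.0):
    """Leapfrog push using the field interpolated back to the particles,
    followed by a periodic wrap (the 'boundary' routine)."""
    g = x / dx
    i = np.floor(g).astype(int) % len(E_grid)
    w = g - np.floor(g)
    E = (1.0 - w) * E_grid[i] + w * E_grid[(i + 1) % len(E_grid)]
    v = v + qm * E * dt
    x = (x + v * dt) % L
    return x, v
```

On a GPU, the push is one thread per particle, while the deposit needs atomic additions (the role played here by `np.add.at`) because many particles write to the same grid cell.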
POM.gpu-v1.0: a GPU-based Princeton Ocean Model
NASA Astrophysics Data System (ADS)
Xu, S.; Huang, X.; Oey, L.-Y.; Xu, F.; Fu, H.; Zhang, Y.; Yang, G.
2015-09-01
Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.
Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.
Adamo, Mark E; Gerber, Scott A
2016-09-07
MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc.
Distributed trace using central performance counter memory
Satterfield, David L; Sexton, James C
2013-10-22
A plurality of processing cores and a central storage unit having at least one memory are connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located with the central storage unit.
Distributed trace using central performance counter memory
Satterfield, David L.; Sexton, James C.
2013-01-22
A plurality of processing cores and a central storage unit having at least one memory are connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located with the central storage unit.
Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.
2012-01-01
We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616
Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R
2012-02-23
We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.
Microprocessors: the engines of the digital age
2017-01-01
The microprocessor—a computer central processing unit integrated onto a single microchip—has come to dominate computing across all of its scales from the tiniest consumer appliance to the largest supercomputer. This dominance has taken decades to achieve, but an irresistible logic made the ultimate outcome inevitable. The objectives of this Perspective paper are to offer a brief history of the development of the microprocessor and to answer questions such as: where did the microprocessor come from, where is it now, and where might it go in the future? PMID:28413353
Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.
Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N
2012-11-13
The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of the hybrid GPU/central processing unit (CPU) and full GPU implementations of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
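A plain NumPy sketch of the SP2 recursion described above follows; the GPU/CUBLAS specifics, padding schemes, and single-precision variants are omitted, Gershgorin estimates are assumed for the spectral bounds, and the function name is our own.

```python
import numpy as np

def sp2_density_matrix(H, nocc, tol=1e-10, max_iter=100):
    """Second-order spectral projection (SP2): build the density matrix from H by a
    recursion of matrix-matrix multiplies, choosing at each step the update
    (X^2 or 2X - X^2) whose trace lies closer to the occupation number."""
    # Gershgorin estimates of the spectral bounds.
    r = np.sum(np.abs(H), axis=1) - np.abs(np.diag(H))
    emin, emax = np.min(np.diag(H) - r), np.max(np.diag(H) + r)
    X = (emax * np.eye(H.shape[0]) - H) / (emax - emin)   # spectrum mapped into [0, 1]
    for _ in range(max_iter):
        X2 = X @ X
        t, t2 = np.trace(X), np.trace(X2)
        if abs(t2 - nocc) < abs(2.0 * t - t2 - nocc):
            X = X2                    # trace-decreasing purification
        else:
            X = 2.0 * X - X2          # trace-increasing purification
        if abs(np.trace(X) - nocc) < tol and np.linalg.norm(X @ X - X) < tol:
            break
    return X

# Check against diagonalization on a small random symmetric Hamiltonian.
H = np.random.default_rng(2).normal(size=(50, 50)); H = 0.5 * (H + H.T)
P = sp2_density_matrix(H, nocc=25)
w, V = np.linalg.eigh(H)
P_ref = V[:, :25] @ V[:, :25].T
print(np.linalg.norm(P - P_ref))
```

Because every step is a dense matrix-matrix multiply, the recursion maps directly onto DGEMM/SGEMM calls, which is why the abstract's CUBLAS-based implementation scales so well.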
FastMag: Fast micromagnetic simulator for complex magnetic structures (invited)
NASA Astrophysics Data System (ADS)
Chang, R.; Li, S.; Lubarda, M. V.; Livshitz, B.; Lomakin, V.
2011-04-01
A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with a high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and therefore is highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules and analytical integration for overlapping elements in which the integral kernel is singular. Thus, discretized superposition integrals are computed using a nonuniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach allows handling objects of arbitrary shape, allows the field outside the magnetized domains to be calculated easily, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU-central processing unit speed-ups of two orders of magnitude. Simulations are shown of a large array of magnetic dots and a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements on an inexpensive desktop computer.
Calculating with light using a chip-scale all-optical abacus.
Feldmann, J; Stegmaier, M; Gruhler, N; Ríos, C; Bhaskaran, H; Wright, C D; Pernice, W H P
2017-11-02
Machines that simultaneously process and store multistate data at one and the same location can provide a new class of fast, powerful and efficient general-purpose computers. We demonstrate the central element of an all-optical calculator, a photonic abacus, which provides multistate compute-and-store operation by integrating functional phase-change materials with nanophotonic chips. With picosecond optical pulses we perform the fundamental arithmetic operations of addition, subtraction, multiplication, and division, including a carryover into multiple cells. This basic processing unit is embedded into a scalable phase-change photonic network and addressed optically through a two-pulse random access scheme. Our framework provides first steps towards light-based non-von Neumann arithmetic.
Internode data communications in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.
2013-09-03
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Internode data communications in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Distributed Computing with Centralized Support Works at Brigham Young.
ERIC Educational Resources Information Center
McDonald, Kelly; Stone, Brad
1992-01-01
Brigham Young University (Utah) has addressed the need for maintenance and support of distributed computing systems on campus by implementing a program patterned after a national business franchise, providing the support and training of a centralized administration but allowing each unit to operate much as an independent small business.…
High Performance GPU-Based Fourier Volume Rendering.
Abdellah, Marwan; Eldeib, Ayman; Sharawi, Amr
2015-01-01
Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N² log N) time complexity, it provides a faster alternative to spatial-domain volume rendering algorithms, which are O(N³) computationally complex. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive, competent platform that can deliver enormous raw computational power compared to the central processing unit (CPU) on a per-dollar basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. This proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together, by taking advantage of executing the rendering pipeline entirely on recent GPU architectures.
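The projection-slice theorem the abstract relies on can be demonstrated in a few lines of NumPy for the axis-aligned case; this is only a sanity-check sketch, not the CUDA pipeline (which must also resample oblique slices and filter in frequency space).

```python
import numpy as np

# Projection-slice theorem, axis-aligned case: the k_z = 0 plane of the 3-D FFT of a
# volume equals the 2-D FFT of the volume's projection (sum) along z.
rng = np.random.default_rng(3)
volume = rng.random((64, 64, 64))

spectrum = np.fft.fftn(volume)
central_slice = spectrum[:, :, 0]                     # extract the central slice
projection_fvr = np.fft.ifft2(central_slice).real     # back-transform -> projection image

projection_direct = volume.sum(axis=2)                # spatial-domain reference
print(np.allclose(projection_fvr, projection_direct)) # True up to round-off
```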
Real-time digital holographic microscopy using the graphic processing unit.
Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi
2008-08-04
Digital holographic microscopy (DHM) is a well-known, powerful method allowing both the amplitude and phase of a specimen to be observed simultaneously. In order to obtain a reconstructed image from a hologram, numerous calculations of the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allow it to be used as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms of 512 x 512 grid points at 24 frames per second.
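As a hedged illustration of the FFT-based diffraction calculation at the core of such a system, the snippet below propagates a complex field with the angular-spectrum method (two FFTs and a transfer-function multiply). It is not the authors' reconstruction code; the aperture example, pixel pitch, and wavelength are arbitrary assumptions, and a full DHM reconstruction additionally handles the reference wave and phase unwrapping.

```python
import numpy as np

def angular_spectrum(u0, wavelength, z, dx):
    """Propagate a complex field u0 a distance z via the angular-spectrum method:
    FFT, multiply by the free-space transfer function, inverse FFT."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing='ij')
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2.0 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)        # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(u0) * H)

# Example: propagate a plane wave through a 0.5 mm circular aperture by 5 cm.
n, dx, lam = 512, 10e-6, 633e-9
y, x = np.mgrid[:n, :n] - n // 2
u0 = ((x * dx) ** 2 + (y * dx) ** 2 < (0.5e-3) ** 2).astype(complex)
u1 = angular_spectrum(u0, lam, z=0.05, dx=dx)
intensity = np.abs(u1) ** 2
```

The per-frame cost is dominated by the two 2-D FFTs, which is exactly the workload that moves well onto GPU stream processors.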
GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.
Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan
2011-05-01
Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Molecular dynamics simulations through GPU video games technologies
Loukatou, Styliani; Papageorgiou, Louis; Fakourelis, Paraskevas; Filntisi, Arianna; Polychronidou, Eleftheria; Bassis, Ioannis; Megalooikonomou, Vasileios; Makałowski, Wojciech; Vlachakis, Dimitrios; Kossida, Sophia
2016-01-01
Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the principal techniques used to understand the physical movements of atoms and molecules is molecular dynamics (MD). MD is an in silico method to simulate the physical motions of atoms and molecules under certain conditions. It has become a strategically important technique and now plays a key role in many areas of the exact sciences, such as chemistry, biology, physics and medicine. Due to their complexity, MD calculations can require enormous amounts of computer memory and time, and their execution has therefore been a major bottleneck. Despite the huge computational cost, molecular dynamics simulations have traditionally been run on computers built around a central processing unit (CPU). Graphics processing unit (GPU) computing technology was first designed to improve video games by rapidly creating and displaying images in a frame buffer for output to a screen. The hybrid GPU-CPU implementation, combined with parallel computing, is a novel technology for performing a wide range of calculations. GPUs have been proposed and used to accelerate many scientific computations, including MD simulations. Herein, we describe the new methodologies developed initially for video games and how they are now applied in MD simulations. PMID:27525251
Michael Ernst
2017-12-09
As the sole Tier-1 computing facility for ATLAS in the United States and the largest ATLAS computing center worldwide Brookhaven provides a large portion of the overall computing resources for U.S. collaborators and serves as the central hub for storing,
Distributed GPU Computing in GIScience
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.
2013-12-01
Geoscientists strive to discover potential principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To better achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by the increasing amount of datasets from different domains, such as social media, earth observation, and environmental sensing (Li et al., 2013). Meanwhile, CPU-based computing resources structured as clusters or supercomputers are costly. In the past several years, with GPU-based technology matured in both capability and performance, GPU-based computing has emerged as a new computing paradigm. Compared to the traditional microprocessor, the modern GPU, as a compelling alternative microprocessor, has outstanding parallel processing capability with cost-effectiveness and efficiency (Owens et al., 2008), although it was initially designed for graphical rendering in the visualization pipeline. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within a distributed environment. Within this framework, 1) for each single computer, both GPU-based and CPU-based computing resources can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build up a virtual supercomputer to support CPU-based and GPU-based computing in a distributed computing environment; 3) GPUs, as graphics-targeted devices, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies. References: 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. Visualization and Computer Graphics, IEEE Transactions on, 9(3), 378-394. 2. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D Environmental Data Using Many-core Graphics Processing Units (GPUs) and Multi-core Central Processing Units (CPUs). Computers & Geosciences, 59(9), 78-89. 3. Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879-899.
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, D. H.
1985-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
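A toy Monte Carlo model in the spirit of the analysis described above is sketched below: several reference streams issue one request per CPU tick to randomly chosen banks, a busy bank stalls the requesting stream for the remainder of its reservation time, and the achieved fraction of peak throughput falls as the reservation time grows. All parameters and the model itself are simplifying assumptions, not the paper's Markov chain or simulator.

```python
import numpy as np

def bank_contention(n_banks=16, reservation=8, n_streams=4, ticks=100_000, seed=0):
    """Toy Monte Carlo of memory bank contention. Returns the fraction of peak
    throughput (one reference per stream per tick) actually achieved."""
    rng = np.random.default_rng(seed)
    busy_until = np.zeros(n_banks, dtype=np.int64)
    pending = rng.integers(n_banks, size=n_streams)   # each stream's outstanding bank
    issued = 0
    for t in range(ticks):
        for s in range(n_streams):
            b = pending[s]
            if busy_until[b] <= t:                    # bank free: issue, pick next bank
                busy_until[b] = t + reservation
                issued += 1
                pending[s] = rng.integers(n_banks)
    return issued / (ticks * n_streams)

for res in (2, 8, 32):
    print(res, round(bank_contention(reservation=res), 3))
```

The longer the reservation time relative to the number of banks, the lower the achieved rate, which is the qualitative conclusion the abstract draws about large, slow memory chips.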
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, David H.
1987-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
GPU-based prompt gamma ray imaging from boron neutron capture therapy.
Yoon, Do-Kun; Jung, Joo-Young; Jo Hong, Key; Sil Lee, Keum; Suk Suh, Tae
2015-01-01
The purpose of this research is to perform fast reconstruction of a prompt gamma ray image using graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Image reconstruction using the GPU was 196 times faster than conventional reconstruction using the CPU. For the four BURs, the area under the curve values from the ROC analysis were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). The tomographic image from the prompt gamma ray events of the BNCT simulation was acquired using GPU computation in order to perform fast reconstruction during treatment. The authors verified the feasibility of prompt gamma ray image reconstruction using GPU computation for BNCT simulations.
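The expectation-maximization update underlying the OSEM family is compact enough to sketch; the code below is a plain MLEM iteration with a random system matrix standing in for the true projector (OSEM applies the same update to subsets of projections), and it is not the authors' modified GPU algorithm.

```python
import numpy as np

def mlem(A, y, n_iter=50):
    """Maximum-likelihood EM reconstruction: x <- x / (A^T 1) * A^T (y / (A x)).
    OSEM uses the same update restricted to a subset of projection rows per step."""
    x = np.ones(A.shape[1])
    sens = A.T @ np.ones(A.shape[0])             # sensitivity (normalization) image
    for _ in range(n_iter):
        proj = A @ x
        ratio = np.divide(y, proj, out=np.zeros_like(y), where=proj > 0)
        x = x / np.maximum(sens, 1e-12) * (A.T @ ratio)
    return x

# Tiny synthetic example: recover a sparse "uptake" vector from noisy projections.
rng = np.random.default_rng(4)
A = rng.random((200, 64))
x_true = np.zeros(64); x_true[[10, 30, 45]] = [5.0, 3.0, 1.0]
y = rng.poisson(A @ x_true).astype(float)
x_hat = mlem(A, y)
```

The forward and back projections (`A @ x` and `A.T @ ratio`) dominate the cost and are the operations typically moved onto the GPU.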
Programmable partitioning for high-performance coherence domains in a multiprocessor system
Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY
2011-01-25
A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.
Field programmable gate array-assigned complex-valued computation and its limits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernard-Schwarz, Maria, E-mail: maria.bernardschwarz@ni.com; Institute of Applied Physics, TU Wien, Wiedner Hauptstrasse 8, 1040 Wien; Zwick, Wolfgang
We discuss how leveraging Field Programmable Gate Array (FPGA) technology as part of a high performance computing platform reduces latency to meet the demanding real time constraints of a quantum optics simulation. Implementations of complex-valued operations using fixed-point numerics on a Virtex-5 FPGA compare favorably to more conventional solutions on a central processing unit. Our investigation explores the performance of multiple fixed-point options along with a traditional 64-bit floating-point version. With this information, the lowest execution times can be estimated. Relative error is examined to ensure simulation accuracy is maintained.
A radiation-hardened, computer for satellite applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaona, J.I. Jr.
1996-08-01
This paper describes high reliability radiation hardened computers built by Sandia for application aboard DOE satellite programs requiring 32 bit processing. The computers highlight a radiation hardened (10 kGy(Si)) R3000 executing up to 10 million reduced instruction set computer (RISC) instructions per second (MIPS), a dual purpose module control bus used for real-time default and power management which allows for extended mission operation on as little as 1.2 watts, and a local area network capable of 480 Mbits/s. The central processing unit (CPU) is the NASA Goddard R3000 nicknamed the "Mongoose" or "Mongoose 1". The Sandia Satellite Computer (SSC) uses Rational's Ada compiler, debugger, operating system kernel, and enhanced floating point emulation library targeted at the Mongoose. The SSC gives Sandia the capability of processing complex types of spacecraft attitude determination and control algorithms and of modifying programmed control laws via ground command. In general, SSC offers end users the ability to process data onboard the spacecraft that would normally have been sent to the ground, which allows reconsideration of traditional space-ground partitioning options.
The Photon Shell Game and the Quantum von Neumann Architecture with Superconducting Circuits
NASA Astrophysics Data System (ADS)
Mariantoni, Matteo
2012-02-01
Superconducting quantum circuits have made significant advances over the past decade, allowing more complex and integrated circuits that perform with good fidelity. We have recently implemented a machine comprising seven quantum channels, with three superconducting resonators, two phase qubits, and two zeroing registers. I will explain the design and operation of this machine, first showing how a single microwave photon | 1 > can be prepared in one resonator and coherently transferred between the three resonators. I will also show how more exotic states such as double photon states | 2 > and superposition states | 0 > + | 1 > can be shuffled among the resonators as well [1]. I will then demonstrate how this machine can be used as the quantum-mechanical analog of the von Neumann computer architecture, which for a classical computer comprises a central processing unit and a memory holding both instructions and data. The quantum version comprises a quantum central processing unit (quCPU) that exchanges data with a quantum random-access memory (quRAM) integrated on one chip, with instructions stored on a classical computer. I will also present a proof-of-concept demonstration of a code that involves all seven quantum elements: (1), Preparing an entangled state in the quCPU, (2), writing it to the quRAM, (3), preparing a second state in the quCPU, (4), zeroing it, and, (5), reading out the first state stored in the quRAM [2]. Finally, I will demonstrate that the quantum von Neumann machine provides one unit cell of a two-dimensional qubit-resonator array that can be used for surface code quantum computing. This will allow the realization of a scalable, fault-tolerant quantum processor with the most forgiving error rates to date. [1] M. Mariantoni et al., Nature Physics 7, 287-293 (2011). [2] M. Mariantoni et al., Science 334, 61-65 (2011).
Rath, N; Kato, S; Levesque, J P; Mauel, M E; Navratil, G A; Peng, Q
2014-04-01
Fast digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general-purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing was done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.
Optimization of the coherence function estimation for multi-core central processing unit
NASA Astrophysics Data System (ADS)
Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.
2017-02-01
The paper considers the use of parallel processing on a multi-core central processing unit for optimization of the coherence function evaluation arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented by digital samples. The algorithm is analyzed for its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. The speed-up of parallel execution with respect to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing a high degree of parallelism in the constructed calculation routines. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
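For readers unfamiliar with the quantity being optimized, here is a short SciPy example of a Welch-based magnitude-squared coherence estimate; the signal parameters are hypothetical, and the multi-core optimizations described in the paper are not reproduced.

```python
import numpy as np
from scipy.signal import coherence

fs = 10_000.0                            # sampling rate, Hz (assumed)
t = np.arange(0, 2.0, 1.0 / fs)
common = np.sin(2 * np.pi * 440 * t)     # shared component -> high coherence near 440 Hz
x = common + 0.5 * np.random.randn(t.size)
y = np.roll(common, 25) + 0.5 * np.random.randn(t.size)   # delayed copy plus independent noise

f, Cxy = coherence(x, y, fs=fs, nperseg=1024)   # Welch-averaged magnitude-squared coherence
print(f[np.argmax(Cxy)], Cxy.max())             # frequency of the strongest coherent component
```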
Variability and trends in runoff efficiency in the conterminous United States
McCabe, Gregory J.; Wolock, David M.
2016-01-01
Variability and trends in water-year runoff efficiency (RE) — computed as the ratio of water-year runoff (streamflow per unit area) to water-year precipitation — in the conterminous United States (CONUS) are examined for the 1951 through 2012 period. Changes in RE are analyzed using runoff and precipitation data aggregated to United States Geological Survey 8-digit hydrologic cataloging units (HUs). Results indicate increases in RE for some regions in the north-central CONUS and large decreases in RE for the south-central CONUS. The increases in RE in the north-central CONUS are explained by trends in climate, whereas the large decreases in RE in the south-central CONUS likely are related to groundwater withdrawals from the Ogallala aquifer to support irrigated agriculture.
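As a concrete illustration of the quantity analyzed, a small NumPy sketch that computes water-year runoff efficiency and summarizes its trend with a linear fit; all values are made-up placeholders, not the USGS hydrologic-unit data.

```python
import numpy as np

years = np.arange(1951, 2013)
runoff = np.random.uniform(100, 400, years.size)    # mm per water year (hypothetical)
precip = np.random.uniform(500, 1200, years.size)   # mm per water year (hypothetical)

runoff_eff = runoff / precip                        # runoff efficiency for each water year
slope, intercept = np.polyfit(years, runoff_eff, 1) # linear trend in RE over the record
print(f"RE trend: {slope:+.4f} per year")
```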
Pelletier, Mathew G
2008-02-08
One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC's traditional use of the central processing unit (CPU). The use of the GPU, as an alternative computation platform, allowed the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time that is currently imposed on the cotton trash cleaning control systems is what is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a trash sensor tightly coupled to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit (GPU), for processing of the cotton trash images, a speed-up of over 6.5 times over optimized code running on the PC's central processing unit (CPU) was gained. The new parallel algorithm operating on the GPU was able to process a 1024x1024 image in less than 17 ms. At this improved speed, the image processing system's performance should now be sufficient to provide a system capable of real-time feedback control in tight cooperation with the cleaning equipment.
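To make the GPU angle concrete, here is a CuPy sketch of a data-parallel per-pixel classification and reduction of the sort that maps well to graphics hardware; this is not the author's trash-detection algorithm, and the "dark pixels are trash" threshold rule is an illustrative assumption.

```python
import cupy as cp   # assumes a CUDA-capable GPU with CuPy installed

def trash_fraction(image_gpu, threshold=0.35):
    """Fraction of pixels classified as trash in a normalized grayscale image."""
    mask = image_gpu < threshold          # elementwise comparison, executed in parallel on the GPU
    return float(mask.mean())             # reduction also runs on the GPU

img = cp.random.random((1024, 1024)).astype(cp.float32)   # stand-in for a 1024x1024 lint image
print(trash_fraction(img))
```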
Accuracy and time requirements of a bar-code inventory system for medical supplies.
Hanson, L B; Weinswig, M H; De Muth, J E
1988-02-01
The effects of implementing a bar-code system for issuing medical supplies to nursing units at a university teaching hospital were evaluated. Data on the time required to issue medical supplies to three nursing units at a 480-bed, tertiary-care teaching hospital were collected (1) before the bar-code system was implemented (i.e., when the manual system was in use), (2) one month after implementation, and (3) four months after implementation. At the same times, the accuracy of the central supply perpetual inventory was monitored using 15 selected items. One-way analysis of variance tests were done to determine any significant differences between the bar-code and manual systems. Using the bar-code system took longer than using the manual system because of a significant difference in the time required for order entry into the computer. Multiple-use requirements of the central supply computer system made entering bar-code data a much slower process. There was, however, a significant improvement in the accuracy of the perpetual inventory. Using the bar-code system for issuing medical supplies to the nursing units takes longer than using the manual system. However, the accuracy of the perpetual inventory was significantly improved with the implementation of the bar-code system.
Programmable Direct-Memory-Access Controller
NASA Technical Reports Server (NTRS)
Hendry, David F.
1990-01-01
Proposed programmable direct-memory-access controller (DMAC) operates with computer systems of 32000 series, which have 32-bit data buses and use addresses of 24 (or potentially 32) bits. Controller functions with or without help of central processing unit (CPU) and starts itself. Includes such advanced features as ability to compare two blocks of memory for equality and to search block of memory for specific value. Made as single very-large-scale integrated-circuit chip.
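To make the semantics of the two advanced features named above concrete, here is a short software sketch of a block-compare and a block-search; the real controller performs these operations in hardware without CPU involvement, and the buffer contents below are hypothetical.

```python
# Block-compare and block-search semantics, illustrated in software.
block_a = bytes([0x10, 0x20, 0x30, 0x40, 0x55, 0x60])
block_b = bytes([0x10, 0x20, 0x30, 0x40, 0x55, 0x60])

blocks_equal = block_a == block_b          # compare two blocks of memory for equality
match_offset = block_a.find(b"\x55")       # search a block for a specific value; -1 if absent

print(blocks_equal, match_offset)
```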
GPU accelerated manifold correction method for spinning compact binaries
NASA Astrophysics Data System (ADS)
Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying
2018-04-01
The graphics processing unit (GPU) acceleration of the manifold correction algorithm based on the compute unified device architecture (CUDA) technology is designed to simulate the dynamic evolution of the Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and the efficiency of parallel computation on the GPU have been confirmed by various numerical experiments. The numerical comparisons show that the accuracy of the manifold correction method executed on the GPU agrees well with that of the same codes executed solely on the central processing unit (CPU). The acceleration attainable when the codes are implemented on the GPU can be increased enormously through the use of shared memory and register optimization techniques without additional hardware costs; the speedup is nearly 13 times that of the codes executed on the CPU for a phase-space scan (including 314 × 314 orbits). In addition, the GPU-accelerated manifold correction method is used to numerically study how the dynamics are affected by the spin-induced quadrupole-monopole interaction in black hole binary systems.
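A hedged sketch of the underlying idea of a manifold correction: after an ordinary integration step, rescale the velocities so the trajectory is pulled back onto the constant-energy surface. This toy uses a generic Newtonian energy, not the spinning Post-Newtonian Hamiltonian treated in the paper, and the numbers are illustrative.

```python
import numpy as np

def corrected_velocity(v, potential, E0, m=1.0):
    """Scale v so that kinetic + potential energy equals the reference energy E0."""
    kinetic = 0.5 * m * np.dot(v, v)
    target_kinetic = E0 - potential
    if target_kinetic <= 0 or kinetic == 0:
        return v                          # correction not applicable at this point
    return v * np.sqrt(target_kinetic / kinetic)   # pull the state back onto the energy manifold

v = np.array([0.9, 0.1, 0.0])
print(corrected_velocity(v, potential=-1.0, E0=-0.5))
```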
Modeling and Simulation of the Economics of Mining in the Bitcoin Market.
Cocco, Luisanna; Marchesi, Michele
2016-01-01
On January 3, 2009, Satoshi Nakamoto gave rise to the "Bitcoin Blockchain", creating the first block of the chain hashing on his computer's central processing unit (CPU). Since then, the hash calculations to mine Bitcoin have been getting more and more complex, and consequently the mining hardware evolved to adapt to this increasing difficulty. Three generations of mining hardware have followed the CPU generation: the GPU, FPGA, and ASIC generations. This work presents an agent-based artificial market model of the Bitcoin mining process and of the Bitcoin transactions. The goal of this work is to model the economy of the mining process, starting from the GPU generation, the first with economic significance. The model reproduces some "stylized facts" found in real-time price series and some core aspects of the mining business. In particular, the computational experiments performed can reproduce the unit root property, the fat tail phenomenon and the volatility clustering of Bitcoin price series. In addition, under proper assumptions, they can reproduce the generation of Bitcoins, the hashing capability, the power consumption, and the mining hardware and electrical energy expenditures of the Bitcoin network.
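For a sense of the per-miner economics an agent-based model of this kind tracks, here is a toy profitability calculation balancing expected block-reward share against electricity cost; every number is a hypothetical placeholder, not a parameter from the paper.

```python
def daily_profit(hashrate, network_hashrate, block_reward, btc_price,
                 power_kw, electricity_price, blocks_per_day=144):
    """Expected daily profit of one miner, in currency units (toy model)."""
    revenue = (hashrate / network_hashrate) * blocks_per_day * block_reward * btc_price
    cost = power_kw * 24 * electricity_price          # energy drawn over one day
    return revenue - cost

# All arguments below are illustrative placeholders.
print(daily_profit(hashrate=100e12, network_hashrate=200e18,
                   block_reward=6.25, btc_price=30_000,
                   power_kw=3.2, electricity_price=0.10))
```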
Computational Modeling of Arc-Slag Interaction in DC Furnaces
NASA Astrophysics Data System (ADS)
Reynolds, Quinn G.
2017-02-01
The plasma arc is central to the operation of the direct-current arc furnace, a unit operation commonly used in high-temperature processing of both primary ores and recycled metals. The arc is a high-velocity, high-temperature jet of ionized gas created and sustained by interactions among the thermal, momentum, and electromagnetic fields resulting from the passage of electric current. In addition to being the primary source of thermal energy, the arc jet also couples mechanically with the bath of molten process material within the furnace, causing substantial splashing and stirring in the region in which it impinges. The arc's interaction with the molten bath inside the furnace is studied through use of a multiphase, multiphysics computational magnetohydrodynamic model developed in the OpenFOAM® framework. Results from the computational solver are compared with empirical correlations that account for arc-slag interaction effects.
Heterogeneous scalable framework for multiphase flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morris, Karla Vanessa
2013-09-01
Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture where the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The API allows object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects on central processing unit (CPU) and graphics processing unit (GPU) nodes for applications using C++.
ConnectX-2 InfiniBand Management Queues: New support for Network Offloaded
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Richard L; Poole, Stephen W; Shamis, Pavel
2010-01-01
This paper introduces the newly developed InfiniBand (IB) Management Queue capability, used by the Host Channel Adapter (HCA) to manage network task data flow dependencies, and progress the communications associated with such flows. These tasks include sends, receives, and the newly supported wait task, and are scheduled by the HCA based on a data dependency description provided by the user. This functionality is supported by the ConnectX-2 HCA, and provides the means for delegating collective communication management and progress to the HCA, also known as collective communication offload. This provides a means for overlapping collective communications managed by the HCA and computation on the Central Processing Unit (CPU), thus making it possible to reduce the impact of system noise on parallel applications using collective operations. This paper further describes how this new capability can be used to implement scalable Message Passing Interface (MPI) collective operations, describing the high level details of how this new capability is used to implement the MPI Barrier collective operation, focusing on the latency sensitive performance aspects of this new capability. This paper concludes with small scale benchmark experiments comparing implementations of the barrier collective operation, using the new network offload capabilities, with established point-to-point based implementations of these same algorithms, which manage the data flow using the central processing unit. These early results demonstrate the promise this new capability provides to improve the scalability of high-performance applications using collective communications. The latency of the HCA based implementation of the barrier is similar to that of the best performing point-to-point based implementation managed by the central processing unit, starting to outperform these as the number of processes involved in the collective operation increases.
Some issues related to simulation of the tracking and communications computer network
NASA Technical Reports Server (NTRS)
Lacovara, Robert C.
1989-01-01
The Communications Performance and Integration branch of the Tracking and Communications Division has an ongoing involvement in the simulation of its flight hardware for Space Station Freedom. Specifically, the communication process between central processor(s) and orbital replaceable units (ORU's) is simulated with varying degrees of fidelity. The results of investigations into three aspects of this simulation effort are given. The most general area involves the use of computer assisted software engineering (CASE) tools for this particular simulation. The second area of interest is simulation methods for systems of mixed hardware and software. The final area investigated is the application of simulation methods to one of the proposed computer network protocols for space station, specifically IEEE 802.4.
Some issues related to simulation of the tracking and communications computer network
NASA Astrophysics Data System (ADS)
Lacovara, Robert C.
1989-12-01
The Communications Performance and Integration branch of the Tracking and Communications Division has an ongoing involvement in the simulation of its flight hardware for Space Station Freedom. Specifically, the communication process between central processor(s) and orbital replaceable units (ORU's) is simulated with varying degrees of fidelity. The results of investigations into three aspects of this simulation effort are given. The most general area involves the use of computer assisted software engineering (CASE) tools for this particular simulation. The second area of interest is simulation methods for systems of mixed hardware and software. The final area investigated is the application of simulation methods to one of the proposed computer network protocols for space station, specifically IEEE 802.4.
Interactive access to forest inventory data for the South Central United States
William H. McWilliams
1990-01-01
On-line access to USDA, Forest Service successive forest inventory data for the South Central United States is provided by two computer systems. The Easy Access to Forest Inventory and Analysis Tables program (EZTAB) produces a set of tables for specific geographic areas. The Interactive Graphics and Retrieval System (INGRES) is a database management system that...
Memory interface simulator: A computer design aid
NASA Technical Reports Server (NTRS)
Taylor, D. S.; Williams, T.; Weatherbee, J. E.
1972-01-01
Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo.
McDaniel, T; D'Azevedo, E F; Li, Y W; Wong, K; Kent, P R C
2017-11-07
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
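The delayed (rank-K) update described above can be illustrated with a small NumPy sketch under simple assumptions: K accepted single-row changes to a matrix are accumulated and then folded into the stored inverse en bloc with the Woodbury identity, so the work becomes matrix-matrix rather than K separate rank-1 (Sherman-Morrison) matrix-vector updates. The row-update bookkeeping, sizes, and random test matrix are illustrative and are not the production QMC implementation.

```python
import numpy as np

def woodbury_row_updates(Ainv, delta_rows, rows):
    """Return the inverse of A_new = A + (row updates), given Ainv = A^{-1}.

    rows       : indices of the K rows that were replaced.
    delta_rows : K x n array holding (new_row - old_row) for each replaced row.
    """
    n = Ainv.shape[0]
    K = len(rows)
    U = np.zeros((n, K))
    U[rows, np.arange(K)] = 1.0                    # so that A_new = A + U @ delta_rows
    AinvU = Ainv @ U                               # n x K
    S = np.eye(K) + delta_rows @ AinvU             # K x K capacitance matrix
    return Ainv - AinvU @ np.linalg.solve(S, delta_rows @ Ainv)

# Self-check against direct inversion on a small, well-conditioned matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)
rows = [1, 4]
delta = rng.standard_normal((2, 6))
A_new = A.copy()
A_new[rows] += delta
assert np.allclose(woodbury_row_updates(np.linalg.inv(A), delta, rows),
                   np.linalg.inv(A_new))
```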
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo
NASA Astrophysics Data System (ADS)
McDaniel, T.; D'Azevedo, E. F.; Li, Y. W.; Wong, K.; Kent, P. R. C.
2017-11-01
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
GPU-based prompt gamma ray imaging from boron neutron capture therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoon, Do-Kun; Jung, Joo-Young; Suk Suh, Tae, E-mail: suhsanta@catholic.ac.kr
Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusions: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray image reconstruction using the GPU computation for BNCT simulations.
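A minimal dense NumPy sketch of an ordered-subset expectation-maximization (OSEM) update of the kind named in the abstract; the actual reconstruction is a modified OSEM running on the GPU, and the system matrix, subset layout, and sizes below are assumptions chosen only to make the iteration concrete.

```python
import numpy as np

def osem(A, y, n_subsets=4, n_iter=10, eps=1e-12):
    """A: (n_bins, n_voxels) system matrix, y: measured projections."""
    n_bins, n_vox = A.shape
    x = np.ones(n_vox)                                   # flat initial image
    subsets = np.array_split(np.arange(n_bins), n_subsets)
    for _ in range(n_iter):
        for s in subsets:
            As = A[s]                                    # projection rows of this subset
            ratio = y[s] / (As @ x + eps)                # measured / estimated projections
            x *= (As.T @ ratio) / (As.sum(axis=0) + eps) # multiplicative OSEM update
    return x

A = np.random.rand(64, 16)        # toy system matrix
x_true = np.random.rand(16)
y = A @ x_true                    # noise-free synthetic projections
print(np.round(osem(A, y), 2))
```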
TU-FG-BRB-07: GPU-Based Prompt Gamma Ray Imaging From Boron Neutron Capture Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, S; Suh, T; Yoon, D
Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusion: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray reconstruction using the GPU computation for BNCT simulations.
GPU acceleration of Runge Kutta-Fehlberg and its comparison with Dormand-Prince method
NASA Astrophysics Data System (ADS)
Seen, Wo Mei; Gobithaasan, R. U.; Miura, Kenjiro T.
2014-07-01
There is a significant reduction of processing time and speedup of performance in computer graphics with the emergence of Graphic Processing Units (GPUs). GPUs have been developed to surpass the Central Processing Unit (CPU) in terms of performance and processing speed. This evolution has opened up a new area in computing and research where the highly parallel GPU is used for non-graphical algorithms. Physical or phenomenal simulations and modelling can be accelerated through General Purpose Graphic Processing Units (GPGPU) and Compute Unified Device Architecture (CUDA) implementations. These phenomena can be represented with mathematical models in the form of Ordinary Differential Equations (ODEs), which capture the rate of change between independent and dependent variables. ODEs are numerically integrated over time in order to simulate these behaviours. The classical Runge-Kutta (RK) scheme is the common method used to numerically solve ODEs. The Runge-Kutta-Fehlberg (RKF) scheme has been specially developed to provide an estimate of the principal local truncation error at each step, known as the embedded estimate technique. This paper delves into the implementation of the RKF scheme for GPU devices and compares its results with the Dormand-Prince method. A pseudo code is developed to show the implementation in detail. Hence, practitioners will be able to understand the data allocation in GPU, the formation of RKF kernels and the flow of data to/from GPU-CPU upon RKF kernel evaluation. The pseudo code is then written in C Language and two ODE models are executed to show the achievable speedup as compared to CPU implementation. The accuracy and efficiency of the proposed implementation method is discussed in the final section of this paper.
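For reference, a plain-CPU Python sketch of a single Runge-Kutta-Fehlberg 4(5) step with its embedded error estimate, using the standard Fehlberg tableau; the CUDA kernel organization, GPU data transfers, and C implementation discussed in the abstract are not reproduced here.

```python
import numpy as np

def rkf45_step(f, t, y, h):
    """Return (fifth-order solution, error estimate) for one step of size h."""
    k1 = f(t, y)
    k2 = f(t + h/4,     y + h*(k1/4))
    k3 = f(t + 3*h/8,   y + h*(3*k1/32 + 9*k2/32))
    k4 = f(t + 12*h/13, y + h*(1932*k1/2197 - 7200*k2/2197 + 7296*k3/2197))
    k5 = f(t + h,       y + h*(439*k1/216 - 8*k2 + 3680*k3/513 - 845*k4/4104))
    k6 = f(t + h/2,     y + h*(-8*k1/27 + 2*k2 - 3544*k3/2565 + 1859*k4/4104 - 11*k5/40))
    y4 = y + h*(25*k1/216 + 1408*k3/2565 + 2197*k4/4104 - k5/5)          # 4th-order result
    y5 = y + h*(16*k1/135 + 6656*k3/12825 + 28561*k4/56430 - 9*k5/50 + 2*k6/55)  # 5th-order result
    return y5, np.abs(y5 - y4).max()    # embedded estimate drives step-size control

f = lambda t, y: -y                     # dy/dt = -y, exact solution exp(-t)
y_next, err = rkf45_step(f, 0.0, np.array([1.0]), 0.1)
print(y_next, err)
```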
Examining the architecture of cellular computing through a comparative study with a computer
Wang, Degeng; Gribskov, Michael
2005-01-01
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179
Examining the architecture of cellular computing through a comparative study with a computer.
Wang, Degeng; Gribskov, Michael
2005-06-22
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.
Extraction and visualization of the central chest lymph-node stations
NASA Astrophysics Data System (ADS)
Lu, Kongkuo; Merritt, Scott A.; Higgins, William E.
2008-03-01
Lung cancer remains the leading cause of cancer death in the United States and is expected to account for nearly 30% of all cancer deaths in 2007. Central to the lung-cancer diagnosis and staging process is the assessment of the central chest lymph nodes. This assessment typically requires two major stages: (1) location of the lymph nodes in a three-dimensional (3D) high-resolution volumetric multi-detector computed-tomography (MDCT) image of the chest; (2) subsequent nodal sampling using transbronchial needle aspiration (TBNA). We describe a computer-based system for automatically locating the central chest lymph-node stations in a 3D MDCT image. Automated analysis methods are first run that extract the airway tree, airway-tree centerlines, aorta, pulmonary artery, lungs, key skeletal structures, and major-airway labels. This information provides geometrical and anatomical cues for localizing the major nodal stations. Our system demarcates these stations, conforming to criteria outlined for the Mountain and Wang standard classification systems. Visualization tools within the system then enable the user to interact with these stations to locate visible lymph nodes. Results derived from a set of human 3D MDCT chest images illustrate the usage and efficacy of the system.
Adaptive-optics optical coherence tomography processing using a graphics processing unit.
Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T
2014-01-01
Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities and moderate price compared to supercomputers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving the super high resolution technology closer to clinical viability.
The DFVLR main department for central data processing, 1976 - 1983
NASA Technical Reports Server (NTRS)
1983-01-01
Data processing, equipment and systems operation, operative and user systems, user services, computer networks and communications, text processing, computer graphics, and high power computers are discussed.
Space Tug Avionics Definition Study. Volume 5: Cost and Programmatics
NASA Technical Reports Server (NTRS)
1975-01-01
The baseline avionics system features a central digital computer that integrates the functions of all the space tug subsystems by means of a redundant digital data bus. The central computer consists of dual central processor units, dual input/output processors, and a fault tolerant memory, utilizing internal redundancy and error checking. Three electronically steerable phased arrays provide downlink transmission from any tug attitude directly to ground or via TDRS. Six laser gyros and six accelerometers in a dodecahedron configuration make up the inertial measurement unit. Both a scanning laser radar and a TV system, employing strobe lamps, are required as acquisition and docking sensors. Primary dc power at a nominal 28 volts is supplied from dual lightweight, thermally integrated fuel cells which operate from propellant grade reactants out of the main tanks.
CLOCS (Computer with Low Context-Switching Time) Architecture Reference Documents
1988-05-06
Peculiarities: The only state inside the central processing unit (CPU) is a program status word. All data operations are memory to memory. One result of this... to the challenge "if I were to design RISC, this is how I would do it." The architecture was designed by Mark Davis and Bill Gallmeister. 1.2 ...are memory to memory. Any special devices added should be memory mapped. The program counter is even memory mapped. 1.3.1 Working storage: There is no
Surface features of central North America: a synoptic view from computer graphics
Pike, R.J.
1991-01-01
A digital shaded-relief image of the 48 contiguous United States shows the details of large- and small-scale landforms, including several linear trends. The features faithfully reflect tectonism, continental glaciation, fluvial activity, volcanism, and other surface-shaping events and processes. The new map not only depicts topography accurately and in its true complexity, but does so in one synoptic view that provides a regional context for geologic analysis unobscured by clouds, culture, vegetation, or artistic constraints. -Author
cudaMap: a GPU accelerated program for gene expression connectivity mapping.
McArt, Darragh G; Bankhead, Peter; Dunne, Philip D; Salto-Tellez, Manuel; Hamilton, Peter; Zhang, Shu-Dong
2013-10-11
Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
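For orientation, a hedged sketch of the core connectivity-mapping arithmetic: scoring a query gene signature against one reference expression profile by summing signed ranks and normalizing by the maximum attainable score. This mirrors the general sscMap-style idea but is not the cudaMap code; the data, indexing, and normalization are illustrative assumptions.

```python
import numpy as np

def connection_score(reference_values, signature_genes, signature_signs):
    """Score a signed gene signature against one reference expression profile."""
    ranks = np.argsort(np.argsort(np.abs(reference_values))) + 1     # 1..N by magnitude
    signed_ranks = ranks * np.sign(reference_values)                 # rank carries the sign of the change
    raw = sum(sign * signed_ranks[g] for g, sign in zip(signature_genes, signature_signs))
    max_possible = sum(sorted(ranks, reverse=True)[: len(signature_genes)])
    return raw / max_possible                                        # normalized to [-1, 1]

ref = np.random.randn(1000)                 # one reference expression profile (hypothetical)
genes = [10, 42, 77, 500]                   # query signature: gene indices (hypothetical)
signs = [+1, -1, +1, +1]                    # up/down regulation in the query
print(connection_score(ref, genes, signs))
```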
Wheeler, Russell L.
2014-01-01
Computation of probabilistic earthquake hazard requires an estimate of Mmax: the moment magnitude of the largest earthquake that is thought to be possible within a specified geographic region. The region specified in this report is the Central and Eastern United States and adjacent Canada. Parts A and B of this report describe the construction of a global catalog of moderate to large earthquakes that occurred worldwide in tectonic analogs of the Central and Eastern United States. Examination of histograms of the magnitudes of these earthquakes allows estimation of Central and Eastern United States Mmax. The catalog and Mmax estimates derived from it are used in the 2014 edition of the U.S. Geological Survey national seismic-hazard maps. Part A deals with prehistoric earthquakes, and this part deals with historical events.
NASA Astrophysics Data System (ADS)
Ferrando, N.; Gosálvez, M. A.; Cerdá, J.; Gadea, R.; Sato, K.
2011-03-01
Presently, dynamic surface-based models are required to contain increasingly larger numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure can be used as a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on neighborhood states, extended simulation periods can be obtained by using simplified atomistic propagation models, such as the Cellular Automata (CA). This method, however, has an intrinsic parallel updating nature and the corresponding simulations are highly inefficient when performed on classical Central Processing Units (CPUs), which are designed for the sequential execution of tasks. In this paper, a series of guidelines is presented for the efficient adaptation of octree-based, CA simulations of complex, evolving surfaces into massively parallel computing hardware. A Graphics Processing Unit (GPU) is used as a cost-efficient example of the parallel architectures. For the actual simulations, we consider the surface propagation during anisotropic wet chemical etching of silicon as a computationally challenging process with a wide-spread use in microengineering applications. A continuous CA model that is intrinsically parallel in nature is used for the time evolution. Our study strongly indicates that parallel computations of dynamically evolving surfaces simulated using CA methods are significantly benefited by the incorporation of octrees as support data structures, substantially decreasing the overall computational time and memory usage.
NASA Technical Reports Server (NTRS)
Wallace, J. W.; Lovelady, R. W.; Ferguson, R. L.
1981-01-01
A prototype water quality monitoring system is described which offers almost continuous in situ monitoring. The two-man portable system features: (1) a microprocessor controlled central processing unit which allows preprogrammed sampling schedules and reprogramming in situ; (2) a subsurface unit for multiple depth capability and security from vandalism; (3) an acoustic data link for communications between the subsurface unit and the surface control unit; (4) eight water quality parameter sensors; (5) a nonvolatile magnetic bubble memory which prevents data loss in the event of power interruption; (6) a rechargeable power supply sufficient for 2 weeks of unattended operation; (7) a water sampler which can collect samples for laboratory analysis; (8) data output in direct engineering units on printed tape or through a computer compatible link; (9) internal electronic calibration eliminating external sensor adjustment; and (10) acoustic location and recovery systems. Data obtained in Saginaw Bay, Lake Huron are tabulated.
2017-08-01
The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing, by Leelinda P Dawson. Approved for public release; distribution unlimited. CUDA provides access to the GPU for general-purpose processing and is designed to work easily with multiple programming languages, including Fortran.
Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA
NASA Astrophysics Data System (ADS)
Beasley, D. G.; Marques, A. C.; Alves, L. C.; da Silva, R. C.
2013-07-01
A new 3D Proton Induced X-Ray Emission Tomography (PIXE-T) and Scanning Transmission Ion Microscopy Tomography (STIM-T) simulation software has been developed in Java and uses NVIDIA™ Common Unified Device Architecture (CUDA) to calculate the X-ray attenuation for large detector areas. A challenge with PIXE-T is to get sufficient counts while retaining a small beam spot size. Therefore a high geometric efficiency is required. However, as the detector solid angle increases the calculations required for accurate reconstruction of the data increase substantially. To overcome this limitation, the CUDA parallel computing platform was used which enables general purpose programming of NVIDIA graphics processing units (GPUs) to perform computations traditionally handled by the central processing unit (CPU). For simulation performance evaluation, the results of a CPU- and a CUDA-based simulation of a phantom are presented. Furthermore, a comparison with the simulation code in the PIXE-Tomography reconstruction software DISRA (A. Sakellariou, D.N. Jamieson, G.J.F. Legge, 2001) is also shown. Compared to a CPU implementation, the CUDA based simulation is approximately 30× faster.
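A minimal sketch of the X-ray attenuation computation that dominates the cost: Beer-Lambert absorption accumulated along the exit path from an emission voxel toward the detector. The GPU parallelization over many detector elements and the actual attenuation coefficients are not reproduced; all numbers below are assumptions.

```python
import numpy as np

def transmitted_fraction(mu_along_path, step_length):
    """Beer-Lambert transmission for attenuation coefficients sampled along one exit path."""
    return np.exp(-np.sum(mu_along_path) * step_length)

mu = np.full(200, 50.0)                              # 1/cm, uniform absorber (hypothetical)
print(transmitted_fraction(mu, step_length=1e-4))    # 200 steps of 1 micrometre each
```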
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
Job-mix modeling and system analysis of an aerospace multiprocessor.
NASA Technical Reports Server (NTRS)
Mallach, E. G.
1972-01-01
An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.
General-purpose interface bus for multiuser, multitasking computer system
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1990-01-01
The architecture of a multiuser, multitasking, virtual-memory computer system intended for the use by a medium-size research group is described. There are three central processing units (CPU) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
Subino, Janice A.; Forde, Arnell S.; Dadisman, Shawn V.; Wiese, Dana S.; Calderon, Karynna
2012-01-01
This Digital Versatile Disc (DVD) publication was prepared by an agency of the United States Government. Although these data have been processed successfully on a computer system at the U.S. Geological Survey, no warranty expressed or implied is made regarding the display or utility of the data on any other system, nor shall the act of distribution imply any such warranty. The U.S. Geological Survey shall not be held liable for improper or incorrect use of the data described and (or) contained herein. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof.
Advantages of GPU technology in DFT calculations of intercalated graphene
NASA Astrophysics Data System (ADS)
Pešić, J.; Gajić, R.
2014-09-01
Over the past few years, the expansion of general-purpose graphics-processing unit (GPGPU) technology has had a great impact on computational science. GPGPU is the utilization of a graphics-processing unit (GPU) to perform calculations in applications usually handled by the central processing unit (CPU). Use of GPGPUs as a way to increase computational power in the material sciences has significantly decreased computational costs in already highly demanding calculations. The level of acceleration and parallelization depends on the problem itself. Some problems can benefit from GPU acceleration and parallelization, such as the finite-difference time-domain algorithm (FDTD) and density-functional theory (DFT), while others cannot take advantage of these modern technologies. A number of GPU-supported applications have emerged in the past several years (www.nvidia.com/object/gpu-applications.html). Quantum Espresso (QE) is an integrated suite of open source computer codes for electronic-structure calculations and materials modeling at the nano-scale. It is based on DFT, the use of a plane-wave basis and a pseudopotential approach. Since the QE 5.0 version, GPU support has been implemented as a plug-in component for the standard QE packages that allows exploiting the capabilities of Nvidia GPU graphics cards (www.qe-forge.org/gf/proj). In this study, we have examined the impact of the usage of GPU acceleration and parallelization on the numerical performance of DFT calculations. Graphene has been attracting attention worldwide and has already shown some remarkable properties. We have studied intercalated graphene, using the QE package PHonon, which employs the GPU. The term 'intercalation' refers to a process whereby foreign adatoms are inserted onto a graphene lattice. In addition, by intercalating different atoms between graphene layers, it is possible to tune their physical properties. Our experiments have shown there are benefits from using GPUs, and we reached an acceleration of several times compared to standard CPU calculations.
NASA Technical Reports Server (NTRS)
1997-01-01
Small Business Innovation Research contracts from Goddard Space Flight Center to Thermacore Inc. have fostered the company's work on devices tagged "heat pipes" for space application. To control the extreme temperature ranges in space, heat pipes are important to spacecraft. The problem was to maintain an 8-watt central processing unit (CPU) at less than 90 °C in a notebook computer using no power, with very little space available and without using forced convection. Thermacore's answer was in the design of a powder metal wick that transfers CPU heat from a tightly confined spot to an area near available air flow. The heat pipe technology permits a notebook computer to be operated in any position without loss of performance. Miniature heat pipe technology has successfully been applied, such as in Pentium Processor notebook computers. The company expects its heat pipes to accommodate desktop computers as well. Cellular phones, camcorders, and other hand-held electronics are possible applications for heat pipes.
Distributed and recoverable digital control system
NASA Technical Reports Server (NTRS)
Stange, Kent (Inventor); Hess, Richard (Inventor); Kelley, Gerald B (Inventor); Rogers, Randy (Inventor)
2010-01-01
A real-time multi-tasking digital control system with rapid recovery capability is disclosed. The control system includes a plurality of computing units comprising a plurality of redundant processing units, with each of the processing units configured to generate one or more redundant control commands. One or more internal monitors are employed for detecting data errors in the control commands. One or more recovery triggers are provided for initiating rapid recovery of a processing unit if data errors are detected. The control system also includes a plurality of actuator control units each in operative communication with the computing units. The actuator control units are configured to initiate a rapid recovery if data errors are detected in one or more of the processing units. A plurality of smart actuators communicates with the actuator control units, and a plurality of redundant sensors communicates with the computing units.
AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble.
Abdul-Wahid, Badi'; Feng, Haoyun; Rajan, Dinesh; Costaouec, Ronan; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2014-10-27
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods have emerged which can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPU) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34 residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy.
AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble
2015-01-01
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods have emerged which can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPU) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34 residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy. PMID:25207854
SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).
Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J
2012-06-01
To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processing Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to perform CPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The times required for the CPU and the GPU to complete the full algorithm were benchmarked, and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to-GPU time was greater than 1.0 for all images, indicating that the GPU performed the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in the computational time associated with an FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
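For context, the translational part of such an FT-based registration can be sketched as a phase-correlation computation in which the cross-power spectrum is zero-padded to gain sub-pixel resolution, mirroring the enlarge-the-images trade-off described above. The CPU-only NumPy sketch below illustrates the algorithm under these assumptions; it is not the IDL/GPU implementation from the abstract, and a GPU version would mainly route the FFT calls through GPU FFT bindings.

```python
import numpy as np

def fft_translation(ref, mov, upsample=4):
    """Estimate the (row, col) shift that maps `ref` onto `mov` by phase
    correlation. Zero-padding the cross-power spectrum by `upsample`
    yields sub-pixel resolution at the cost of a larger inverse FFT --
    the trade-off described in the abstract.
    """
    n0, n1 = ref.shape
    R = np.conj(np.fft.fft2(ref)) * np.fft.fft2(mov)
    R /= np.abs(R) + 1e-12                          # keep phase information only
    pad0, pad1 = n0 * (upsample - 1) // 2, n1 * (upsample - 1) // 2
    big = np.pad(np.fft.fftshift(R), ((pad0, pad0), (pad1, pad1)))
    corr = np.abs(np.fft.ifft2(np.fft.ifftshift(big)))
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape), dtype=float)
    dims = np.array(corr.shape, dtype=float)
    peak[peak > dims / 2] -= dims[peak > dims / 2]  # wrap negative shifts
    return peak / upsample

# Hypothetical check with a known integer shift of (3, -5) pixels.
rng = np.random.default_rng(0)
img = rng.standard_normal((256, 256))
shifted = np.roll(img, (3, -5), axis=(0, 1))
print(fft_translation(img, shifted))                # approximately [ 3., -5.]
```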
Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.; ...
2017-10-14
We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library called CUFFT to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OpenACC-CUDA interoperability, in which calls to CUDA functions are replaced with OpenACC data regions for a host central processing unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines: the Numerical Recipes FFT (FOURN), the Fastest Fourier Transform in the West (FFTW), and the CUFFT. The last one exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single TESLA K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-core Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.
[Groupamatic 360 C1 and automated blood donor processing in a transfusion center].
Guimbretiere, J; Toscer, M; Harousseau, H
1978-03-01
Automation of the donor management flow path is controlled by: a three-slip "port-a-punch" card; the Groupamatic unit, with results output on punched paper tape; and the management computer, connected off-line to the Groupamatic. Data tracking at blood collection time is performed by punching a card, with the donor card used as a master card. The Groupamatic performs: a standard blood grouping, with one run for registered donors and two runs for new donors; phenotyping with two runs; and screening for irregular antibodies. The management computer checks the correlation between the data of the two runs, or between the data of a single run and that of the previous file. It updates the data resident in the central file and prints out: the controls of the different blood groups for the red cell panel; the listing of error messages; the listing of emergency call-ups; the listing of collected blood units on arrival at the blood center, with quantitative and qualitative information such as the number of blood units collected and donor addresses; statistics; donor cards; and diplomas.
Real, Kevin; Fay, Lindsey; Isaacs, Kathy; Carll-White, Allison; Schadler, Aric
2018-01-01
This study utilizes systems theory to understand how changes to physical design structures impact communication processes and patient and staff design-related outcomes. Many scholars and researchers have noted the importance of communication and teamwork for patient care quality. Few studies have examined changes to nursing station design within a systems theory framework. This study employed a multimethod, before-and-after, quasi-experimental research design. Nurses completed surveys in centralized units and later in decentralized units (N = 26 pre, N = 51 post). Patients completed surveys in centralized units (N = 62 pre) and later in decentralized units (N = 49 post). Surveys included quantitative measures and qualitative open-ended responses. Patients preferred the decentralized units because of larger single-occupancy rooms, greater privacy/confidentiality, and overall satisfaction with design. Nurses had a more complex response. Nurses approved of the patient rooms, unit environment, and noise levels in decentralized units. However, they reported reduced access to support spaces, lower levels of team/mentoring communication, and less satisfaction with design than in centralized units. Qualitative findings supported these results. Nurses were more positive about centralized units and patients were more positive toward decentralized units. The results of this study suggest a need to understand how system components operate in concert. A major contribution of this study is the inclusion of patient satisfaction with design, an important yet overlooked factor in patient satisfaction. Healthcare design researchers and practitioners may consider how changing system interdependencies can lead to unexpected changes to communication processes and system outcomes in complex systems.
Integrating Micro-computers with a Centralized DBMS: ORACLE, SEED AND INGRES
NASA Technical Reports Server (NTRS)
Hoerger, J.
1984-01-01
Users of ADABAS, a relational-like data base management system with its data base programming language (NATURAL), are acquiring microcomputers with hopes of solving their individual word processing, office automation, decision support, and simple data processing problems. As processor speeds, memory sizes, and disk storage capacities increase, individual departments begin to maintain "their own" data base on "their own" micro-computer. This situation can adversely affect several of the primary goals set for implementing a centralized DBMS. In order to avoid this potential problem, these micro-computers must be integrated with the centralized DBMS. An easy-to-use and flexible means for transferring logical data base files between the central data base machine and micro-computers must be provided. Some of the problems encountered in an effort to accomplish this integration, and possible solutions, are discussed.
Electric prototype power processor for a 30cm ion thruster
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.; Schoenfeld, A. D.
1977-01-01
An electrical prototype power processor unit was designed, fabricated and tested with a 30 cm mercury ion engine for primary space propulsion. The power processor unit used the thyristor series resonant inverter as the basic power stage for the high power beam and discharge supplies. A transistorized series resonant inverter processed the remaining power for the low power outputs. The power processor included a digital interface unit to process all input commands and internal telemetry signals so that electric propulsion systems could be operated with a central computer system. The electrical prototype unit included design improvement in the power components such as thyristors, transistors, filters and resonant capacitors, and power transformers and inductors in order to reduce component weight, to minimize losses, and to control the component temperature rise. A design analysis for the electrical prototype is also presented on the component weight, losses, part count and reliability estimate. The electrical prototype was tested in a thermal vacuum environment. Integration tests were performed with a 30 cm ion engine and demonstrated operational compatibility. Electromagnetic interference data was also recorded on the design to provide information for spacecraft integration.
NASA Astrophysics Data System (ADS)
Sudarmaji; Rudianto, Indra; Eka Nurcahya, Budi
2018-04-01
A strong tectonic earthquake with a magnitude of 5.9 on the Richter scale occurred in Yogyakarta and Central Java on May 26, 2006. The earthquake caused severe damage in Yogyakarta and the southern part of Central Java, Indonesia. Understanding the seismic response to the earthquake, in terms of ground shaking and the level of building damage, is important. We present numerical modeling of 3D seismic wave propagation around Yogyakarta and the southern part of Central Java using the spectral-element method on an MPI-GPU (Graphics Processing Unit) computer cluster to observe the seismic response due to the earthquake. A homogeneous, realistic 3D model is generated with a detailed topographic surface. The influences of the free surface topography and the layer discontinuity of the 3D model on the seismic response are observed. The seismic wave field is discretized using the spectral-element method. The spectral-element method is solved on a mesh of hexahedral elements that is adapted to the free surface topography and the internal discontinuity of the model. To increase the data processing capability, the simulation is performed on a GPU cluster with an implementation of MPI (Message Passing Interface).
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors
Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun
2011-01-01
Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
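As an illustration of the kind of adaptive array processing the abstract refers to, the sketch below computes minimum-variance distortionless-response (Capon) beamforming weights for a four-element array in NumPy. This is a generic null-steering beamformer written under assumed parameters (element count, steering vector, diagonal loading), not the specific algorithm or GPU implementation of the described SDR.

```python
import numpy as np

def mvdr_weights(snapshots, steering):
    """Minimum-variance distortionless-response (Capon) weights.

    snapshots: (M, N) complex array of M antenna elements x N baseband samples.
    steering:  (M,) complex steering vector toward the GPS satellite.
    Interference raises the off-diagonal power of the sample covariance, and
    the resulting weights place nulls toward the jammers while keeping unit
    gain in the satellite direction.
    """
    M, N = snapshots.shape
    R = snapshots @ snapshots.conj().T / N          # sample covariance
    R += 1e-3 * np.trace(R).real / M * np.eye(M)    # diagonal loading for robustness
    Rinv_s = np.linalg.solve(R, steering)
    return Rinv_s / (steering.conj() @ Rinv_s)

# Hypothetical usage: 4 elements, 4096 complex samples, assumed steering vector.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4096)) + 1j * rng.standard_normal((4, 4096))
s = np.exp(-1j * np.pi * np.arange(4) * 0.3)
w = mvdr_weights(x, s)
y = w.conj() @ x                                    # beamformed output stream
```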
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fong, K. W.
1977-08-15
This report deals with some techniques in applied programming using the Livermore Timesharing System (LTSS) on the CDC 7600 computers at the National Magnetic Fusion Energy Computer Center (NMFECC) and the Lawrence Livermore Laboratory Computer Center (LLLCC or Octopus network). This report is based on a document originally written specifically about the system as it is implemented at NMFECC but has been revised to accommodate differences between LLLCC and NMFECC implementations. Topics include: maintaining programs, debugging, recovering from system crashes, and using the central processing unit, memory, and input/output devices efficiently and economically. Routines that aid in these procedures are mentioned. The companion report, UCID-17556, An LTSS Compendium, discusses the hardware and operating system and should be read before reading this report.
Multispectral imaging system for contaminant detection
NASA Technical Reports Server (NTRS)
Poole, Gavin H. (Inventor)
2003-01-01
An automated inspection system for detecting digestive contaminants on food items as they are being processed for consumption includes a conveyor for transporting the food items, a light sealed enclosure which surrounds a portion of the conveyor, with a light source and a multispectral or hyperspectral digital imaging camera disposed within the enclosure. Operation of the conveyor, light source and camera are controlled by a central computer unit. Light reflected by the food items within the enclosure is detected in predetermined wavelength bands, and detected intensity values are analyzed to detect the presence of digestive contamination.
NASA Astrophysics Data System (ADS)
Nebashi, Ryusuke; Sakimura, Noboru; Sugibayashi, Tadahiko
2017-08-01
We evaluated the soft-error tolerance and energy consumption of an embedded computer with magnetic random access memory (MRAM) using two computer simulators. One is a central processing unit (CPU) simulator of a typical embedded computer system. We simulated the radiation-induced single-event-upset (SEU) probability in a spin-transfer-torque MRAM cell and also the failure rate of a typical embedded computer due to its main memory SEU error. The other is a delay tolerant network (DTN) system simulator. It simulates the power dissipation of wireless sensor network nodes of the system using a revised CPU simulator and a network simulator. We demonstrated that the SEU effect on the embedded computer with 1 Gbit MRAM-based working memory is less than 1 failure in time (FIT). We also demonstrated that the energy consumption of the DTN sensor node with MRAM-based working memory can be reduced to 1/11. These results indicate that MRAM-based working memory enhances the disaster tolerance of embedded computers.
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
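The core computation that such a library accelerates is the per-site conditional (partial) likelihood recursion of Felsenstein's pruning algorithm. The NumPy sketch below shows that recursion for a single internal node with a Jukes-Cantor-style transition matrix; it is a plain CPU illustration under assumed data and branch lengths, not the BEAGLE API itself.

```python
import numpy as np

def node_partials(left_partials, right_partials, P_left, P_right):
    """Felsenstein pruning step for one internal node.

    left_partials/right_partials: (sites, states) conditional likelihoods of
    the two child subtrees; P_left/P_right: (states, states) transition
    probability matrices along the child branches. Returns the (sites,
    states) conditional likelihoods at the parent. Libraries such as BEAGLE
    evaluate exactly this kind of dense kernel on GPUs and SIMD units.
    """
    return (left_partials @ P_left.T) * (right_partials @ P_right.T)

def jc_transition(t, mu=1.0):
    """Jukes-Cantor transition probability matrix for branch length t."""
    p_same = 0.25 + 0.75 * np.exp(-4.0 * mu * t / 3.0)
    p_diff = 0.25 - 0.25 * np.exp(-4.0 * mu * t / 3.0)
    return np.full((4, 4), p_diff) + np.eye(4) * (p_same - p_diff)

# Hypothetical two-tip example: 3 sites, observed bases encoded one-hot.
tipA = np.eye(4)[[0, 2, 3]]
tipB = np.eye(4)[[0, 1, 3]]
parent = node_partials(tipA, tipB, jc_transition(0.1), jc_transition(0.2))
site_lik = parent @ np.full(4, 0.25)      # root with uniform base frequencies
log_likelihood = np.log(site_lik).sum()
```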
NASA Astrophysics Data System (ADS)
Kim, Jeongnim; Baczewski, Andrew D.; Beaudet, Todd D.; Benali, Anouar; Chandler Bennett, M.; Berrill, Mark A.; Blunt, Nick S.; Josué Landinez Borda, Edgar; Casula, Michele; Ceperley, David M.; Chiesa, Simone; Clark, Bryan K.; Clay, Raymond C., III; Delaney, Kris T.; Dewing, Mark; Esler, Kenneth P.; Hao, Hongxia; Heinonen, Olle; Kent, Paul R. C.; Krogel, Jaron T.; Kylänpää, Ilkka; Li, Ying Wai; Lopez, M. Graham; Luo, Ye; Malone, Fionn D.; Martin, Richard M.; Mathuriya, Amrita; McMinis, Jeremy; Melton, Cody A.; Mitas, Lubos; Morales, Miguel A.; Neuscamman, Eric; Parker, William D.; Pineda Flores, Sergio D.; Romero, Nichols A.; Rubenstein, Brenda M.; Shea, Jacqueline A. R.; Shin, Hyeondeok; Shulenburger, Luke; Tillack, Andreas F.; Townsend, Joshua P.; Tubman, Norm M.; Van Der Goetz, Brett; Vincent, Jordan E.; ChangMo Yang, D.; Yang, Yubo; Zhang, Shuai; Zhao, Luning
2018-05-01
QMCPACK is an open source quantum Monte Carlo package for ab initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater–Jastrow type trial wavefunctions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary-field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit and graphical processing unit systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://qmcpack.org.
Serial network simplifies the design of multiple microcomputer systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Folkes, D.
1981-01-01
Recently there has been a lot of interest in developing network communication schemes for carrying digital data between locally distributed computing stations. Many of these schemes have focused on distributed networking techniques for data processing applications. These applications suggest the use of a serial, multipoint bus, where a number of remote intelligent units act as slaves to a central or host computer. Each slave would be serially addressable from the host and would perform required operations upon being addressed by the host. Based on an MK3873 single-chip microcomputer, the SCU 20 is designed to be such a remote slave device. The capabilities of the SCU 20 and its use in systems applications are examined.
Numerical simulation of disperse particle flows on a graphics processing unit
NASA Astrophysics Data System (ADS)
Sierakowski, Adam J.
In both nature and technology, we commonly encounter solid particles being carried within fluid flows, from dust storms to sediment erosion and from food processing to energy generation. The motion of uncountably many particles in highly dynamic flow environments characterizes the tremendous complexity of such phenomena. While methods exist for the full-scale numerical simulation of such systems, current computational capabilities require the simplification of the numerical task with significant approximation using closure models widely recognized as insufficient. There is therefore a fundamental need for the investigation of the underlying physical processes governing these disperse particle flows. In the present work, we develop a new tool based on the Physalis method for the first-principles numerical simulation of thousands of particles (a small fraction of an entire disperse particle flow system) in order to assist in the search for new reduced-order closure models. We discuss numerous enhancements to the efficiency and stability of the Physalis method, which introduces the influence of spherical particles to a fixed-grid incompressible Navier-Stokes flow solver using a local analytic solution to the flow equations. Our first-principles investigation demands the modeling of unresolved length and time scales associated with particle collisions. We introduce a collision model alongside Physalis, incorporating lubrication effects and proposing a new nonlinearly damped Hertzian contact model. By reproducing experimental studies from the literature, we document extensive validation of the methods. We discuss the implementation of our methods for massively parallel computation using a graphics processing unit (GPU). We combine Eulerian grid-based algorithms with Lagrangian particle-based algorithms to achieve computational throughput up to 90 times faster than the legacy implementation of Physalis for a single central processing unit. By avoiding all data communication between the GPU and the host system during the simulation, we utilize with great efficacy the GPU hardware with which many high performance computing systems are currently equipped. We conclude by looking forward to the future of Physalis with multi-GPU parallelization in order to perform resolved disperse flow simulations of more than 100,000 particles and further advance the development of reduced-order closure models.
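The collision modelling mentioned above can be made concrete with a soft-sphere normal contact law. The sketch below implements a Hertzian elastic term plus a Tsuji-style nonlinear damping term whose strength is tied to a target coefficient of restitution; this is a generic, hedged illustration of a nonlinearly damped Hertzian contact, not necessarily the specific model proposed in the dissertation.

```python
import numpy as np

def hertz_normal_force(delta, delta_dot, R_eff, E_eff, m_eff, restitution=0.9):
    """Normal contact force between two colliding spheres (hedged sketch).

    delta:     overlap depth (m), > 0 while the spheres interpenetrate
    delta_dot: rate of change of the overlap (m/s), > 0 during compression
    R_eff:     effective radius, 1/R_eff = 1/R1 + 1/R2
    E_eff:     effective modulus, 1/E_eff = (1 - nu1^2)/E1 + (1 - nu2^2)/E2
    m_eff:     effective mass, 1/m_eff = 1/m1 + 1/m2

    The elastic part is the classical Hertz delta^(3/2) law; the damping
    coefficient grows with sqrt(overlap), so dissipation vanishes smoothly
    at first touch -- one common choice in discrete-element codes.
    """
    if delta <= 0.0:
        return 0.0
    beta = abs(np.log(restitution)) / np.sqrt(np.log(restitution) ** 2 + np.pi ** 2)
    s_n = 2.0 * E_eff * np.sqrt(R_eff * delta)               # tangent contact stiffness
    f_elastic = (4.0 / 3.0) * E_eff * np.sqrt(R_eff) * delta ** 1.5
    gamma_n = 2.0 * np.sqrt(5.0 / 6.0) * beta * np.sqrt(s_n * m_eff)
    return f_elastic + gamma_n * delta_dot                   # repulsive force magnitude
```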
Computational and Pharmacological Target of Neurovascular Unit for Drug Design and Delivery
2015-01-01
The blood-brain barrier (BBB) is a dynamic and highly selective permeable interface between the central nervous system (CNS) and the periphery that regulates brain homeostasis. Increasing evidence of neurological disorders and the restricted drug delivery process in the brain make the BBB a special target for further study. At present, the neurovascular unit (NVU) is of great interest and a highlighted topic for pharmaceutical companies in CNS drug design and delivery approaches. Recent advances in pharmacology and computational biology make it possible to develop drugs within a limited time and at affordable cost. In this review, we briefly introduce the current understanding of the NVU, including its molecular and cellular composition, physiology, and regulatory function. We also discuss recent technology and the interaction of pharmacogenomics and bioinformatics for drug design and steps towards personalized medicine. Additionally, we develop a gene network to understand NVU-associated transporter protein interactions, which might be useful for understanding the aetiology of neurological disorders and for the development and delivery of new target-based protective therapies. PMID:26579539
Real-time colour hologram generation based on ray-sampling plane with multi-GPU acceleration.
Sato, Hirochika; Kakue, Takashi; Ichihashi, Yasuyuki; Endo, Yutaka; Wakunami, Koki; Oi, Ryutaro; Yamamoto, Kenji; Nakayama, Hirotaka; Shimobaba, Tomoyoshi; Ito, Tomoyoshi
2018-01-24
Although electro-holography can reconstruct three-dimensional (3D) motion pictures, its computational cost is too heavy to allow for real-time reconstruction of 3D motion pictures. This study explores accelerating colour hologram generation using light-ray information on a ray-sampling (RS) plane with a graphics processing unit (GPU) to realise a real-time holographic display system. We refer to an image corresponding to light-ray information as an RS image. Colour holograms were generated from three RS images with resolutions of 2,048 × 2,048; 3,072 × 3,072 and 4,096 × 4,096 pixels. The computational results indicate that the generation of the colour holograms using multiple GPUs (NVIDIA Geforce GTX 1080) was approximately 300-500 times faster than those generated using a central processing unit. In addition, the results demonstrate that 3D motion pictures were successfully reconstructed from RS images of 3,072 × 3,072 pixels at approximately 15 frames per second using an electro-holographic reconstruction system in which colour holograms were generated from RS images in real time.
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms
Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein
2017-01-01
Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to recent research, graphics processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of achieving real-time image processing. Edge detection is an early stage in most image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts’ Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for the Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth-first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831
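To make the class of filters being benchmarked concrete, the sketch below computes a Sobel gradient-magnitude edge map in plain NumPy. Each output pixel depends only on its 3×3 neighbourhood, which is what makes these filters map so naturally onto GPU threads; this is a CPU illustration with an assumed threshold, not the paper's CUDA, OpenCV, or MATLAB implementation.

```python
import numpy as np

def sobel_edges(image, threshold=0.2):
    """Sobel gradient-magnitude edge map (CPU NumPy sketch).

    In a CUDA version, every pixel would simply be handled by its own
    thread, since each output value only reads a 3x3 neighbourhood.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(image.astype(float), 1, mode="edge")
    gx = np.zeros(image.shape, dtype=float)
    gy = np.zeros(image.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            window = padded[di:di + image.shape[0], dj:dj + image.shape[1]]
            gx += kx[di, dj] * window
            gy += ky[di, dj] * window
    magnitude = np.hypot(gx, gy)
    magnitude /= magnitude.max() + 1e-12
    return magnitude > threshold

# Hypothetical usage on a synthetic step-edge image.
img = np.zeros((64, 64)); img[:, 32:] = 1.0
edges = sobel_edges(img)
```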
cudaMap: a GPU accelerated program for gene expression connectivity mapping
2013-01-01
Background Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. Results cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Conclusion Emerging ‘omics’ technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap. PMID:24112435
Common Readout Unit (CRU) - A new readout architecture for the ALICE experiment
NASA Astrophysics Data System (ADS)
Mitra, J.; Khan, S. A.; Mukherjee, S.; Paul, R.
2016-03-01
The ALICE experiment at the CERN Large Hadron Collider (LHC) is presently undergoing a major upgrade in order to fully exploit the scientific potential of the upcoming high-luminosity run, scheduled to start in the year 2021. The high interaction rate and the large event size will result in an experimental data flow of about 1 TB/s from the detectors, which needs to be processed before being sent to the online computing system and data storage. This processing is done in a dedicated Common Readout Unit (CRU), proposed for data aggregation, trigger and timing distribution, and control moderation. It acts as a common interface between the sub-detector electronic systems, the computing system, and the trigger processors. The interface links include GBT, TTC-PON and PCIe. GBT (Gigabit Transceiver) is used for detector data payload transmission and as a fixed-latency path for trigger distribution between the CRU and the detector readout electronics. TTC-PON (Timing, Trigger and Control via Passive Optical Network) is employed for time-multiplexed trigger distribution between the CRU and the Central Trigger Processor (CTP). PCIe (Peripheral Component Interconnect Express) is the high-speed serial computer expansion bus standard used for bulk data transport between CRU boards and processors. In this article, we give an overview of the CRU architecture in ALICE and discuss the different interfaces, along with the firmware design and implementation of the CRU on the LHCb PCIe40 board.
ERIC Educational Resources Information Center
Hermann, Jeffrey T.
1992-01-01
Every college should have a campuswide computer publishing policy. Policy options include (1) centralized publishing; (2) franchise, with a number of units doing their own publishing; or (3) limited, with the publications office producing the basics and assisting other units when feasible. To be successful, the policy must also be enforced. (MSE)
Introduction to the LaRC central scientific computing complex
NASA Technical Reports Server (NTRS)
Shoosmith, John N.
1993-01-01
The computers and associated equipment that make up the Central Scientific Computing Complex of the Langley Research Center are briefly described. The electronic networks that provide access to the various components of the complex, and a number of areas that can be used by Langley and contractor staff for special applications (scientific visualization, image processing, software engineering, and grid generation), are also described. Flight simulation facilities that use the central computers are described. Management of the complex, procedures for its use, and available services and resources are discussed. This document is intended for new users of the complex, for current users who wish to keep apprised of changes, and for visitors who need to understand the role of central scientific computers at Langley.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units
NASA Astrophysics Data System (ADS)
Corral, Roque; Gisbert, Fernando; Pueblas, Jesus
2017-02-01
The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL to allow the execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded due to the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high-bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.
A GPS-based Real-time Road Traffic Monitoring System
NASA Astrophysics Data System (ADS)
Tanti, Kamal Kumar
In recent years, monitoring systems have been moving towards ever more automatic, reliably interconnected, distributed and autonomous operation. Specifically, the measurement, logging, data processing and interpretation activities may be carried out by separate units at different locations in near real-time. The recent evolution of mobile communication devices and communication technologies has fostered a growing interest in GIS- and GPS-based location-aware systems and services. This paper describes a real-time road traffic monitoring system based on integrated mobile field devices (GPS/GSM/IOs) working in tandem with advanced GIS-based application software, providing on-the-fly authentication for real-time monitoring and security enhancement. The described system is developed as a fully automated, continuous, real-time monitoring system that employs GPS sensors; Ethernet and/or serial port communication techniques are used to transfer data between GPS receivers at target points and a central processing computer. The data can be processed locally or remotely based on the client's requirements. Due to the modular architecture of the system, other sensor types may be supported with minimal effort. Data on the distributed network and measurements are transmitted via cellular SIM cards to a Control Unit, which provides for post-processing and network management. The Control Unit may be remotely accessed via an Internet connection. The new system will not only provide more consistent data about road traffic conditions but will also provide methods for integrating with other Intelligent Transportation Systems (ITS). GSM technology is used for communication between the mobile devices and the central monitoring service. The resulting system is characterized by autonomy, reliability and a high degree of automation.
NASA Astrophysics Data System (ADS)
Peter, Daniel; Videau, Brice; Pouget, Kevin; Komatitsch, Dimitri
2015-04-01
Improving the resolution of tomographic images is crucial to answer important questions on the nature of Earth's subsurface structure and internal processes. Seismic tomography is the most prominent approach, where seismic signals from ground-motion records are used to infer physical properties of internal structures such as compressional- and shear-wave speeds, anisotropy and attenuation. Recent advances in regional- and global-scale seismic inversions move towards full-waveform inversions, which require accurate simulations of seismic wave propagation in complex 3D media, providing access to the full 3D seismic wavefields. However, these numerical simulations are computationally very expensive and need high-performance computing (HPC) facilities for further improving the current state of knowledge. During recent years, many-core architectures such as graphics processing units (GPUs) have been added to available large HPC systems. Such GPU-accelerated computing, together with advances in multi-core central processing units (CPUs), can greatly accelerate scientific applications. There are mainly two choices of language support for GPU cards, the CUDA programming environment and the OpenCL language standard. CUDA software development targets NVIDIA graphics cards, while OpenCL was adopted mainly by AMD graphics cards. In order to employ such hardware accelerators for seismic wave propagation simulations, we incorporated the code generation tool BOAST into the existing spectral-element code package SPECFEM3D_GLOBE. This allows us to use meta-programming of computational kernels and to generate optimized source code for both the CUDA and OpenCL languages, running simulations on either CUDA or OpenCL hardware accelerators. We show here applications of forward and adjoint seismic wave propagation on CUDA/OpenCL GPUs, validating results and comparing performance for different simulations and hardware usages.
Towards real-time photon Monte Carlo dose calculation in the cloud
NASA Astrophysics Data System (ADS)
Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-01
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
GPU-based RFA simulation for minimally invasive cancer treatment of liver tumours.
Mariappan, Panchatcharam; Weir, Phil; Flanagan, Ronan; Voglreiter, Philip; Alhonnoro, Tuomas; Pollari, Mika; Moche, Michael; Busse, Harald; Futterer, Jurgen; Portugaller, Horst Rupert; Sequeiros, Roberto Blanco; Kolesnik, Marina
2017-01-01
Radiofrequency ablation (RFA) is one of the most popular and well-standardized minimally invasive cancer treatments (MICT) for liver tumours, employed where surgical resection has been contraindicated. Less-experienced interventional radiologists (IRs) require an appropriate planning tool for the treatment to help avoid incomplete treatment and so reduce the tumour recurrence risk. Although a few tools are available to predict the ablation lesion geometry, the process is computationally expensive. Also, in our implementation, a few patient-specific parameters are used to improve the accuracy of the lesion prediction. Advanced heterogeneous computing using personal computers, incorporating the graphics processing unit (GPU) and the central processing unit (CPU), is proposed to predict the ablation lesion geometry. The most recent GPU technology is used to accelerate the finite element approximation of Pennes' bioheat equation and a three-state cell model. Patient-specific input parameters are used in the bioheat model to improve the accuracy of the predicted lesion. A fast GPU-based RFA solver is developed to predict the lesion by doing most of the computational tasks in the GPU, while reserving the CPU for concurrent tasks such as lesion extraction based on the heat deposition at each finite element node. The solver takes less than 3 min for a treatment duration of 26 min. When the model receives patient-specific input parameters, the deviation between the real and predicted lesion is below 3 mm. A multi-centre retrospective study indicates that the fast RFA solver is capable of providing the IR with the predicted lesion in the short time period before the intervention begins, when the patient has been clinically prepared for the treatment.
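For orientation, the Pennes bioheat equation approximated by the solver reads rho*c*dT/dt = k*laplacian(T) + w_b*rho_b*c_b*(T_a - T) + Q, where the second term models heat exchange with perfusing blood and Q is the deposited RF power. The sketch below advances this equation with a simple explicit finite-difference step on a uniform grid; the tissue properties, grid, boundary treatment and point source are assumptions for illustration, whereas the paper's solver uses a GPU finite-element discretisation with patient-specific parameters.

```python
import numpy as np

def pennes_step(T, dt, dx, rho=1060.0, c=3600.0, k=0.512,
                w_b=6.4e-3, rho_b=1060.0, c_b=3600.0, T_a=37.0, Q=None):
    """One explicit finite-difference step of the Pennes bioheat equation.

    T is a 3D temperature field (degC) on a uniform grid of spacing dx; Q is
    the volumetric heat source (W/m^3), e.g. RF power deposition. Periodic
    boundaries (np.roll) are used purely for brevity, and the liver-like
    property values are assumed for illustration only.
    """
    if Q is None:
        Q = np.zeros_like(T)
    lap = np.zeros_like(T)
    for axis in range(3):
        lap += (np.roll(T, 1, axis) - 2.0 * T + np.roll(T, -1, axis)) / dx ** 2
    dTdt = (k * lap + w_b * rho_b * c_b * (T_a - T) + Q) / (rho * c)
    return T + dt * dTdt

# Hypothetical usage: 2 s of heating of a small grid with a central hot-spot source.
T = np.full((32, 32, 32), 37.0)
Q = np.zeros_like(T)
Q[16, 16, 16] = 5e7                     # assumed point-like RF power deposition
for _ in range(2000):
    T = pennes_step(T, dt=1e-3, dx=1e-3, Q=Q)
```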
Magnetohydrodynamics with GAMER
NASA Astrophysics Data System (ADS)
Zhang, Ui-Han; Schive, Hsi-Yu; Chiueh, Tzihong
2018-06-01
GAMER, a parallel Graphic-processing-unit-accelerated Adaptive-MEsh-Refinement (AMR) hydrodynamic code, has been extended to support magnetohydrodynamics (MHD) with both the corner-transport-upwind and MUSCL-Hancock schemes and the constrained transport technique. A divergence-preserving operator for AMR has been applied to enforce the divergence-free constraint on the magnetic field. GAMER-MHD fully exploits the concurrent execution of the graphics processing unit (GPU) MHD solver and other central processing unit (CPU) computations pertinent to AMR. We perform various standard tests to demonstrate that GAMER-MHD is both second-order accurate and robust, producing results as accurate as those given by high-resolution uniform-grid runs. We also explore a new 3D MHD test, where the magnetic field assumes the Arnold–Beltrami–Childress configuration, temporarily becomes turbulent with current sheets, and finally settles to a lowest-energy equilibrium state. This 3D problem is adopted for the performance test of GAMER-MHD. The single-GPU performance reaches 1.2 × 10^8 and 5.5 × 10^7 cell updates per second for single- and double-precision calculations, respectively, on a Tesla P100. We also demonstrate a parallel efficiency of ∼70% for both weak and strong scaling using 1024 XK nodes on the Blue Waters supercomputer.
Target Information Processing: A Joint Decision and Estimation Approach
2012-03-29
ground targets (track-before-detect) using a computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important...
Data Acquisition System for Multi-Frequency Radar Flight Operations Preparation
NASA Technical Reports Server (NTRS)
Leachman, Jonathan
2010-01-01
A three-channel data acquisition system was developed for the NASA Multi-Frequency Radar (MFR) system. The system is based on a commercial-off-the-shelf (COTS) industrial PC (personal computer) and two dual-channel 14-bit digital receiver cards. The decimated complex envelope representations of the three radar signals are passed to the host PC via the PCI bus, and then processed in parallel by multiple cores of the PC CPU (central processing unit). The innovation is this parallelization of the radar data processing using multiple cores of a standard COTS multi-core CPU. The data processing portion of the data acquisition software was built using autonomous program modules or threads, which can run simultaneously on different cores. A master program module calculates the optimal number of processing threads, launches them, and continually supplies each with data. The benefit of this new parallel software architecture is that COTS PCs can be used to implement increasingly complex processing algorithms on an increasing number of radar range gates and data rates. As new PCs become available with higher numbers of CPU cores, the software will automatically utilize the additional computational capacity.
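The parallel-processing scheme described above can be sketched in a few lines: a master routine splits the range gates of each dwell into chunks, hands one chunk to each worker process, and reassembles the results. The Python sketch below illustrates that pattern with a stand-in per-gate power calculation; the chunking strategy, function names, and data shapes are assumptions, not the MFR software itself.

```python
import os
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_gates(iq_block):
    """Stand-in for a per-thread processing module: returns the mean power
    per range gate of a (pulses x gates) block of complex samples; the real
    system would apply its own pulse/Doppler algorithms here."""
    return np.mean(np.abs(iq_block) ** 2, axis=0)

def process_dwell(iq, workers=None):
    """Master module: split the range gates across CPU cores, process the
    chunks in parallel, and reassemble the per-gate results."""
    workers = workers or os.cpu_count()
    chunks = np.array_split(iq, workers, axis=1)    # split along range gates
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_gates, chunks))
    return np.concatenate(results)

if __name__ == "__main__":
    # Hypothetical dwell: 256 pulses x 2048 range gates of complex samples.
    rng = np.random.default_rng(1)
    iq = rng.standard_normal((256, 2048)) + 1j * rng.standard_normal((256, 2048))
    power = process_dwell(iq)
    print(power.shape)                               # (2048,)
```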
NASA Technical Reports Server (NTRS)
Ross, C.; Williams, G. P. W., Jr.
1975-01-01
The functional design of a preprocessor and its subsystems is described. A structure chart and a data flow diagram are included for each subsystem. A group of intermodule interface definitions (one definition per module) is also included immediately following the structure chart and data flow diagram for a particular subsystem. Each of these intermodule interface definitions consists of the identification of the module, the function the module is to perform, the identification and definition of parameter interfaces to the module, and any design notes associated with the module. Compilers and computer libraries are also described.
A Low Cost Microcomputer Laboratory for Investigating Computer Architecture.
ERIC Educational Resources Information Center
Mitchell, Eugene E., Ed.
1980-01-01
Described is a microcomputer laboratory at the United States Military Academy at West Point, New York, which provides easy access to non-volatile memory and a single input/output file system for 16 microcomputer laboratory positions. A microcomputer network that has a centralized data base is implemented using the concepts of computer network…
Method of up-front load balancing for local memory parallel processors
NASA Technical Reports Server (NTRS)
Baffes, Paul Thomas (Inventor)
1990-01-01
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.
Using technology to develop and distribute patient education storyboards across a health system.
Kisak, Anne Z; Conrad, Kathryn J
2004-01-01
To describe the successful implementation of a centrally designed and managed patient education storyboard project using Microsoft PowerPoint in a large multihospital system and physician-based practice settings. Journal articles, project evaluation, and clinical and educational experience. The use of posters, bulletin boards, and storyboards as educational strategies has been reported widely. Two multidisciplinary committees applied new technology to develop storyboards for patient, family, and general public education. Technology can be used to coordinate centralized development of patient education posters, improving accuracy and content of patient education across a healthcare system while streamlining the development and review process and avoiding duplication of work effort. Storyboards are excellent sources of unit-based current, consistent patient education; reduce duplication of efforts; enhance nursing computer competencies; market nursing expertise; and promote nurse educators.
Zhmurov, A; Dima, R I; Kholodov, Y; Barsegov, V
2010-11-01
Theoretical exploration of fundamental biological processes involving the forced unraveling of multimeric proteins, the sliding motion in protein fibers and the mechanical deformation of biomolecular assemblies under physiological force loads is challenging even for distributed computing systems. Using a C(α)-based coarse-grained self organized polymer (SOP) model, we implemented the Langevin simulations of proteins on graphics processing units (SOP-GPU program). We assessed the computational performance of an end-to-end application of the program, where all the steps of the algorithm are running on a GPU, by profiling the simulation time and memory usage for a number of test systems. The ∼90-fold computational speedup on a GPU, compared with an optimized central processing unit program, enabled us to follow the dynamics in the centisecond timescale, and to obtain the force-extension profiles using experimental pulling speeds (v(f) = 1-10 μm/s) employed in atomic force microscopy and in optical tweezers-based dynamic force spectroscopy. We found that the mechanical molecular response critically depends on the conditions of force application and that the kinetics and pathways for unfolding change drastically even upon a modest 10-fold increase in v(f). This implies that, to resolve accurately the free energy landscape and to relate the results of single-molecule experiments in vitro and in silico, molecular simulations should be carried out under the experimentally relevant force loads. This can be accomplished in reasonable wall-clock time for biomolecules of size as large as 10(5) residues using the SOP-GPU package. © 2010 Wiley-Liss, Inc.
Kinetic energy budget studies of areas of convection
NASA Technical Reports Server (NTRS)
Fuelberg, H. E.
1979-01-01
Synoptic-scale kinetic energy budgets are being computed for three cases when large areas of intense convection occurred over the Central United States. Major energy activity occurs in the storm areas.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable and high-performance computing for big data analytics becomes urgent, because many research activities are constrained by software or tools that cannot complete the computation at all. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, while the potential of such advanced infrastructure remains underexplored in this domain. In this presentation, our prior and ongoing initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Power processor for a 30cm ion thruster
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.
1974-01-01
A thermal vacuum power processor for the NASA Lewis 30cm Mercury Ion Engine was designed, fabricated and tested to determine compliance with electrical specifications. The power processor breadboard used the silicon controlled rectifier (SCR) series resonant inverter as the basic power stage to process all the power to an ion engine. The power processor includes a digital interface unit to process all input commands and internal telemetry signals so that operation is compatible with a central computer system. The breadboard was tested in a thermal vacuum environment. Integration tests were performed with the ion engine and demonstrate operational compatibility and reliable operation without any component failures. Electromagnetic interference data were also recorded on the design to provide information on the interaction with total spacecraft.
Ising Processing Units: Potential and Challenges for Discrete Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coffrin, Carleton James; Nagarajan, Harsha; Bent, Russell Whitford
The recent emergence of novel computational devices, such as adiabatic quantum computers, CMOS annealers, and optical parametric oscillators, presents new opportunities for hybrid-optimization algorithms that leverage these kinds of specialized hardware. In this work, we propose the idea of an Ising processing unit as a computational abstraction for these emerging tools. Challenges involved in using and benchmarking these devices are presented, and open-source software tools are proposed to address some of these challenges. The proposed benchmarking tools and methodology are demonstrated by conducting a baseline study comparing established solution methods with a D-Wave 2X adiabatic quantum computer, one example of a commercially available Ising processing unit.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; Russell, Samuel S.
2012-01-01
Objective: Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods required to work with them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU, which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area where GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
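The data-parallel pattern described above, the same computation applied independently to every element, maps directly onto one GPU thread per element. A minimal, generic CUDA sketch (the kernel and array names are illustrative, not from the thermographic application) is:

```cuda
#include <cuda_runtime.h>

// One thread per pixel: the same transfer function is applied to every element
// of a frame independently, so no synchronization between threads is required.
__global__ void normalize_frame(const float *in, float *out, int n,
                                float lo, float hi)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = (in[idx] - lo) / (hi - lo);   // identical, independent work
}

// Typical launch: one 256-thread block per 256 pixels.
// normalize_frame<<<(n + 255) / 256, 256>>>(d_in, d_out, n, lo, hi);
```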
Carroll, Carlos; McRae, Brad H; Brookes, Allen
2012-02-01
Centrality metrics evaluate paths between all possible pairwise combinations of sites on a landscape to rank the contribution of each site to facilitating ecological flows across the network of sites. Computational advances now allow application of centrality metrics to landscapes represented as continuous gradients of habitat quality. This avoids the binary classification of landscapes into patch and matrix required by patch-based graph analyses of connectivity. It also avoids the focus on delineating paths between individual pairs of core areas characteristic of most corridor- or linkage-mapping methods of connectivity analysis. Conservation of regional habitat connectivity has the potential to facilitate recovery of the gray wolf (Canis lupus), a species currently recolonizing portions of its historic range in the western United States. We applied 3 contrasting linkage-mapping methods (shortest path, current flow, and minimum-cost-maximum-flow) to spatial data representing wolf habitat to analyze connectivity between wolf populations in central Idaho and Yellowstone National Park (Wyoming). We then applied 3 analogous betweenness centrality metrics to analyze connectivity of wolf habitat throughout the northwestern United States and southwestern Canada to determine where it might be possible to facilitate range expansion and interpopulation dispersal. We developed software to facilitate application of centrality metrics. Shortest-path betweenness centrality identified a minimal network of linkages analogous to those identified by least-cost-path corridor mapping. Current flow and minimum-cost-maximum-flow betweenness centrality identified diffuse networks that included alternative linkages, which will allow greater flexibility in planning. Minimum-cost-maximum-flow betweenness centrality, by integrating both land cost and habitat capacity, allows connectivity to be considered within planning processes that seek to maximize species protection at minimum cost. Centrality analysis is relevant to conservation and landscape genetics at a range of spatial extents, but it may be most broadly applicable within single- and multispecies planning efforts to conserve regional habitat connectivity. ©2011 Society for Conservation Biology.
NASA Astrophysics Data System (ADS)
Ghosh, Amal K.; Bhattacharya, Animesh; Raul, Moumita; Basuray, Amitabha
2012-07-01
The arithmetic logic unit (ALU) is the most important unit in any computing system. Optical computing is becoming increasingly popular because of its ultrahigh processing speed and huge data-handling capability. Fast processing therefore calls for an optical TALU compatible with multivalued logic. In this regard, we report a trinary arithmetic and logic unit (TALU) in the modified trinary number (MTN) system, which is suitable for optical computation and other applications in multivalued logic systems. Savart-plate and spatial light modulator (SLM) based optoelectronic circuits are used to exploit the optical tree architecture (OTA) in an optical interconnection network.
NASA Astrophysics Data System (ADS)
Dettmer, J.; Quijano, J. E.; Dosso, S. E.; Holland, C. W.; Mandolesi, E.
2016-12-01
Geophysical seabed properties are important for the detection and classification of unexploded ordnance. However, current surveying methods such as vertical seismic profiling, coring, or inversion are of limited use when surveying large areas with high spatial sampling density. We consider surveys based on a source and receiver array towed by an autonomous vehicle, which produce large volumes of seabed reflectivity data that contain unprecedented and detailed seabed information. The data are analyzed with a particle filter, which requires efficient reflection-coefficient computation, efficient inversion algorithms and efficient use of computer resources. The filter quantifies the information content of multiple sequential data sets by considering results from previous data along the survey track to inform the importance sampling at the current point. Challenges arise from environmental changes along the track where the number of sediment layers and their properties change. This is addressed by a trans-dimensional model in the filter which allows layering complexity to change along a track. Efficiency is improved by likelihood tempering of various particle subsets and including exchange moves (parallel tempering). The filter is implemented on a hybrid computer that combines central processing units (CPUs) and graphics processing units (GPUs) to exploit three levels of parallelism: (1) fine-grained parallel computation of spherical reflection coefficients with a GPU implementation of Levin integration; (2) updating particles by concurrent CPU processes which exchange information using automatic load balancing (coarse-grained parallelism); (3) overlapping CPU-GPU communication (a major bottleneck) with GPU computation by staggering CPU access to the multiple GPUs. The algorithm is applied to spherical reflection coefficients for data sets along a 14-km track on the Malta Plateau, Mediterranean Sea. We demonstrate substantial efficiency gains over previous methods. [This research was supported in part by the U.S. Dept. of Defense, through the Strategic Environmental Research and Development Program (SERDP).]
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit
Lawrie, David S.
2017-01-01
Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
Kim, Jeongnim; Baczewski, Andrew T.; Beaudet, Todd D.; ...
2018-04-19
QMCPACK is an open source quantum Monte Carlo package for ab-initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit (CPU) and graphical processing unit (GPU) systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://www.qmcpack.org.
Development and acceleration of unstructured mesh-based cfd solver
NASA Astrophysics Data System (ADS)
Emelyanov, V.; Karpenko, A.; Volkov, K.
2017-06-01
The study was undertaken as part of a larger effort to establish a common computational fluid dynamics (CFD) code for simulation of internal and external flows and involves some basic validation studies. The governing equations are solved with a finite volume code on unstructured meshes. The computational procedure involves reconstruction of the solution in each control volume, extrapolation of the unknowns to find the flow variables on the faces of the control volume, solution of the Riemann problem for each face of the control volume, and evolution over the time step. The nonlinear CFD solver works in an explicit time-marching fashion, based on a three-step Runge-Kutta stepping procedure. Convergence to a steady state is accelerated by the use of a geometric technique and by the application of Jacobi preconditioning for high-speed flows, with a separate low Mach number preconditioning method for use with low-speed flows. The CFD code is implemented on graphics processing units (GPUs). Speedup of the solution on GPUs with respect to the solution on central processing units (CPUs) is compared for different meshes and different methods of distributing the input data into blocks. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
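The explicit three-step Runge-Kutta march mentioned above reduces, per stage, to an element-wise update of each control volume from its accumulated residual. Below is a hedged CUDA sketch of one such stage, written for a generic SSP form with coefficients (a, b); the array names are assumptions, and the residual evaluation (reconstruction and Riemann solves) is taken as already computed.

```cuda
#include <cuda_runtime.h>

// One stage of an explicit SSP Runge-Kutta update for a cell-averaged
// unknown u, given the residual R already evaluated for each control volume:
//   u_new = a * u0 + b * (u_stage + dt * R)
// For classical SSP-RK3 the (a, b) pairs are (0,1), (3/4,1/4), (1/3,2/3).
__global__ void rk_stage(const double *u0, const double *u_stage,
                         const double *residual, double *u_new,
                         int ncells, double a, double b, double dt)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < ncells)
        u_new[c] = a * u0[c] + b * (u_stage[c] + dt * residual[c]);
}
```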
NASA Astrophysics Data System (ADS)
Cai, Xiaohui; Liu, Yang; Ren, Zhiming
2018-06-01
Reverse-time migration (RTM) is a powerful tool for imaging geologically complex structures such as steep dips and subsalt, but its implementation is computationally expensive. Recently, as a low-cost solution, the graphics processing unit (GPU) was introduced to improve the efficiency of RTM. In this paper, we develop three strategies for improving the implementation of RTM on a GPU card. First, given the high accuracy and efficiency of the adaptive optimal finite-difference (FD) method based on least squares (LS) on the central processing unit (CPU), we study the optimal LS-based FD method on the GPU. Second, we adapt the CPU-based hybrid absorbing boundary condition (ABC) to the GPU by addressing two issues that arise when it is ported to the GPU card: long run times and conflicting threads. Third, for large-scale data, a combined strategy of optimal checkpointing and efficient boundary storage is introduced to trade off memory against recomputation. To reduce the time spent on communication between host and disk, a portable operating system interface (POSIX) thread is used to occupy another CPU core at the checkpoints. Applications of the three strategies to RTM on the GPU with the compute unified device architecture (CUDA) programming language demonstrate their efficiency and validity.
NASA Astrophysics Data System (ADS)
Hu, X.; Li, X.
2012-08-01
The orthophoto is an important component of GIS databases and has been applied in many fields, but occlusion and shadow cause the loss of feature information, which strongly affects image quality. One of the critical steps in true orthophoto generation is therefore the detection of occlusion and shadow. LiDAR can now provide the digital surface model (DSM) directly, and combined with this technology, image occlusion and shadow can be detected automatically. In this paper, the Z-buffer algorithm is applied for occlusion detection; shadow detection can be regarded as the same problem as occlusion detection by considering the angle between the sun and the camera. However, the Z-buffer algorithm is computationally expensive, and the volume of scanned data and remote sensing imagery is very large, so an efficient algorithm is another challenge. Modern graphics processing units (GPUs) are much more powerful than central processing units (CPUs). We introduce this technology to speed up the Z-buffer algorithm and obtain a 7-fold speedup compared with the CPU. The experimental results demonstrate that the Z-buffer algorithm performs well in occlusion and shadow detection when combined with high-density point clouds, and that the GPU can speed up the computation significantly.
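A rough CUDA sketch of a GPU Z-buffer pass is given below: each thread projects one DSM point into the image and keeps the nearest depth per pixel with an atomic minimum. The projection model, array names, and positive-depth assumption are all illustrative; this is not the authors' implementation.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// One thread per DSM point: project it into the image plane and keep the
// nearest depth per pixel. Depths are assumed positive, so their IEEE bit
// patterns compare consistently as signed ints and atomicMin() applies.
__global__ void zbuffer_pass(const float3 *points, int npoints,
                             const float *proj,   // 3x4 row-major projection matrix
                             int *zbuf,           // initialized to __float_as_int(FLT_MAX)
                             int width, int height)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= npoints) return;

    float3 X = points[p];
    float u = proj[0]*X.x + proj[1]*X.y + proj[2]*X.z  + proj[3];
    float v = proj[4]*X.x + proj[5]*X.y + proj[6]*X.z  + proj[7];
    float w = proj[8]*X.x + proj[9]*X.y + proj[10]*X.z + proj[11];
    if (w <= 0.0f) return;                          // behind the camera / sun

    int px = (int)(u / w), py = (int)(v / w);
    if (px < 0 || px >= width || py < 0 || py >= height) return;

    atomicMin(&zbuf[py * width + px], __float_as_int(w));
}
// A point is later flagged as occluded (or shadowed, using the sun's view)
// if its own depth exceeds the buffered minimum at its pixel by a tolerance.
```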
NASA Astrophysics Data System (ADS)
Okamoto, Taro; Takenaka, Hiroshi; Nakamura, Takeshi; Aoki, Takayuki
2010-12-01
We adopted the GPU (graphics processing unit) to accelerate the large-scale finite-difference simulation of seismic wave propagation. The simulation can benefit from the high memory bandwidth of the GPU because it is a "memory intensive" problem. In a single-GPU case we achieved a performance of about 56 GFlops, which was about 45-fold faster than that achieved by a single core of the host central processing unit (CPU). We confirmed that the optimized use of fast shared memory and registers was essential for performance. In the multi-GPU case with three-dimensional domain decomposition, the non-contiguous memory alignment of the ghost zones was found to impose long data-transfer times between the GPU and the host node. This problem was solved by using contiguous memory buffers for the ghost zones. We achieved a performance of about 2.2 TFlops by using 120 GPUs and 330 GB of total memory: nearly (or more than) 2200 cores of host CPUs would be required to achieve the same performance. The weak scaling was nearly proportional to the number of GPUs. We therefore conclude that GPU computing for large-scale simulation of seismic wave propagation is a promising approach, as a faster simulation is possible with reduced computational resources compared to CPUs.
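The contiguous-buffer fix described above is typically implemented by packing the strided ghost-zone planes on the device before the host transfer. A minimal CUDA sketch (the array layout and names are assumptions) is:

```cuda
#include <cuda_runtime.h>

// Pack one x-normal ghost plane (fixed i = i_plane) of a 3D field stored in
// i-fastest order into a contiguous buffer, so a single cudaMemcpy can move it.
__global__ void pack_ghost_x(const float *field, float *buf,
                             int nx, int ny, int nz, int i_plane)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int k = blockIdx.y * blockDim.y + threadIdx.y;
    if (j >= ny || k >= nz) return;
    buf[k * ny + j] = field[(k * ny + j) * nx + i_plane];   // strided read, contiguous write
}

// Host side: pack on the device, then transfer the dense buffer in one call.
// pack_ghost_x<<<grid, block>>>(d_field, d_buf, nx, ny, nz, nx - 2);
// cudaMemcpy(h_buf, d_buf, ny * nz * sizeof(float), cudaMemcpyDeviceToHost);
```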
GPU-accelerated Kernel Regression Reconstruction for Freehand 3D Ultrasound Imaging.
Wen, Tiexiang; Li, Ling; Zhu, Qingsong; Qin, Wenjian; Gu, Jia; Yang, Feng; Xie, Yaoqin
2017-07-01
The volume reconstruction method plays an important role in improving reconstructed volumetric image quality for freehand three-dimensional (3D) ultrasound imaging. By utilizing the capability of programmable graphics processing units (GPUs), we can achieve real-time incremental volume reconstruction at a speed of 25-50 frames per second (fps). After incremental reconstruction and visualization, hole-filling is performed on the GPU to fill the remaining empty voxels. However, traditional pixel-nearest-neighbor-based hole-filling fails to reconstruct volumes with high image quality. In contrast, kernel regression provides an accurate volume reconstruction method for 3D ultrasound imaging, but at the cost of heavy computational complexity. In this paper, a GPU-based fast kernel regression method is proposed for high-quality volume reconstruction after the incremental reconstruction of freehand ultrasound. The experimental results show that improved image quality, with speckle reduction and detail preservation, can be obtained with a kernel window size of [Formula: see text] and a kernel bandwidth of 1.0. The computational performance of the proposed GPU-based method can be over 200 times faster than that on a central processing unit (CPU), and the volume of 50 million voxels in our experiment can be reconstructed within 10 seconds.
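As a rough illustration only, the CUDA sketch below fills empty voxels with a zeroth-order (Nadaraya-Watson) kernel-regression estimate, a Gaussian-weighted average of the filled voxels inside the kernel window. It is a simplification of the paper's method, and the array names, mask convention, and window handling are assumptions.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Simplified zeroth-order kernel regression for hole filling: each thread owns
// one voxel; empty voxels (filled[idx] == 0) are replaced with a Gaussian-
// weighted average of the filled voxels inside a (2r+1)^3 window.
__global__ void fill_holes(const float *vol, const unsigned char *filled,
                           float *out, int nx, int ny, int nz,
                           int r, float bandwidth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= nx || j >= ny || k >= nz) return;

    int idx = (k * ny + j) * nx + i;
    if (filled[idx]) { out[idx] = vol[idx]; return; }   // keep measured voxels

    float num = 0.0f, den = 0.0f;
    for (int dk = -r; dk <= r; ++dk)
        for (int dj = -r; dj <= r; ++dj)
            for (int di = -r; di <= r; ++di) {
                int ii = i + di, jj = j + dj, kk = k + dk;
                if (ii < 0 || ii >= nx || jj < 0 || jj >= ny || kk < 0 || kk >= nz)
                    continue;
                int n = (kk * ny + jj) * nx + ii;
                if (!filled[n]) continue;
                float d2 = float(di*di + dj*dj + dk*dk);
                float w  = expf(-d2 / (2.0f * bandwidth * bandwidth));
                num += w * vol[n];
                den += w;
            }
    out[idx] = (den > 0.0f) ? num / den : 0.0f;          // leave hole at 0 if no support
}
```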
NASA Technical Reports Server (NTRS)
Low, M. D.; Baker, M.; Ferguson, R.; Frost, J. D., Jr.
1972-01-01
This paper describes a complete electroencephalographic acquisition and transmission system, designed to meet the needs of a large hospital with multiple critical care patient monitoring units. The system provides rapid and prolonged access to a centralized recording and computing area from remote locations within the hospital complex, and from locations in other hospitals and other cities. The system includes quick-on electrode caps, amplifier units and cable transmission for access from within the hospital, and EEG digitization and telephone transmission for access from other hospitals or cities.
Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.
2011-01-01
Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
NASA Astrophysics Data System (ADS)
Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.
2011-12-01
With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
ERIC Educational Resources Information Center
Computing Teacher, 1985
1985-01-01
Defines computer literacy and describes a computer literacy course which stresses ethics, hardware, and disk operating systems throughout. Core units on keyboarding, word processing, graphics, database management, problem solving, algorithmic thinking, and programing are outlined, together with additional units on spreadsheets, simulations,…
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented in software an algorithm for high-performance simulation of electrical logs from oil and gas wells using heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving the system of linear algebraic equations (SLAE). Software implementations of the algorithm were made using NVIDIA CUDA technology and computing libraries, allowing us to perform the decomposition of the SLAE and find its solution on the central processing unit (CPU) and on the graphics processing unit (GPU). The calculation time is analyzed as a function of the matrix size and the number of its non-zero elements. We estimated the computing speed on the CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
Theory and Techniques for Assessing the Demand and Supply of Outdoor Recreation in the United States
H. Ken Cordell; John C. Bergstrom
1989-01-01
As the central analysis for the 1989 Renewable Resources Planning Act Assessment, a household market model covering 37 recreational activities was computed for the United States. Equilibrium consumption and costs were estimated, as were likely future changes in consumption and costs in response to expected demand growth and alternative development and access policies...
Data acquisition and analysis in the DOE/NASA Wind Energy Program
NASA Technical Reports Server (NTRS)
Neustadter, H. E.
1980-01-01
Four categories of data systems, each responding to a distinct information need are presented. The categories are: control, technology, engineering and performance. The focus is on the technology data system which consists of the following elements: sensors which measure critical parameters such as wind speed and direction, output power, blade loads and strains, and tower vibrations; remote multiplexing units (RMU) mounted on each wind turbine which frequency modulate, multiplex and transmit sensor outputs; the instrumentation available to record, process and display these signals; and centralized computer analysis of data. The RMU characteristics and multiplexing techniques are presented. Data processing is illustrated by following a typical signal through instruments such as the analog tape recorder, analog to digital converter, data compressor, digital tape recorder, video (CRT) display, and strip chart recorder.
Prell, Daniel; Kyriakou, Yiannis; Beister, Marcel; Kalender, Willi A
2009-11-07
Metallic implants generate streak-like artifacts in flat-detector computed tomography (FD-CT) reconstructed volumetric images. This study presents a novel method for reducing these disturbing artifacts by inserting discarded information into the original rawdata using a three-step correction procedure and working directly with each detector element. Computation times are minimized by completely implementing the correction process on graphics processing units (GPUs). First, the original volume is corrected using a three-dimensional interpolation scheme in the rawdata domain, followed by a second reconstruction. This metal artifact-reduced volume is then segmented into three materials, i.e. air, soft-tissue and bone, using a threshold-based algorithm. Subsequently, a forward projection of the obtained tissue-class model substitutes the missing or corrupted attenuation values directly for each flat detector element that contains attenuation values corresponding to metal parts, followed by a final reconstruction. Experiments using tissue-equivalent phantoms showed a significant reduction of metal artifacts (deviations of CT values after correction compared to measurements without metallic inserts reduced typically to below 20 HU, differences in image noise to below 5 HU) caused by the implants and no significant resolution losses even in areas close to the inserts. To cover a variety of different cases, cadaver measurements and clinical images in the knee, head and spine region were used to investigate the effectiveness and applicability of our method. A comparison to a three-dimensional interpolation correction showed that the new approach outperformed interpolation schemes. Correction times are minimized, and initial and corrected images are made available at almost the same time (12.7 s for the initial reconstruction, 46.2 s for the final corrected image compared to 114.1 s and 355.1 s on central processing units (CPUs)).
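The second step of the three-step correction described above, threshold-based segmentation of the interpolated volume into air, soft tissue, and bone, is naturally expressed as a voxel-wise GPU kernel. The sketch below is illustrative; the HU thresholds and names are assumptions rather than the paper's values.

```cuda
#include <cuda_runtime.h>

enum Tissue { AIR = 0, SOFT_TISSUE = 1, BONE = 2 };

// Voxel-wise, threshold-based segmentation of a CT-value volume into three
// tissue classes, as used to build the model that is then forward-projected.
__global__ void segment_volume(const float *hu, unsigned char *label, int nvox,
                               float air_max_hu, float bone_min_hu)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nvox) return;

    float x = hu[v];
    if (x < air_max_hu)       label[v] = AIR;          // e.g. below roughly -400 HU
    else if (x < bone_min_hu) label[v] = SOFT_TISSUE;  // e.g. roughly -400 HU to 300 HU
    else                      label[v] = BONE;
}
```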
DOE Office of Scientific and Technical Information (OSTI.GOV)
Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com; Suprijadi; Nuclear Physics and Biophysics Research Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Jalan Ganesha 10 Bandung, 40132
Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to model radiation transport in complex systems. One of the MC-based codes widely used for radiographic image simulation is MC-GPU, a code developed by Andrea Basal. This study investigated the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of radiographic images and a comparison of the image quality resulting from GPU and CPU simulations are evaluated in this paper. The simulations were run serially on a CPU and on two GPUs with 384 cores and 2304 cores. In the GPU simulations each core calculates one photon, so a large number of photons are calculated simultaneously. Results show that the simulations on the GPU were significantly accelerated compared to the CPU: the 2304-core GPU performed about 64-114 times faster than the CPU, while the 384-core GPU performed about 20-31 times faster than a single CPU core. Another result shows that the optimum image quality was obtained with histories starting from 10⁸ and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is essentially the same.
Accelerating epistasis analysis in human genetics with consumer graphics hardware.
Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H
2009-07-24
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
A parallel computational model for GATE simulations.
Rannou, F R; Vega-Acevedo, N; El Bitar, Z
2013-12-01
GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Optimized Laplacian image sharpening algorithm based on graphic processing unit
NASA Astrophysics Data System (ADS)
Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah
2014-12-01
In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on the CPU is considerably time-consuming, especially for large images. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for graphics processing units (GPUs), and analyze the impact of image size on performance as well as the relationship between data transfer time and parallel computing time. Further, exploiting the different characteristics of the GPU memory types, an improved scheme of our method is developed that uses shared memory instead of global memory and further increases efficiency. Experimental results show that the two new algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
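A minimal global-memory CUDA version of the Laplacian sharpening step is sketched below using the 4-neighbour stencil; the shared-memory tiling the paper adds as an optimization is omitted for brevity, and the names and the sharpening-strength parameter are illustrative.

```cuda
#include <cuda_runtime.h>

// Laplacian sharpening with the 4-neighbour stencil: out = in - lambda * lap(in).
// One thread per interior pixel; border pixels are copied through unchanged.
__global__ void laplacian_sharpen(const unsigned char *in, unsigned char *out,
                                  int width, int height, float lambda)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
        out[idx] = in[idx];
        return;
    }

    float lap = in[idx - 1] + in[idx + 1] + in[idx - width] + in[idx + width]
              - 4.0f * in[idx];
    float v = in[idx] - lambda * lap;              // subtracting the Laplacian sharpens edges
    out[idx] = (unsigned char)fminf(255.0f, fmaxf(0.0f, v));
}
```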
Cortical Neural Computation by Discrete Results Hypothesis
Castejon, Carlos; Nuñez, Angel
2016-01-01
One of the most challenging problems we face in neuroscience is to understand how the cortex performs computations. There is increasing evidence that the power of the cortical processing is produced by populations of neurons forming dynamic neuronal ensembles. Theoretical proposals and multineuronal experimental studies have revealed that ensembles of neurons can form emergent functional units. However, how these ensembles are implicated in cortical computations is still a mystery. Although cell ensembles have been associated with brain rhythms, the functional interaction remains largely unclear. It is still unknown how spatially distributed neuronal activity can be temporally integrated to contribute to cortical computations. A theoretical explanation integrating spatial and temporal aspects of cortical processing is still lacking. In this Hypothesis and Theory article, we propose a new functional theoretical framework to explain the computational roles of these ensembles in cortical processing. We suggest that complex neural computations underlying cortical processing could be temporally discrete and that sensory information would need to be quantized to be computed by the cerebral cortex. Accordingly, we propose that cortical processing is produced by the computation of discrete spatio-temporal functional units that we have called “Discrete Results” (Discrete Results Hypothesis). This hypothesis represents a novel functional mechanism by which information processing is computed in the cortex. Furthermore, we propose that precise dynamic sequences of “Discrete Results” is the mechanism used by the cortex to extract, code, memorize and transmit neural information. The novel “Discrete Results” concept has the ability to match the spatial and temporal aspects of cortical processing. We discuss the possible neural underpinnings of these functional computational units and describe the empirical evidence supporting our hypothesis. We propose that fast-spiking (FS) interneuron may be a key element in our hypothesis providing the basis for this computation. PMID:27807408
NASA Technical Reports Server (NTRS)
Pan, Jing; Levitt, Karl N.; Cohen, Gerald C.
1991-01-01
Discussed here is work to formally specify and verify a floating-point coprocessor based on the MC68881. The HOL verification system developed at Cambridge University was used. The coprocessor consists of two independent units: the bus interface unit used to communicate with the CPU, and the arithmetic processing unit used to perform the actual calculation. Reasoning about the interaction and synchronization among processes using higher order logic is demonstrated.
Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.
Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A
2017-06-15
Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, in both the central processing unit (CPU) and graphics processing unit (GPU) processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves more than 14-fold. Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference.
Kiefer, Gundolf; Lehmann, Helko; Weese, Jürgen
2006-04-01
Maximum intensity projections (MIPs) are an important visualization technique for angiographic data sets. Efficient data inspection requires frame rates of at least five frames per second at preserved image quality. Despite the advances in computer technology, this task remains a challenge. On the one hand, the sizes of computed tomography and magnetic resonance images are increasing rapidly. On the other hand, rendering algorithms do not automatically benefit from the advances in processor technology, especially for large data sets. This is due to the faster evolving processing power and the slower evolving memory access speed, which is bridged by hierarchical cache memory architectures. In this paper, we investigate memory access optimization methods and use them for generating MIPs on general-purpose central processing units (CPUs) and graphics processing units (GPUs), respectively. These methods can work on any level of the memory hierarchy, and we show that properly combined methods can optimize memory access on multiple levels of the hierarchy at the same time. We present performance measurements to compare different algorithm variants and illustrate the influence of the respective techniques. On current hardware, the efficient handling of the memory hierarchy for CPUs improves the rendering performance by a factor of 3 to 4. On GPUs, we observed that the effect is even larger, especially for large data sets. The methods can easily be adjusted to different hardware specifics, although their impact can vary considerably. They can also be used for other rendering techniques than MIPs, and their use for more general image processing task could be investigated in the future.
On localization attacks against cloud infrastructure
NASA Astrophysics Data System (ADS)
Ge, Linqiang; Yu, Wei; Sistani, Mohammad Ali
2013-05-01
One of the key characteristics of cloud computing is the device and location independence that enables the user to access systems regardless of their location. Because cloud computing is heavily based on sharing resources, it is vulnerable to cyber attacks. In this paper, we investigate a localization attack that enables the adversary to leverage central processing unit (CPU) resources to localize the physical location of the server used by victims. By increasing and reducing CPU usage through a malicious virtual machine (VM), the response time of the victim VM increases and decreases correspondingly. In this way, by embedding a probing signal into the CPU usage and correlating the same pattern in the response time of the victim VM, the adversary can find the location of the victim VM. To determine attack accuracy, we investigate features in both the time and frequency domains. We conduct both theoretical and experimental studies to demonstrate the effectiveness of such an attack.
Spacecraft flight control with the new phase space control law and optimal linear jet select
NASA Technical Reports Server (NTRS)
Bergmann, E. V.; Croopnick, S. R.; Turkovich, J. J.; Work, C. C.
1977-01-01
An autopilot designed for rotation and translation control of a rigid spacecraft is described. The autopilot uses reaction control jets as control effectors and incorporates a six-dimensional phase space control law as well as a linear programming algorithm for jet selection. The interaction of the control law and jet selection was investigated and a recommended configuration proposed. By means of a simulation procedure the new autopilot was compared with an existing system and was found to be superior in terms of core memory, central processing unit time, firings, and propellant consumption. But it is thought that the cycle time required to perform the jet selection computations might render the new autopilot unsuitable for existing flight computer applications, without modifications. The new autopilot is capable of maintaining attitude control in the presence of a large number of jet failures.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del Ben, Mauro, E-mail: mauro.delben@chem.uzh.ch; Hutter, Jürg, E-mail: hutter@chem.uzh.ch; VandeVondele, Joost, E-mail: Joost.VandeVondele@mat.ethz.ch
The forces acting on the atoms as well as the stress tensor are crucial ingredients for calculating the structural and dynamical properties of systems in the condensed phase. Here, these derivatives of the total energy are evaluated for the second-order Møller-Plesset perturbation energy (MP2) in the framework of the resolution of identity Gaussian and plane waves method, in a way that is fully consistent with how the total energy is computed. This consistency is non-trivial, given the different ways employed to compute Coulomb, exchange, and canonical four center integrals, and allows, for example, for energy conserving dynamics in various ensembles. Based on this formalism, a massively parallel algorithm has been developed for finite and extended systems. The designed parallel algorithm displays, with respect to the system size, cubic, quartic, and quintic requirements, respectively, for the memory, communication, and computation. All these requirements are reduced with an increasing number of processes, and the measured performance shows excellent parallel scalability and efficiency up to thousands of nodes. Additionally, the computationally more demanding quintic scaling steps can be accelerated by employing graphics processing units (GPUs), showing, for large systems, a gain of almost a factor of two compared to the standard central processing unit-only case. In this way, the evaluation of the derivatives of the RI-MP2 energy can be performed within a few minutes for systems containing hundreds of atoms and thousands of basis functions. With good time to solution, the implementation thus opens the possibility to perform molecular dynamics (MD) simulations in various ensembles (microcanonical ensemble and isobaric-isothermal ensemble) at the MP2 level of theory. Geometry optimization, full cell relaxation, and energy conserving MD simulations have been performed for a variety of molecular crystals including NH₃, CO₂, formic acid, and benzene.
Application of NASA General-Purpose Solver to Large-Scale Computations in Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Storaasli, Olaf O.
2004-01-01
Of several iterative and direct equation solvers evaluated previously for computations in aeroacoustics, the most promising was the NASA-developed General-Purpose Solver (winner of NASA's 1999 software of the year award). This paper presents detailed, single-processor statistics of the performance of this solver, which has been tailored and optimized for large-scale aeroacoustic computations. The statistics, compiled using an SGI ORIGIN 2000 computer with 12 GB of available memory (RAM) and eight available processors, are the central processing unit time, RAM requirements, and solution error. The equation solver is capable of solving 10 thousand complex unknowns in as little as 0.01 sec using 0.02 GB of RAM, and 8.4 million complex unknowns in slightly less than 3 hours using all 12 GB. This latter solution is the largest aeroacoustics problem solved to date with this technique. The study was unable to detect any noticeable error in the solution, since noise levels predicted from these solution vectors are in excellent agreement with the noise levels computed from the exact solution. The equation solver provides a means for obtaining numerical solutions to aeroacoustics problems in three dimensions.
Replication of Space-Shuttle Computers in FPGAs and ASICs
NASA Technical Reports Server (NTRS)
Ferguson, Roscoe C.
2008-01-01
A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Graphics processing unit based computation for NDE applications
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.
2012-05-01
Advances in parallel processing in recent years are helping to reduce the cost of numerical simulation. Breakthroughs in Graphics Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are two-dimensional. The performance improvement of the GPU implementation over a serial CPU implementation is then discussed.
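As a rough serial illustration of the kind of finite difference stencil such a GPU implementation parallelizes, the NumPy sketch below advances a 2D heat-diffusion field with an explicit scheme; the grid size, diffusivity, and step sizes are arbitrary stand-ins rather than values from the paper, and a CUDA version would assign one grid point per GPU thread instead of using array slicing on the CPU.

```python
import numpy as np

# Illustrative 2D explicit finite-difference heat diffusion (serial NumPy version).
# Grid size, diffusivity and step sizes are placeholder values, not from the paper.
nx, ny = 256, 256
alpha = 1.0e-4                      # thermal diffusivity (arbitrary)
dx = dy = 1.0e-2                    # grid spacing
dt = 0.2 * dx * dx / alpha / 4.0    # comfortably below the explicit stability limit

u = np.zeros((nx, ny))
u[nx // 2, ny // 2] = 100.0         # hot spot in the centre

for _ in range(500):
    # Five-point Laplacian on the interior; boundaries stay fixed at zero.
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0 * u[1:-1, 1:-1]) / (dx * dx)
    u[1:-1, 1:-1] += dt * alpha * lap

print(u.max())
```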
Treml, Benjamin; Gillman, Andrew; Buskohl, Philip; Vaia, Richard
2018-06-18
Robots autonomously interact with their environment through a continual sense-decide-respond control loop. Most commonly, the decide step occurs in a central processing unit; however, the stiffness mismatch between rigid electronics and the compliant bodies of soft robots can impede integration of these systems. We develop a framework for programmable mechanical computation embedded into the structure of soft robots that can augment conventional digital electronic control schemes. Using an origami waterbomb as an experimental platform, we demonstrate a 1-bit mechanical storage device that writes, erases, and rewrites itself in response to a time-varying environmental signal. Further, we show that mechanical coupling between connected origami units can be used to program the behavior of a mechanical bit, produce logic gates such as AND, OR, and three input majority gates, and transmit signals between mechanologic gates. Embedded mechanologic provides a route to add autonomy and intelligence in soft robots and machines. Copyright © 2018 the Author(s). Published by PNAS.
Spatially Characterizing Effective Timber Supply
NASA Technical Reports Server (NTRS)
Berry, J. K.; Sailor, J.
1982-01-01
The structure of a computer-oriented cartographic model for assessing roundwood supply for generation of base load electricity is discussed. The model provides an analytical procedure for coupling spatial information on harvesting economics and owner willingness to sell stumpage. Supply is characterized in terms of standing timber; of accessibility considering various harvesting and hauling factors; and of availability as affected by ownership and residential patterns. Factors governing accessibility to timber include effective harvesting distance to haul roads as modified by barriers and slopes. Haul distance is expressed in units that take into account the relative ease of travel along various road types to a central processing facility. Areas of accessible timber are grouped into spatial units, termed 'timbersheds', of common access to particular haul road segments that belong to unique 'transport zones'. Timber availability considerations include size of ownership parcels, housing density and excluded areas. The analysis techniques are demonstrated for a cartographic data base in western Massachusetts.
Yeari, Menahem; van den Broek, Paul
2016-09-01
It is a well-accepted view that the prior semantic (general) knowledge that readers possess plays a central role in reading comprehension. Nevertheless, computational models of reading comprehension have not integrated the simulation of semantic knowledge and online comprehension processes under a unified mathematical algorithm. The present article introduces a computational model that integrates the landscape model of comprehension processes with latent semantic analysis representation of semantic knowledge. In three sets of simulations of previous behavioral findings, the integrated model successfully simulated the activation and attenuation of predictive and bridging inferences during reading, as well as centrality estimations and recall of textual information after reading. Analyses of the computational results revealed new theoretical insights regarding the underlying mechanisms of the various comprehension phenomena.
Central Data Processing System (CDPS) user's manual: Solar heating and cooling program
NASA Technical Reports Server (NTRS)
1976-01-01
The software and data base management system required to assess the performance of solar heating and cooling systems installed at multiple sites is presented. The instrumentation data associated with these systems are collected, processed, and presented in a form that supports continuity of performance evaluation across all applications. The CDPS consists of three major elements: the communication interface computer, the central data processing computer, and the performance evaluation data base. Users of the performance data base are identified, and procedures for operation and guidelines for software maintenance are outlined. The manual also defines the output capabilities of the CDPS in support of external users of the system.
Chikkagoudar, Satish; Wang, Kai; Li, Mingyao
2011-05-26
Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
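To make the fragment scheme in the GENIE abstract concrete, here is a small serial Python sketch of the pair enumeration it describes: SNP columns are split into non-overlapping fragments, then pairs are scored within each fragment and between fragments. The score function, data sizes, and random genotypes are hypothetical placeholders, not GENIE's actual statistic or code; the real tool dispatches these independent blocks of pairs to GPU or CPU cores in parallel.

```python
import numpy as np
from itertools import combinations, product

# Sketch of the fragment-based pair enumeration described for GENIE (not its actual code).
# 'score' is a hypothetical placeholder for a real interaction test on a binary trait.
def score(geno_a, geno_b, trait):
    # Toy statistic: correlation between the SNP product term and the trait.
    term = geno_a * geno_b
    return abs(np.corrcoef(term, trait)[0, 1])

rng = np.random.default_rng(0)
n_samples, n_snps, frag_size = 200, 40, 10
genotypes = rng.integers(0, 3, size=(n_samples, n_snps))   # 0/1/2 allele counts
trait = rng.integers(0, 2, size=n_samples)                  # binary phenotype

fragments = [range(i, min(i + frag_size, n_snps)) for i in range(0, n_snps, frag_size)]

results = {}
for fi, frag in enumerate(fragments):
    # 1) interactions of SNPs within the fragment
    for a, b in combinations(frag, 2):
        results[(a, b)] = score(genotypes[:, a], genotypes[:, b], trait)
    # 2) interactions between this fragment and later fragments
    for other in fragments[fi + 1:]:
        for a, b in product(frag, other):
            results[(a, b)] = score(genotypes[:, a], genotypes[:, b], trait)

print(len(results), "SNP pairs scored")
```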
NASA Astrophysics Data System (ADS)
Xu, Shigang; Liu, Yang
2018-03-01
The conventional pseudo-acoustic wave equations (PWEs) in arbitrary orthorhombic anisotropic (OA) media usually have coupled P- and SV-wave modes. These coupled equations may introduce strong SV-wave artifacts and numerical instabilities in P-wave simulation results and reverse-time migration (RTM) profiles. However, pure acoustic wave equations (PAWEs) completely decouple the P-wave component from the full elastic wavefield and naturally solve all the aforementioned problems. In this article, we present a novel PAWE in arbitrary OA media and compare it with the conventional coupled PWEs. Through decomposing the solution of the corresponding eigenvalue equation for the original PWE into an ellipsoidal differential operator (EDO) and an ellipsoidal scalar operator (ESO), the new PAWE in the time-space domain is constructed by applying the combination of these two solvable operators and can effectively describe P-wave features in arbitrary OA media. Furthermore, we adopt the optimal finite-difference method (FDM) to solve the newly derived PAWE. In addition, the three-dimensional (3D) hybrid absorbing boundary condition (HABC) with some reasonable modifications is developed for reducing artificial edge reflections in anisotropic media. To improve computational efficiency in the 3D case, we adopt a graphics processing unit (GPU) with the Compute Unified Device Architecture (CUDA) instead of a traditional central processing unit (CPU) architecture. Several numerical experiments for arbitrary OA models confirm that the proposed schemes can produce pure, stable and accurate P-wave modeling results and RTM images with higher computational efficiency. Moreover, the 3D numerical simulations can provide us with a comprehensive and realistic description of wave propagation.
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj
2006-01-01
A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
Device and method to enhance availability of cluster-based processing systems
NASA Technical Reports Server (NTRS)
Lupia, David J. (Inventor); Ramos, Jeremy (Inventor); Samson, Jr., John R. (Inventor)
2010-01-01
An electronic computing device including at least one processing unit that implements a specific fault signal upon experiencing an associated fault, a control unit that generates a specific recovery signal upon receiving the fault signal from the at least one processing unit, and at least one input memory unit. The recovery signal initiates specific recovery processes in the at least one processing unit. The input memory unit buffers data signals input to the at least one processing unit that experienced the fault during the recovery period.
Modeling and Simulation of the Economics of Mining in the Bitcoin Market
Marchesi, Michele
2016-01-01
On January 3, 2009, Satoshi Nakamoto gave rise to the “Bitcoin Blockchain”, creating the first block of the chain by hashing on his computer’s central processing unit (CPU). Since then, the hash calculations to mine Bitcoin have been getting more and more complex, and consequently the mining hardware evolved to adapt to this increasing difficulty. Three generations of mining hardware have followed the CPU generation: GPUs, FPGAs, and ASICs. This work presents an agent-based artificial market model of the Bitcoin mining process and of the Bitcoin transactions. The goal of this work is to model the economy of the mining process, starting from the GPU generation, the first with economic significance. The model reproduces some “stylized facts” found in real-time price series and some core aspects of the mining business. In particular, the computational experiments performed can reproduce the unit root property, the fat tail phenomenon and the volatility clustering of Bitcoin price series. In addition, under proper assumptions, they can reproduce the generation of Bitcoins, the hashing capability, the power consumption, and the mining hardware and electrical energy expenditures of the Bitcoin network. PMID:27768691
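As a hedged, back-of-envelope companion to the mining-economics theme of this model, the snippet below computes an individual miner's expected daily profit from an assumed share of the network hash rate; every number (hash rates, block reward, price, power draw, electricity cost) is an illustrative assumption, not a value from the paper's calibration.

```python
# Back-of-envelope miner economics (illustrative numbers, not the paper's calibration).
hashrate = 50e12            # miner's hash rate in hashes/s (assumed)
network_hashrate = 500e18   # total network hash rate (assumed)
block_reward = 6.25         # BTC per block (assumed reward era)
btc_price = 30_000.0        # USD per BTC (assumed)
power_kw = 3.0              # miner power draw in kW (assumed)
electricity = 0.10          # USD per kWh (assumed)

blocks_per_day = 24 * 6                      # one block roughly every 10 minutes
share = hashrate / network_hashrate          # expected fraction of blocks won
revenue = share * blocks_per_day * block_reward * btc_price
cost = power_kw * 24 * electricity
print(f"expected daily profit: {revenue - cost:.2f} USD")
```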
NASA Technical Reports Server (NTRS)
Tran, Donald H.; Snyder, Christopher A.
1992-01-01
A study was performed to quantify the differences in turbine engine performance with and without the chemical dissociation effects for various fuel types over a range of combustor temperatures. Both turbojet and turbofan engines were studied with hydrocarbon fuels and cryogenic, nonhydrocarbon fuels. Results of the study indicate that the accuracy of computed engine performance decreases when nonhydrocarbon fuels are used, especially at high temperatures where chemical dissociation becomes more significant. For instance, the deviation in net thrust for liquid hydrogen fuel can become as high as 20 percent at 4160 R. This study reveals that computer central processing unit (CPU) time increases significantly when dissociation effects are included in the cycle analysis.
2007-08-02
Societal stigmas against gangs and gang- deportees from the United States have made the process of leaving a gang extremely difficult. A recent...often unwilling to hire them. Tattooed former gang members, especially returning deportees from the United States who are often native English...recipients of deportees on a per capita basis. For all Central American countries, with the exception of Panama, those deported on criminal grounds
STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB.
Klingbeil, Guido; Erban, Radek; Giles, Mike; Maini, Philip K
2011-04-15
The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new software tool STOCHSIMGPU that exploits graphics processing units (GPUs) for parallel stochastic simulations of biological/chemical reaction systems and show that significant gains in efficiency can be made. It is integrated into MATLAB and works with the Systems Biology Toolbox 2 (SBTOOLBOX2) for MATLAB. The GPU-based parallel implementation of the Gillespie stochastic simulation algorithm (SSA), the logarithmic direct method (LDM) and the next reaction method (NRM) is approximately 85 times faster than the sequential implementation of the NRM on a central processing unit (CPU). Using our software does not require any changes to the user's models, since it acts as a direct replacement of the stochastic simulation software of the SBTOOLBOX2. The software is open source under the GPL v3 and available at http://www.maths.ox.ac.uk/cmb/STOCHSIMGPU. The web site also contains supplementary information. klingbeil@maths.ox.ac.uk Supplementary data are available at Bioinformatics online.
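For readers unfamiliar with the algorithms named above, the following is a minimal serial Python version of the Gillespie direct-method SSA for a toy two-reaction system; the species, rate constants, and time horizon are arbitrary, and STOCHSIMGPU's contribution is running many such realizations in parallel on a GPU rather than this particular loop.

```python
import numpy as np

# Minimal serial Gillespie direct-method SSA for a toy system A -> B, B -> A.
# This is only a CPU illustration of the algorithm the paper parallelizes on GPUs.
rng = np.random.default_rng(1)
k = np.array([1.0, 0.5])                 # rate constants (arbitrary)
state = np.array([100, 0], dtype=float)  # molecule counts of A and B
stoich = np.array([[-1, +1],             # A -> B
                   [+1, -1]])            # B -> A
t, t_end = 0.0, 10.0

while t < t_end:
    propensities = k * state             # a1 = k1*A, a2 = k2*B
    a0 = propensities.sum()
    if a0 == 0.0:
        break
    t += rng.exponential(1.0 / a0)       # waiting time to the next reaction
    r = rng.choice(len(k), p=propensities / a0)
    state += stoich[r]

print("final counts:", state)
```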
Embracing the quantum limit in silicon computing.
Morton, John J L; McCamey, Dane R; Eriksson, Mark A; Lyon, Stephen A
2011-11-16
Quantum computers hold the promise of massive performance enhancements across a range of applications, from cryptography and databases to revolutionary scientific simulation tools. Such computers would make use of the same quantum mechanical phenomena that pose limitations on the continued shrinking of conventional information processing devices. Many of the key requirements for quantum computing differ markedly from those of conventional computers. However, silicon, which plays a central part in conventional information processing, has many properties that make it a superb platform around which to build a quantum computer. © 2011 Macmillan Publishers Limited. All rights reserved
Multilevel processes and cultural adaptation: Examples from past and present small-scale societies.
Reyes-García, V; Balbo, A L; Gomez-Baggethun, E; Gueze, M; Mesoudi, A; Richerson, P; Rubio-Campillo, X; Ruiz-Mallén, I; Shennan, S
2016-12-01
Cultural adaptation has become central in the context of accelerated global change with authors increasingly acknowledging the importance of understanding multilevel processes that operate as adaptation takes place. We explore the importance of multilevel processes in explaining cultural adaptation by describing how processes leading to cultural (mis)adaptation are linked through a complex nested hierarchy, where the lower levels combine into new units with new organizations, functions, and emergent properties or collective behaviours. After a brief review of the concept of "cultural adaptation" from the perspective of cultural evolutionary theory and resilience theory, the core of the paper is constructed around the exploration of multilevel processes occurring at the temporal, spatial, social and political scales. We do so by examining small-scale societies' case studies. In each section, we discuss the importance of the selected scale for understanding cultural adaptation and then present an example that illustrates how multilevel processes in the selected scale help explain observed patterns in the cultural adaptive process. We end the paper discussing the potential of modelling and computer simulation for studying multilevel processes in cultural adaptation.
I/O-Efficient Scientific Computation Using TPIE
NASA Technical Reports Server (NTRS)
Vengroff, Darren Erik; Vitter, Jeffrey Scott
1996-01-01
In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.
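TPIE itself is a C++ framework, but the streaming idea behind I/O-efficient computation can be sketched in a few lines of Python: process a file that may not fit in RAM in fixed-size blocks, so the I/O cost is dominated by sequential scans. The file name, block size, and reduction below are illustrative assumptions, not part of TPIE.

```python
import numpy as np

# Illustration of the streaming idea behind I/O-efficient computation (not TPIE itself):
# reduce a dataset that may not fit in RAM by reading it in fixed-size blocks.
def blockwise_sum(path, block_elems=1_000_000, dtype=np.float64):
    total = 0.0
    with open(path, "rb") as f:
        while True:
            block = np.fromfile(f, dtype=dtype, count=block_elems)
            if block.size == 0:
                break
            total += block.sum()
    return total

# Example: write a small binary file and reduce it block by block.
np.arange(10_000, dtype=np.float64).tofile("demo.bin")
print(blockwise_sum("demo.bin", block_elems=1024))
```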
Parallel computer processing and modeling: applications for the ICU
NASA Astrophysics Data System (ADS)
Baxter, Grant; Pranger, L. Alex; Draghic, Nicole; Sims, Nathaniel M.; Wiesmann, William P.
2003-07-01
Current patient monitoring procedures in hospital intensive care units (ICUs) generate vast quantities of medical data, much of which is considered extraneous and not evaluated. Although sophisticated monitors to analyze individual types of patient data are routinely used in the hospital setting, this equipment lacks high order signal analysis tools for detecting long-term trends and correlations between different signals within a patient data set. Without the ability to continuously analyze disjoint sets of patient data, it is difficult to detect slow-forming complications. As a result, the early onset of conditions such as pneumonia or sepsis may not be apparent until the advanced stages. We report here on the development of a distributed software architecture test bed and software medical models to analyze both asynchronous and continuous patient data in real time. Hardware and software have been developed to support a multi-node distributed computer cluster capable of amassing data from multiple patient monitors and projecting near- and long-term outcomes based upon the application of physiologic models to the incoming patient data stream. One computer acts as a central coordinating node; additional computers accommodate processing needs. A simple, non-clinical model for sepsis detection was implemented on the system for demonstration purposes. This work shows exceptional promise as a highly effective means to rapidly predict and thereby mitigate the effect of nosocomial infections.
Parallel-Processing Software for Creating Mosaic Images
NASA Technical Reports Server (NTRS)
Klimeck, Gerhard; Deen, Robert; McCauley, Michael; DeJong, Eric
2008-01-01
A computer program implements parallel processing for nearly real-time creation of panoramic mosaics of images of terrain acquired by video cameras on an exploratory robotic vehicle (e.g., a Mars rover). Because the original images are typically acquired at various camera positions and orientations, it is necessary to warp the images into the reference frame of the mosaic before stitching them together to create the mosaic. [Also see "Parallel-Processing Software for Correlating Stereo Images," Software Supplement to NASA Tech Briefs, Vol. 31, No. 9 (September 2007) page 26.] The warping algorithm in this computer program reflects the considerations that (1) for every pixel in the desired final mosaic, a good corresponding point must be found in one or more of the original images and (2) for this purpose, one needs a good mathematical model of the cameras and a good correlation of individual pixels with respect to their positions in three dimensions. The desired mosaic is divided into slices, each of which is assigned to one of a number of central processing units (CPUs) operating simultaneously. The results from the CPUs are gathered and placed into the final mosaic. The time taken to create the mosaic depends upon the number of CPUs, the speed of each CPU, and whether a local or a remote data-staging mechanism is used.
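A stripped-down sketch of the slice-per-CPU pattern described above is shown below, assuming Python's multiprocessing pool in place of the program's actual parallel framework; the "warp" here is a placeholder gradient fill rather than a real camera model and image resampling step.

```python
import numpy as np
from multiprocessing import Pool

# Sketch of slice-per-CPU mosaicking (the real warp/camera model is far more involved).
HEIGHT, WIDTH, N_WORKERS = 512, 1024, 4

def render_slice(args):
    row_start, row_stop = args
    # Placeholder "warp": fill the slice with a gradient instead of resampling images.
    rows = np.arange(row_start, row_stop)[:, None]
    cols = np.arange(WIDTH)[None, :]
    return row_start, (rows + cols).astype(np.float32)

if __name__ == "__main__":
    bounds = np.linspace(0, HEIGHT, N_WORKERS + 1, dtype=int)
    jobs = list(zip(bounds[:-1], bounds[1:]))
    mosaic = np.empty((HEIGHT, WIDTH), dtype=np.float32)
    with Pool(N_WORKERS) as pool:
        # Gather each rendered slice and place it into the final mosaic.
        for row_start, block in pool.map(render_slice, jobs):
            mosaic[row_start:row_start + block.shape[0]] = block
    print(mosaic.shape, mosaic[-1, -1])
```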
Making War Work for Industry: The United Alkali Company's Central Laboratory During World War One.
Reed, Peter
2015-02-01
The creation of the Central Laboratory immediately after the United Alkali Company (UAC) was formed in 1890, by amalgamating the Leblanc alkali works in Britain, brought high expectations of repositioning the company by replacing its obsolete Leblanc process plant and expanding its range of chemical products. By 1914, UAC had struggled with few exceptions to adopt new technologies and processes and was still reliant on the Leblanc process. From 1914, the Government would rely heavily on its contribution to the war effort. As a major heavy-chemical manufacturer, UAC produced chemicals for explosives and warfare gases, while also trying to maintain production of many essential chemicals including fertilisers for homeland consumption. UAC's wartime effort was led by the Central Laboratory, working closely with the recently established Engineer's Department to develop new process pathways, build new plant, adapt existing plant, and produce the contracted quantities, all as quickly as possible to meet the changing battlefield demands. This article explores how wartime conditions and demands provided the stimulus for the Central Laboratory's crucial R&D work during World War One.
Hand-held computer operating system program for collection of resident experience data.
Malan, T K; Haffner, W H; Armstrong, A Y; Satin, A J
2000-11-01
To describe a system for recording resident experience involving hand-held computers with the Palm Operating System (3Com, Inc., Santa Clara, CA). Hand-held personal computers (PCs) are popular, easy to use, inexpensive, portable, and can share data among other operating systems. Residents in our program carry individual hand-held database computers to record Residency Review Committee (RRC) reportable patient encounters. Each resident's data is transferred to a single central relational database compatible with Microsoft Access (Microsoft Corporation, Redmond, WA). Patient data entry and subsequent transfer to a central database is accomplished with commercially available software that requires minimal computer expertise to implement and maintain. The central database can then be used for statistical analysis or to create required RRC resident experience reports. As a result, the data collection and transfer process takes less time for residents and program director alike than it does with paper-based or central computer-based systems. The system of collecting resident encounter data using hand-held computers with the Palm Operating System is easy to use, relatively inexpensive, accurate, and secure. The user-friendly system provides prompt, complete, and accurate data, enhancing the education of residents while facilitating the job of the program director.
Mobile GPU-based implementation of automatic analysis method for long-term ECG.
Fan, Xiaomao; Yao, Qihang; Li, Ye; Chen, Runge; Cai, Yunpeng
2018-05-03
Long-term electrocardiogram (ECG) is one of the important diagnostic assistant approaches in capturing intermittent cardiac arrhythmias. Combination of miniaturized wearable Holters and healthcare platforms enables people to have their cardiac condition monitored at home. The high computational burden created by concurrent processing of numerous Holter recordings poses a serious challenge to the healthcare platform. An alternative solution is to shift the analysis tasks from healthcare platforms to the mobile computing devices. However, long-term ECG data processing is quite time consuming due to the limited computation power of the mobile central processing unit (CPU). This paper aimed to propose a novel parallel automatic ECG analysis algorithm which exploited the mobile graphics processing unit (GPU) to reduce the response time for processing long-term ECG data. By studying the architecture of the sequential automatic ECG analysis algorithm, we parallelized the time-consuming parts and reorganized the entire pipeline in the parallel algorithm to fully utilize the heterogeneous computing resources of CPU and GPU. The experimental results showed that the average executing time of the proposed algorithm on a clinical long-term ECG dataset (duration 23.0 ± 1.0 h per signal) is 1.215 ± 0.140 s, which achieved an average speedup of 5.81 ± 0.39× without compromising analysis accuracy, compared with the sequential algorithm. Meanwhile, the battery energy consumption of the automatic ECG analysis algorithm was reduced by 64.16%. Excluding energy consumption from data loading, 79.44% of the energy consumption could be saved, which alleviated the problem of limited battery working hours for mobile devices. The reduction of response time and battery energy consumption in ECG analysis not only brings a better quality of experience to Holter users, but also makes it possible to use mobile devices as ECG terminals for healthcare professionals such as physicians and health advisers, enabling them to inspect patient ECG recordings onsite efficiently without the need of a high-quality wide-area network environment.
LASSIE: simulating large-scale models of biochemical systems on GPUs.
Tangherloni, Andrea; Nobile, Marco S; Besozzi, Daniela; Mauri, Giancarlo; Cazzaniga, Paolo
2017-05-10
Mathematical modeling and in silico analysis are widely acknowledged as complementary tools to biological laboratory methods, to achieve a thorough understanding of emergent behaviors of cellular processes in both physiological and perturbed conditions. However, the simulation of large-scale models, consisting of hundreds or thousands of reactions and molecular species, can rapidly overtake the capabilities of Central Processing Units (CPUs). The purpose of this work is to exploit alternative high-performance computing solutions, such as Graphics Processing Units (GPUs), to allow the investigation of these models at reduced computational costs. LASSIE is a "black-box" GPU-accelerated deterministic simulator, specifically designed for large-scale models and not requiring any expertise in mathematical modeling, simulation algorithms or GPU programming. Given a reaction-based model of a cellular process, LASSIE automatically generates the corresponding system of Ordinary Differential Equations (ODEs), assuming mass-action kinetics. The numerical solution of the ODEs is obtained by automatically switching between the Runge-Kutta-Fehlberg method in the absence of stiffness and the Backward Differentiation Formulae of first order in the presence of stiffness. The computational performance of LASSIE is assessed using a set of randomly generated synthetic reaction-based models of increasing size, ranging from 64 to 8192 reactions and species, and compared to a CPU implementation of the LSODA numerical integration algorithm. LASSIE adopts a novel fine-grained parallelization strategy to distribute on the GPU cores all the calculations required to solve the system of ODEs. By virtue of this implementation, LASSIE achieves up to 92× speed-up with respect to LSODA, therefore reducing the running time from approximately 1 month down to 8 h to simulate models consisting of, for instance, four thousand reactions and species. Notably, thanks to its smaller memory footprint, LASSIE is able to perform fast simulations of even larger models, whereby the tested CPU implementation of LSODA failed to reach termination. LASSIE is therefore expected to make an important breakthrough in Systems Biology applications, for the execution of faster and in-depth computational analyses of large-scale models of complex biological systems.
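As a CPU-side illustration of what "automatically generating mass-action ODEs from a reaction-based model" means, the sketch below builds the right-hand side for a two-reaction toy system and integrates it with SciPy's LSODA, which, like LASSIE's RKF/BDF switching, moves between non-stiff and stiff methods automatically; the reactions and rate constants are invented for the example and nothing here is LASSIE code.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Tiny CPU sketch of deriving mass-action ODEs from a reaction list (not LASSIE itself).
# Reactions: A + B -> C (k = 2.0), C -> A + B (k = 0.5); values are arbitrary.
species = ["A", "B", "C"]
reactions = [
    ({"A": 1, "B": 1}, {"C": 1}, 2.0),   # (reactants, products, rate constant)
    ({"C": 1}, {"A": 1, "B": 1}, 0.5),
]
idx = {s: i for i, s in enumerate(species)}

def rhs(t, y):
    dydt = np.zeros_like(y)
    for reactants, products, k in reactions:
        # Mass-action rate: k times the product of reactant concentrations.
        rate = k * np.prod([y[idx[s]] ** n for s, n in reactants.items()])
        for s, n in reactants.items():
            dydt[idx[s]] -= n * rate
        for s, n in products.items():
            dydt[idx[s]] += n * rate
    return dydt

y0 = np.array([1.0, 0.8, 0.0])
sol = solve_ivp(rhs, (0.0, 20.0), y0, method="LSODA")  # switches stiff/non-stiff automatically
print(sol.y[:, -1])
```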
NASA Technical Reports Server (NTRS)
Boyle, W. G.; Barton, G. W.
1979-01-01
The feasibility of computerized automation of the Analytical Laboratories Section at NASA's Lewis Research Center was considered. Since that laboratory's duties are not routine, the automation goals were set with that in mind. Four instruments were selected as the most likely automation candidates: an atomic absorption spectrophotometer, an emission spectrometer, an X-ray fluorescence spectrometer, and an X-ray diffraction unit. Two options for computer automation were described: a time-shared central computer and a system with microcomputers for each instrument connected to a central computer. A third option, presented for future planning, expands the microcomputer version. Costs and benefits for each option were considered. It was concluded that the microcomputer version best fits the goals and duties of the laboratory and that such an automated system is needed to meet the laboratory's future requirements.
Development of a PC-based ground support system for a small satellite instrument
NASA Astrophysics Data System (ADS)
Deschambault, Robert L.; Gregory, Philip R.; Spenler, Stephen; Whalen, Brian A.
1993-11-01
The importance of effective ground support for the remote control and data retrieval of a satellite instrument cannot be overstated. Problems with ground support may include the need to base personnel at a ground tracking station for extended periods, and the delay between the instrument observation and the processing of the data by the science team. Flexible solutions to such problems in the case of small satellite systems are provided by using low-cost, powerful personal computers and off-the-shelf software for data acquisition and processing, and by using the Internet as a communication pathway to enable scientists to view and manipulate satellite data in real time at any ground location. The personal computer based ground support system is illustrated for the case of the cold plasma analyzer flown on the Freja satellite. Commercial software was used as building blocks for writing the ground support equipment software. Several levels of hardware support, including unit tests and development, functional tests, and integration, were provided by portable and desktop personal computers. Satellite stations in Saskatchewan and Sweden were linked to the science team via phone lines and the Internet, which provided remote control through a central point. These successful strategies will be used on future small satellite space programs.
Photogrammetry for rapid prototyping: development of noncontact 3D reconstruction technologies
NASA Astrophysics Data System (ADS)
Knyaz, Vladimir A.
2002-04-01
An important stage of rapid prototyping technology is generating a computer 3D model of the object to be reproduced. A wide variety of techniques for 3D model generation exists, ranging from manual 3D model generation to fully automated reverse engineering systems. The progress in CCD sensors and computers provides the background for integration of photogrammetry, as an accurate 3D data source, with CAD/CAM. The paper presents the results of developing photogrammetric methods for non-contact spatial coordinate measurements and generation of computer 3D models of real objects. The technology is based on processing convergent images of the object to calculate its 3D coordinates and reconstruct its surface. The hardware used for spatial coordinate measurements is based on a PC as the central processing unit and a video camera as the image acquisition device. The original software for Windows 9X realizes the complete technology of 3D reconstruction for rapid input of geometry data into CAD/CAM systems. Technical characteristics of the developed systems are given along with the results of applying them to various 3D reconstruction tasks. The paper describes the techniques used for non-contact measurements and the methods providing metric characteristics of the reconstructed 3D model. Also, the results of applying the system to 3D reconstruction of complex industrial objects are presented.
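The core geometric step of computing 3D coordinates from convergent images can be illustrated with a minimal linear (DLT) triangulation of a single point from two camera projection matrices, as sketched below; the cameras and image points are synthetic stand-ins, and a real photogrammetric pipeline adds calibration, bundle adjustment, and surface reconstruction.

```python
import numpy as np

# Minimal linear (DLT) triangulation of a single 3D point from two convergent views.
# Camera matrices and image points below are synthetic stand-ins.
def triangulate(P1, P2, x1, x2):
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                  # de-homogenize

# Synthetic setup: identity camera and a second camera offset along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))       # should recover approximately (0.2, -0.1, 5.0)
```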
NASA Technical Reports Server (NTRS)
Bryant, W. H.; Morrell, F. R.
1981-01-01
Attention is given to a redundant strapdown inertial measurement unit for integrated avionics. The system consists of four two-degree-of-freedom tuned-rotor gyros and four two-degree-of-freedom accelerometers in a skewed and separable semi-octahedral array. The unit is coupled through instrument electronics to two flight computers which compensate sensor errors. The flight computers are interfaced to the microprocessors and process failure detection, isolation, redundancy management and flight control/navigation algorithms. The unit provides dual fail-operational performance and has data processing frequencies consistent with integrated avionics concepts presently planned.
Measurement of fault latency in a digital avionic mini processor, part 2
NASA Technical Reports Server (NTRS)
Mcgough, J.; Swern, F.
1983-01-01
The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are described. Several earlier programs were reprogrammed, expanding the instruction set to capitalize on the full power of the BDX-930 computer. As a final demonstration of fault coverage, an extensive, 3-axis, high-performance flight control computation was added. The stages in the development of a CPU self-test program, emphasizing the relationship between fault coverage, speed, and quantity of instructions, were demonstrated.
Connecting the virtual world of computers to the real world of medicinal chemistry.
Glen, Robert C
2011-03-01
Drug discovery involves the simultaneous optimization of chemical and biological properties, usually in a single small molecule, which modulates one of nature's most complex systems: the balance between human health and disease. The increased use of computer-aided methods is having a significant impact on all aspects of the drug-discovery and development process and with improved methods and ever faster computers, computer-aided molecular design will be ever more central to the discovery process.
NASA Technical Reports Server (NTRS)
Ko, William L.; Olona, Timothy; Muramoto, Kyle M.
1990-01-01
Different finite element models previously set up for thermal analysis of the space shuttle orbiter structure are discussed and their shortcomings identified. Element density criteria are established for finite element thermal modeling of space shuttle orbiter-type large, hypersonic aircraft structures. These criteria are based on rigorous studies of solution accuracy using different finite element models, with different element densities, set up for one cell of the orbiter wing. Also, a method for optimization of the transient thermal analysis computer central processing unit (CPU) time is discussed. Based on the newly established element density criteria, the orbiter wing midspan segment was modeled for the examination of thermal analysis solution accuracy and the extent of computation CPU time requirements. The results showed that the distributions of the structural temperatures and the thermal stresses obtained from this wing segment model were satisfactory and the computation CPU time was at an acceptable level. The studies offered the hope that modeling large, hypersonic aircraft structures using high-density elements for transient thermal analysis is possible if a CPU optimization technique is used.
A Method for Growing Bio-memristors from Slime Mold.
Miranda, Eduardo Reck; Braund, Edward
2017-11-02
Our research is aimed at gaining a better understanding of the electronic properties of organisms in order to engineer novel bioelectronic systems and computing architectures based on biology. This specific paper focuses on harnessing the unicellular slime mold Physarum polycephalum to develop bio-memristors (or biological memristors) and bio-computing devices. The memristor is a resistor that possesses memory. It is the 4th fundamental passive circuit element (the other three are the resistor, the capacitor, and the inductor), which is paving the way for the design of new kinds of computing systems; e.g., computers that might relinquish the distinction between storage and a central processing unit. When applied with an AC voltage, the current vs. voltage characteristic of a memristor is a pinched hysteresis loop. It has been shown that P. polycephalum produces pinched hysteresis loops under AC voltages and displays adaptive behavior that is comparable with the functioning of a memristor. This paper presents the method that we developed for implementing bio-memristors with P. polycephalum and introduces the development of a receptacle to culture the organism, which facilitates its deployment as an electronic circuit component. Our method has proven to decrease growth time, increase component lifespan, and standardize electrical observations.
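The biological device itself is grown in the laboratory, but the pinched hysteresis loop mentioned above can be reproduced numerically with a generic linear-dopant-drift memristor model under a sinusoidal drive, as in the sketch below; all parameter values are arbitrary illustrations and the model is not a description of P. polycephalum.

```python
import numpy as np

# Generic linear-drift memristor model under a sinusoidal drive (not the biological device).
# Integrating the internal state produces the characteristic pinched I-V hysteresis loop.
R_on, R_off = 100.0, 16_000.0     # limiting resistances (arbitrary)
mu, D = 1e-14, 1e-8               # dopant mobility and device thickness (arbitrary)
w = 0.1 * D                       # initial doped-region width
dt, f, V0 = 1e-4, 1.0, 1.0        # time step, drive frequency, drive amplitude

t = np.arange(0.0, 2.0, dt)       # two periods of the drive
v = V0 * np.sin(2 * np.pi * f * t)
i = np.zeros_like(t)
for n, vn in enumerate(v):
    R = R_on * (w / D) + R_off * (1.0 - w / D)
    i[n] = vn / R
    w += dt * mu * R_on / D * i[n]          # linear dopant drift
    w = min(max(w, 0.0), D)                 # clamp the state to the device

# At v = 0 the current is also 0, so the I-V curve is "pinched" at the origin.
print(i[np.isclose(v, 0.0, atol=1e-3)][:3])
```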
NASA Technical Reports Server (NTRS)
Koeksal, Adnan; Trew, Robert J.; Kauffman, J. Frank
1992-01-01
A Moment Method Model for the radiation pattern characterization of single Linearly Tapered Slot Antennas (LTSA) in air or on a dielectric substrate is developed. This characterization consists of: (1) finding the radiated far-fields of the antenna; (2) determining the E-Plane and H-Plane beamwidths and sidelobe levels; and (3) determining the D-Plane beamwidth and cross polarization levels, as antenna parameters length, height, taper angle, substrate thickness, and the relative substrate permittivity vary. The LTSA geometry does not lend itself to analytical solution with the given parameter ranges. Therefore, a computer modeling scheme and a code are necessary to analyze the problem. This necessity imposes some further objectives or requirements on the solution method (modeling) and tool (computer code). These may be listed as follows: (1) a good approximation to the real antenna geometry; and (2) feasible computer storage and time requirements. According to these requirements, the work is concentrated on the development of efficient modeling schemes for these type of problems and on reducing the central processing unit (CPU) time required from the computer code. A Method of Moments (MoM) code is developed for the analysis of LTSA's within the parameter ranges given.
An efficient sparse matrix multiplication scheme for the CYBER 205 computer
NASA Technical Reports Server (NTRS)
Lambiotte, Jules J., Jr.
1988-01-01
This paper describes the development of an efficient algorithm for computing the product of a matrix and vector on a CYBER 205 vector computer. The desire to provide software which allows the user to choose between the often conflicting goals of minimizing central processing unit (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of four types of storage is selected for each diagonal. The candidate storage types employed were chosen to be efficient on the CYBER 205 for diagonals which have nonzero structure which is dense, moderately sparse, very sparse and short, or very sparse and long; however, for many densities, no diagonal type is most efficient with respect to both resource requirements, and a trade-off must be made. For each diagonal, an initialization subroutine estimates the CPU time and storage required for each storage type based on results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the two resources. The adjusted resource requirements are then compared to select the most efficient storage and computational scheme.
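In the same spirit as the diagonal-based scheme described above, the sketch below performs a matrix-vector product directly from diagonals stored as dense vectors; the matrix, offsets, and values are invented for illustration, and the actual CYBER 205 code additionally chooses among four storage types per diagonal based on estimated CPU time and storage.

```python
import numpy as np

# Illustrative diagonal (DIA) storage matrix-vector product, in the spirit of the
# diagonal-based scheme described above (the real code picks a storage type per diagonal).
n = 6
offsets = [0, 1, -2]                       # which diagonals are nonzero
diags = {
    0: np.full(n, 4.0),                    # main diagonal
    1: np.full(n - 1, -1.0),               # first superdiagonal
    -2: np.full(n - 2, 2.0),               # second subdiagonal
}
x = np.arange(1.0, n + 1.0)

y = np.zeros(n)
for k in offsets:
    d = diags[k]
    if k >= 0:
        y[:n - k] += d * x[k:]             # superdiagonal: A[i, i+k] * x[i+k]
    else:
        y[-k:] += d * x[:n + k]            # subdiagonal: A[i, i+k] * x[i+k]

# Check against a dense construction of the same matrix.
A = np.diag(diags[0]) + np.diag(diags[1], 1) + np.diag(diags[-2], -2)
print(np.allclose(y, A @ x))
```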
41 CFR 105-56.014 - Purpose and scope.
Code of Federal Regulations, 2010 CFR
2010-07-01
... centralized salary offset computer matching, identify Federal employees who owe delinquent non-tax debt to the... satisfy delinquent non-tax debt owed to the United States. GSA will pursue, when appropriate, such debt... (FMS) administrative offset program, to collect delinquent debts owed to the Federal Government. This...
ERIC Educational Resources Information Center
Schure, Alexander
A computer-based system model for the monitoring and management of the instructional process was conceived, developed and refined through the techniques of systems analysis. This report describes the various aspects and components of this project in a series of independent and self-contained units. The first unit provides an overview of the entire…
Fiore, Vincenzo G; Kottler, Benjamin; Gu, Xiaosi; Hirth, Frank
2017-01-01
The central complex in the insect brain is a composite of midline neuropils involved in processing sensory cues and mediating behavioral outputs to orchestrate spatial navigation. Despite recent advances, however, the neural mechanisms underlying sensory integration and motor action selections have remained largely elusive. In particular, it is not yet understood how the central complex exploits sensory inputs to realize motor functions associated with spatial navigation. Here we report an in silico interrogation of central complex-mediated spatial navigation with a special emphasis on the ellipsoid body. Based on known connectivity and function, we developed a computational model to test how the local connectome of the central complex can mediate sensorimotor integration to guide different forms of behavioral outputs. Our simulations show integration of multiple sensory sources can be effectively performed in the ellipsoid body. This processed information is used to trigger continuous sequences of action selections resulting in self-motion, obstacle avoidance and the navigation of simulated environments of varying complexity. The motor responses to perceived sensory stimuli can be stored in the neural structure of the central complex to simulate navigation relying on a collective of guidance cues, akin to sensory-driven innate or habitual behaviors. By comparing behaviors under different conditions of accessible sources of input information, we show the simulated insect computes visual inputs and body posture to estimate its position in space. Finally, we tested whether the local connectome of the central complex might also allow the flexibility required to recall an intentional behavioral sequence, among different courses of actions. Our simulations suggest that the central complex can encode combined representations of motor and spatial information to pursue a goal and thus successfully guide orientation behavior. Together, the observed computational features identify central complex circuitry, and especially the ellipsoid body, as a key neural correlate involved in spatial navigation.
Mapping proteins in the presence of paralogs using units of coevolution
2013-01-01
Background We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible. PMID:24564758
Installation of new Generation General Purpose Computer (GPC) compact unit
NASA Technical Reports Server (NTRS)
1991-01-01
In the Kennedy Space Center's (KSC's) Orbiter Processing Facility (OPF) high bay 2, Spacecraft Electronics technician Ed Carter (right), wearing clean suit, prepares for (26864) and installs (26865) the new Generation General Purpose Computer (GPC) compact IBM unit in Atlantis', Orbiter Vehicle (OV) 104's, middeck avionics bay as Orbiter Systems Quality Control technician Doug Snider looks on. Both men work for NASA contractor Lockheed Space Operations Company. All three orbiters are being outfitted with the compact IBM unit, which replaces a two-unit earlier generation computer.
Body area network--a key infrastructure element for patient-centered telemedicine.
Norgall, Thomas; Schmidt, Robert; von der Grün, Thomas
2004-01-01
The Body Area Network (BAN) extends the range of existing wireless network technologies by an ultra-low range, ultra-low power network solution optimised for long-term or continuous healthcare applications. It enables wireless radio communication between several miniaturised, intelligent Body Sensor (or actor) Units (BSU) and a single Body Central Unit (BCU) worn at the human body. A separate wireless transmission link from the BCU to a network access point--using different technology--provides for online access to BAN components via usual network infrastructure. The BAN network protocol maintains dynamic ad-hoc network configuration scenarios and co-existence of multiple networks.BAN is expected to become a basic infrastructure element for electronic health services: By integrating patient-attached sensors and mobile actor units, distributed information and data processing systems, the range of medical workflow can be extended to include applications like wireless multi-parameter patient monitoring and therapy support. Beyond clinical use and professional disease management environments, private personal health assistance scenarios (without financial reimbursement by health agencies / insurance companies) enable a wide range of applications and services in future pervasive computing and networking environments.
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.
Design of a modular digital computer system
NASA Technical Reports Server (NTRS)
1980-01-01
A Central Control Element (CCE) module which controls the Automatically Reconfigurable Modular System (ARMS) and allows both redundant processing and multi-computing in the same computer with real time mode switching, is discussed. The same hardware is used for either reliability enhancement, speed enhancement, or for a combination of both.
ERIC Educational Resources Information Center
Quann, Steve; Satin, Diana
This textbook leads high-beginning and intermediate English-as-a-Second-Language (ESL) students through cooperative computer-based activities that combine language learning with training in basic computer skills and word processing. Each unit concentrates on a basic concept of word processing while also focusing on a grammar topic. Skills are…
Wenchi Jin; Hong S. He; Frank R. Thompson; Wen J. Wang; Jacob S. Fraser; Stephen R. Shifley; Brice B. Hanberry; William D. Dijak
2017-01-01
The Central Hardwood Forest (CHF) in the United States is currently a major carbon sink, but there are uncertainties in how long the current carbon sink will persist and whether the CHF will eventually become a carbon source. We used a multi-model ensemble to investigate aboveground carbon density of the CHF from 2010 to 2300 under current climate. Simulations were done using...
Centralized Authorization Using a Direct Service, Part II
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wachsmann, A
Authorization is the process of deciding if entity X is allowed to have access to resource Y. Determining the identity of X is the job of the authentication process. One task of authorization in computer networks is to define and determine which user has access to which computers in the network. On Linux, the tendency exists to create a local account for each single user who should be allowed to logon to a computer. This is typically the case because a user not only needs login privileges to a computer but also additional resources like a home directory to actually do some work. Creating a local account on every computer takes care of all this. The problem with this approach is that these local accounts can be inconsistent with each other. The same user name could have a different user ID and/or group ID on different computers. Even more problematic is when two different accounts share the same user ID and group ID on different computers: User joe on computer1 could have user ID 1234 and group ID 56 and user jane on computer2 could have the same user ID 1234 and group ID 56. This is a big security risk in case shared resources like NFS are used. These two different accounts are the same for an NFS server so that these users can wipe out each other's files. The solution to this inconsistency problem is to have only one central, authoritative data source for this kind of information and a means of providing all your computers with access to this central source. This is what a "Directory Service" is. The two directory services most widely used for centralizing authorization data are the Network Information Service (NIS, formerly known as Yellow Pages or YP) and Lightweight Directory Access Protocol (LDAP).
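A minimal sketch of the inconsistency problem described above, assuming hypothetical per-host account tables; in practice the data would come from each machine's /etc/passwd or `getent passwd`, and a directory service such as NIS or LDAP removes the need for this check entirely:

```python
from collections import defaultdict

# Hypothetical per-host account tables: user -> (uid, gid).
hosts = {
    "computer1": {"joe": (1234, 56), "alice": (1001, 100)},
    "computer2": {"jane": (1234, 56), "alice": (1002, 100)},
}

def find_conflicts(hosts):
    by_user = defaultdict(set)   # user -> set of (uid, gid) seen across hosts
    by_ids = defaultdict(set)    # (uid, gid) -> set of user names
    for table in hosts.values():
        for user, ids in table.items():
            by_user[user].add(ids)
            by_ids[ids].add(user)
    drift = {u: ids for u, ids in by_user.items() if len(ids) > 1}
    clashes = {ids: users for ids, users in by_ids.items() if len(users) > 1}
    return drift, clashes

drift, clashes = find_conflicts(hosts)
print("same user, different uid/gid:", drift)     # e.g. alice: 1001 vs 1002
print("same uid/gid, different users:", clashes)  # e.g. (1234, 56): joe, jane
```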
GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing
Fang, Ye; Ding, Yun; Feinstein, Wei P.; Koppelman, David M.; Moreno, Juana; Jarrell, Mark; Ramanujam, J.; Brylinski, Michal
2016-01-01
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249. PMID:27420300
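Since GeauxDock is built on a Monte Carlo search, its core loop can be illustrated with a generic Metropolis sketch. The `score` and `perturb` callables below are placeholders, not GeauxDock's combined physics/knowledge-based energy function or its actual move set:

```python
import math
import random

def metropolis_docking(score, initial_pose, perturb, n_steps=10000, temperature=1.0):
    """Minimal Metropolis Monte Carlo search over ligand poses (illustrative only)."""
    pose, energy = initial_pose, score(initial_pose)
    best_pose, best_energy = pose, energy
    for _ in range(n_steps):
        candidate = perturb(pose)
        e_new = score(candidate)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new < energy or random.random() < math.exp(-(e_new - energy) / temperature):
            pose, energy = candidate, e_new
            if energy < best_energy:
                best_pose, best_energy = pose, energy
    return best_pose, best_energy

# Toy usage: a 3-D translation "pose" scored by squared distance to an arbitrary optimum.
target = (1.0, -2.0, 0.5)
score = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
perturb = lambda p: tuple(a + random.gauss(0.0, 0.1) for a in p)
print(metropolis_docking(score, (0.0, 0.0, 0.0), perturb))
```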
Eslami, Taban; Saeed, Fahad
2018-04-20
Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique, which has been regularly used for studying the brain's functional activities in the past few years. A widely used measure for capturing functional associations in the brain is Pearson's correlation coefficient. Pearson's correlation is widely used for constructing functional networks and studying dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivities among brain regions. The fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects is being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetric property of Pearson's correlation, this approach returns the N(N − 1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing the correlations in a one-dimensional array in the order proposed in this paper is useful for further usage. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
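A plain CPU reference for what Fast-GPU-PCC computes can be sketched in a few lines of NumPy: standardize each voxel time series, form the correlation matrix as a matrix product, and pack the strictly upper triangle into a one-dimensional array. The row-major packing order shown here is an assumption for illustration:

```python
import numpy as np

def pairwise_pcc_upper(data):
    """Pairwise Pearson correlations for N voxel time series (N x T array),
    returned as the strictly upper triangle packed into a 1-D array of
    length N*(N-1)/2, row-major: (0,1), (0,2), ..., (0,N-1), (1,2), ...
    """
    # Standardize each time series, then correlation = dot product / T.
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    corr = z @ z.T / data.shape[1]
    iu = np.triu_indices(data.shape[0], k=1)
    return corr[iu]

# Toy usage: 5 "voxels", 100 time points -> 5*4/2 = 10 coefficients.
rng = np.random.default_rng(0)
print(pairwise_pcc_upper(rng.standard_normal((5, 100))).shape)
```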
NASA Astrophysics Data System (ADS)
Limmer, Steffen; Fey, Dietmar
2013-07-01
Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turned out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected. Therefore we compare different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up and the runtime behavior depending on the input data is examined.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
NASA Technical Reports Server (NTRS)
Chien, Steve; Rabideau, Gregg; Tran, Daniel; Knight, Russell; Chouinard, Caroline; Estlin, Tara; Gaines, Daniel; Clement, Bradley; Barrett, Anthony
2007-01-01
CASPER is designed to perform automated planning of interdependent activities within a system subject to requirements, constraints, and limitations on resources. In contradistinction to the traditional concept of batch planning followed by execution, CASPER implements a concept of continuous planning and replanning in response to unanticipated changes (including failures), integrated with execution. Improvements over other, similar software that have been incorporated into CASPER version 2.0 include an enhanced executable interface to facilitate integration with a wide range of execution software systems and supporting software libraries; features to support execution while reasoning about urgency, importance, and impending deadlines; features that enable accommodation to a wide range of computing environments that include various central processing units and random- access-memory capacities; and improved generic time-server and time-control features.
Space shuttle main engine controller
NASA Technical Reports Server (NTRS)
Mattox, R. M.; White, J. B.
1981-01-01
A technical description of the space shuttle main engine controller, which provides engine checkout prior to launch, engine control and monitoring during launch, and engine safety and monitoring in orbit, is presented. Each of the major controller subassemblies, the central processing unit, the computer interface electronics, the input electronics, the output electronics, and the power supplies are described and discussed in detail along with engine and orbiter interfaces and operational requirements. The controller represents a unique application of digital concepts, techniques, and technology in monitoring, managing, and controlling a high performance rocket engine propulsion system. The operational requirements placed on the controller, the extremely harsh operating environment to which it is exposed, and the reliability demanded, result in the most complex and rugged digital system ever designed, fabricated, and flown.
High Performance Computing Assets for Ocean Acoustics Research
2016-11-18
independently on processing units with access to a typically available amount of memory, say 16 or 32 gigabytes. Our models require each processor to...allow results to be obtained with limited amounts of memory available to individual processing units (with no time frame for successful completion...put into use. One file server computer to store simulation output has also been purchased. The first workstation has 28 CPU cores, dual-thread, (56
[Automated processing of data from the 1985 population and housing census].
Cholakov, S
1987-01-01
The author describes the method of automated data processing used in the 1985 census of Bulgaria. He notes that the computerization of the census involves decentralization and the use of regional computing centers as well as data processing at the Central Statistical Office's National Information Computer Center. Special attention is given to problems concerning the projection and programming of census data. (SUMMARY IN ENG AND RUS)
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA
NASA Astrophysics Data System (ADS)
Ren, Qinlong
The lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on kinetic theory, the LBM has several advantages compared with traditional numerical methods, such as the physical representation of microscopic interactions, the handling of complex geometries, and its highly parallel nature. The lattice Boltzmann method has been applied to various fluid flow and heat transfer processes such as conjugate heat transfer, magnetic and electric fields, diffusion and mixing processes, chemical reactions, multiphase flow, phase change processes, non-isothermal flow in porous media, microfluidics, fluid-structure interactions in biological systems, and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems: heat transfer and fluid flow are solved on Eulerian nodes by the LBM, while complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been used for many decades to accelerate computations in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores that can be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high, which limits high-performance computing on personal computers. Graphics processing units (GPUs), originally used in computer video cards, have emerged as powerful high-performance computing devices in recent years. Unlike CPUs, GPUs with thousands of cores are inexpensive. For example, the GPU (GeForce GTX TITAN) used in the current work has 2688 cores and costs only 1,000 US dollars. The release of NVIDIA's CUDA architecture, which includes both hardware and a programming environment, in 2007 made GPU computing attractive. Due to its highly parallel nature, the lattice Boltzmann method has been successfully ported to GPUs with a performance benefit in recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, the lattice Boltzmann method and immersed boundary method are used to study natural convection in an enclosure with an array of conducting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, the PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems on a CUDA computational platform. It is demonstrated that the LBM is an efficient method to simulate complex heat transfer problems using GPUs on CUDA.
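To make the collision-and-streaming structure of the method concrete, here is a minimal single-relaxation-time (BGK) D2Q9 lattice Boltzmann step in NumPy. It is a CPU sketch only: no CUDA, no immersed boundary coupling, and a fully periodic domain; the array names and parameters are illustrative.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights.
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    """BGK equilibrium distributions, shape (9, ny, nx)."""
    cu = np.einsum('qd,dyx->qyx', c, u)            # c_i . u
    usq = np.einsum('dyx,dyx->yx', u, u)           # |u|^2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    """One collision + streaming step on a fully periodic domain."""
    rho = f.sum(axis=0)
    u = np.einsum('qd,qyx->dyx', c, f) / rho
    f += -(f - equilibrium(rho, u)) / tau          # BGK collision
    for q in range(9):                             # streaming via periodic shifts
        f[q] = np.roll(f[q], shift=(c[q, 1], c[q, 0]), axis=(0, 1))
    return f

# Toy usage: 64x64 lattice at rest with a small density bump.
ny, nx = 64, 64
rho0 = np.ones((ny, nx)); rho0[32, 32] = 1.1
f = equilibrium(rho0, np.zeros((2, ny, nx)))
for _ in range(100):
    f = lbm_step(f)
print(f.sum())  # total mass is conserved
```

On a GPU the same update would be expressed as one CUDA kernel per lattice sweep, with each thread owning one lattice node; the per-node independence of collision and streaming is what makes the method map so well to thousands of GPU cores.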
Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-Based Approach.
Hou, Ping; Cai, Jiarui; Qu, Shen; Xu, Ming
2018-05-01
In life cycle assessment (LCA), collecting unit process data from the empirical sources (i.e., meter readings, operation logs/journals) is often costly and time-consuming. We propose a new computational approach to estimate missing unit process data solely relying on limited known data based on a similarity-based link prediction method. The intuition is that similar processes in a unit process network tend to have similar material/energy inputs and waste/emission outputs. We use the ecoinvent 3.1 unit process data sets to test our method in four steps: (1) dividing the data sets into a training set and a test set; (2) randomly removing certain numbers of data in the test set indicated as missing; (3) using similarity-weighted means of various numbers of most similar processes in the training set to estimate the missing data in the test set; and (4) comparing estimated data with the original values to determine the performance of the estimation. The results show that missing data can be accurately estimated when less than 5% data are missing in one process. The estimation performance decreases as the percentage of missing data increases. This study provides a new approach to compile unit process data and demonstrates a promising potential of using computational approaches for LCA data compilation.
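A compact sketch of the similarity-weighted estimation idea, assuming unit processes are represented as flow vectors and using cosine similarity over the known entries; the variable names and toy data are illustrative, not the ecoinvent format:

```python
import numpy as np

def estimate_missing(train, target, missing_idx, k=5):
    """Estimate missing entries of one unit-process vector.

    train:       (n_processes, n_flows) matrix of known unit processes
    target:      (n_flows,) vector with some entries unknown
    missing_idx: indices of the unknown entries in `target`
    The known entries of `target` are used to find the k most similar
    training processes (cosine similarity); missing entries are filled
    with the similarity-weighted mean of those neighbours.
    """
    known = np.setdiff1d(np.arange(target.size), missing_idx)
    a, b = target[known], train[:, known]
    sims = (b @ a) / (np.linalg.norm(b, axis=1) * np.linalg.norm(a) + 1e-12)
    nearest = np.argsort(sims)[-k:]
    weights = np.clip(sims[nearest], 0.0, None)
    weights = weights / (weights.sum() + 1e-12)
    return train[nearest][:, missing_idx].T @ weights

# Toy usage: 4 flows, last entry of the target process is missing.
train = np.array([[1.0, 0.5, 0.2, 3.0],
                  [1.1, 0.6, 0.2, 3.2],
                  [9.0, 0.1, 4.0, 0.5]])
target = np.array([1.05, 0.55, 0.2, np.nan])
print(estimate_missing(train, target, missing_idx=np.array([3]), k=2))
```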
Monitoring benthic algal communities: A comparison of targeted and coefficient sampling methods
Edwards, Matthew S.; Tinker, M. Tim
2009-01-01
Choosing an appropriate sample unit is a fundamental decision in the design of ecological studies. While numerous methods have been developed to estimate organism abundance, they differ in cost, accuracy and precision. Using both field data and computer simulation modeling, we evaluated the costs and benefits associated with two methods commonly used to sample benthic organisms in temperate kelp forests. One of these methods, the Targeted Sampling method, relies on different sample units, each "targeted" for a specific species or group of species, while the other method relies on coefficients that represent ranges of bottom cover obtained from visual estimates within standardized sample units. Both the field data and the computer simulations suggest that the two methods yield remarkably similar estimates of organism abundance and among-site variability, although the Coefficient method slightly underestimates variability among sample units when abundances are low. In contrast, the two methods differ considerably in the effort needed to sample these communities; the Targeted Sampling requires more time and twice the personnel to complete. We conclude that the Coefficient Sampling method may be better for environmental monitoring programs where changes in mean abundance are of central concern and resources are limiting, but that the Targeted Sampling method may be better for ecological studies where quantitative relationships among species and small-scale variability in abundance are of central concern.
Parallel computing method for simulating hydrological processesof large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the most widely recognized global environmental problems. It has altered the temporal and spatial distribution of watershed hydrological processes, especially in the world's large rivers. Watershed hydrological process simulation based on physically based distributed hydrological models can yield better results than lumped models. However, such simulation involves a large amount of calculation, especially for large rivers, and therefore requires huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Existing parallel methods mostly parallelize the computation in the space and time dimensions, calculating the natural features of the distributed hydrological model unit by unit (grid or sub-basin) from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method has strong adaptability and extensibility: it makes full use of the available computing and storage resources under conditions of limited resources, and its computing efficiency improves linearly with the increase of computing resources. This method can satisfy the parallel computing requirements of hydrological process simulation for small, medium, and large rivers.
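One way to read the upstream-to-downstream parallelization described above is as level-by-level scheduling: sub-basins whose upstream neighbours have already been routed form one level and can be processed concurrently. The sketch below illustrates that scheduling with a placeholder routing function and a hypothetical five-basin topology:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical drainage topology: each sub-basin lists its upstream neighbours.
upstream = {
    "A": [], "B": [], "C": ["A", "B"],   # A and B drain into C
    "D": [], "E": ["C", "D"],            # C and D drain into E (outlet)
}

def topological_levels(upstream):
    """Group sub-basins into levels; all basins in one level are independent."""
    levels, done = [], set()
    while len(done) < len(upstream):
        ready = [b for b, ups in upstream.items()
                 if b not in done and all(u in done for u in ups)]
        levels.append(ready)
        done.update(ready)
    return levels

def route(basin, inflows):
    """Placeholder for the per-basin hydrological simulation."""
    return sum(inflows) + 1.0   # pretend each basin generates one unit of runoff

outflow = {}
for level in topological_levels(upstream):
    # Basins within one level have no mutual dependencies -> run them in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda b: route(b, [outflow[u] for u in upstream[b]]), level)
    outflow.update(zip(level, results))
print(outflow)   # outlet "E" accumulates runoff from all upstream basins
```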
Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)
NASA Astrophysics Data System (ADS)
Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.
2016-05-01
This study introduces a practical approach to develop real-time signal processing chain for general phased array radar on NVIDIA GPUs(Graphical Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBlas and cuFFT, which are adopted from open source libraries and optimized for the NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance benchmarked in computation time with various input data cube sizes are compared across GPUs and CPUs. Through the analysis, it will be demonstrated that GPGPUs (General Purpose GPU) real-time processing of the array radar data is possible with relatively low-cost commercial GPUs.
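As a CPU reference for one common stage of such a chain, the sketch below performs FFT-based pulse compression (matched filtering) of a batch of echoes with NumPy; a GPU version would issue the equivalent transforms through cuFFT. The chirp waveform and sizes are made up for illustration:

```python
import numpy as np

def pulse_compress(echoes, waveform):
    """FFT-based matched filtering of a batch of radar echoes.

    echoes:   (n_pulses, n_samples) complex baseband returns
    waveform: (n_wf,) transmitted pulse replica
    Correlating with the conjugated replica in the frequency domain is the
    CPU reference; a GPU implementation would map these transforms to cuFFT.
    """
    n = echoes.shape[1] + waveform.size - 1
    spec = np.fft.fft(echoes, n, axis=1) * np.conj(np.fft.fft(waveform, n))
    return np.fft.ifft(spec, axis=1)

# Toy usage: a linear-FM chirp buried in noise shows a sharp compressed peak.
t = np.arange(128)
chirp = np.exp(1j * np.pi * 0.002 * t**2)
echo = np.zeros(1024, complex)
echo[300:428] = chirp
echo += 0.1 * (np.random.randn(1024) + 1j * np.random.randn(1024))
out = pulse_compress(echo[None, :], chirp)
print(int(np.argmax(np.abs(out[0]))))   # peak near the start of the embedded chirp
```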
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Exploratory analysis of environmental interactions in central California
De Cola, Lee; Falcone, Neil L.
1996-01-01
As part of its global change research program, the United States Geological Survey (USGS) has produced raster data that describe the land cover of the United States using a consistent format. The data consist of elevations, satellite measurements, computed vegetation indices, land cover classes, and ancillary political, topographic and hydrographic information. This open-file report uses some of these data to explore the environment of a (256-km)² region of central California. We present various visualizations of the data, multiscale correlations between topography and vegetation, a path analysis of more complex statistical interactions, and a map that portrays the influence of agriculture on the region's vegetation. An appendix contains C and Mathematica code used to generate the graphics and some of the analysis.
Wheeler, Russell L.
2014-01-01
Computation of probabilistic earthquake hazard requires an estimate of Mmax, the maximum earthquake magnitude thought to be possible within a specified geographic region. This report is Part A of an Open-File Report that describes the construction of a global catalog of moderate to large earthquakes, from which one can estimate Mmax for most of the Central and Eastern United States and adjacent Canada. The catalog and Mmax estimates derived from it were used in the 2014 edition of the U.S. Geological Survey national seismic-hazard maps. This Part A discusses prehistoric earthquakes that occurred in eastern North America, northwestern Europe, and Australia, whereas a separate Part B deals with historical events.
ERIC Educational Resources Information Center
FALL, CHARLES R.
This document concludes that instruction by computer-based resource units can facilitate learning and provide the instructor with valuable assistance. By pre-planning the teaching-learning situation, resource units can free the instructor for decision-making tasks. Resource units can also provide appropriate learning goals and study guides to each…
Information management system breadboard data acquisition and control system.
NASA Technical Reports Server (NTRS)
Mallary, W. E.
1972-01-01
Description of a breadboard configuration of an advanced information management system based on requirements for high data rates and local and centralized computation for subsystems and experiments to be housed on a space station. The system is to contain a 10-megabit-per-second digital data bus, remote terminals with preprocessor capabilities, and a central multiprocessor. A concept definition is presented for the data acquisition and control system breadboard, and a detailed account is given of the operation of the bus control unit, the bus itself, and the remote acquisition and control unit. The data bus control unit is capable of operating under control of both its own test panel and the test processor. In either mode it is capable of both single- and multiple-message operation in that it can accept a block of data requests or update commands for transmission to the remote acquisition and control unit, which in turn is capable of three levels of data-handling complexity.
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
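The work-unit idea can be illustrated with a small sketch: the reduction is cut into index ranges placed on a shared queue, and whichever worker is available next claims and completes the next unit. This is an illustrative Python analogue, not the patented compute-node implementation:

```python
import numpy as np
import queue
import threading

def shared_memory_allreduce(per_core_arrays, n_workers=4, chunk=1024):
    """Sum-allreduce across arrays that already sit in shared memory.

    The reduction is split into fixed-size 'work units' (index ranges) placed
    on a shared queue; each available worker repeatedly claims the next unit
    and reduces that slice across all contributions.  Illustrative only.
    """
    n = per_core_arrays[0].size
    result = np.zeros(n, dtype=per_core_arrays[0].dtype)
    work = queue.Queue()
    for start in range(0, n, chunk):
        work.put((start, min(start + chunk, n)))        # one work unit

    def worker():
        while True:
            try:
                lo, hi = work.get_nowait()
            except queue.Empty:
                return
            for contrib in per_core_arrays:             # reduce this slice
                result[lo:hi] += contrib[lo:hi]

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return result

# Toy usage: four "cores" each contribute a vector of ones.
parts = [np.ones(10000) for _ in range(4)]
print(shared_memory_allreduce(parts)[:3])   # -> [4. 4. 4.]
```

Because each work unit covers a disjoint slice of the result, available workers can claim units in any order without write conflicts, which is the load-balancing benefit the patent describes.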
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
A novel surrogate-based approach for optimal design of electromagnetic-based circuits
NASA Astrophysics Data System (ADS)
Hassan, Abdel-Karim S. O.; Mohamed, Ahmed S. A.; Rabie, Azza A.; Etman, Ahmed S.
2016-02-01
A new geometric design centring approach for optimal design of central processing unit-intensive electromagnetic (EM)-based circuits is introduced. The approach uses norms related to the probability distribution of the circuit parameters to find distances from a point to the feasible region boundaries by solving nonlinear optimization problems. Based on these normed distances, the design centring problem is formulated as a max-min optimization problem. A convergent iterative boundary search technique is exploited to find the normed distances. To alleviate the computation cost associated with the EM-based circuits design cycle, space-mapping (SM) surrogates are used to create a sequence of iteratively updated feasible region approximations. In each SM feasible region approximation, the centring process using normed distances is implemented, leading to a better centre point. The process is repeated until a final design centre is attained. Practical examples are given to show the effectiveness of the new design centring method for EM-based circuits.
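The normed-distance ingredient can be sketched as a bisection boundary search along sampled directions, with a candidate centre scored by its worst-case distance to the boundary. The feasibility test below is a toy placeholder standing in for the EM simulation or its space-mapping surrogate, and the Euclidean norm is an assumption for illustration:

```python
import numpy as np

def boundary_distance(center, direction, feasible, t_max=10.0, iters=40):
    """Distance from `center` to the feasible-region boundary along `direction`,
    found by bisection.  Assumes `center` is feasible and t_max reaches outside."""
    d = direction / np.linalg.norm(direction)
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(center + mid * d):
            lo = mid
        else:
            hi = mid
    return lo

def centre_score(center, feasible, n_dirs=64, seed=0):
    """Worst-case (minimum) boundary distance over random directions; a centre
    with a larger score lies deeper inside the feasible region."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, center.size))
    return min(boundary_distance(center, d, feasible) for d in dirs)

# Toy feasible region: a unit disc (placeholder for simulated design constraints).
feasible = lambda x: np.dot(x, x) <= 1.0
print(centre_score(np.array([0.4, 0.0]), feasible))  # ~0.6: distance to the rim
print(centre_score(np.array([0.0, 0.0]), feasible))  # ~1.0: the true centre
```

In the paper the centring itself is posed as a max-min optimization over such distances and the expensive feasibility test is replaced by iteratively updated space-mapping surrogates; here only the boundary-search and scoring steps are shown.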
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Feng; Ren, Yinghui; Bian, Wensheng, E-mail: bian@iccas.ac.cn
The accurate time-independent quantum dynamics calculations on the ground-state tunneling splitting of malonaldehyde in full dimensionality are reported for the first time. This is achieved with an efficient method developed by us. In our method, the basis functions are customized for the hydrogen transfer process, which has the effect of greatly reducing the size of the final Hamiltonian matrix, and the Lanczos method and parallel strategy are used to further overcome the memory and central processing unit time bottlenecks. The obtained ground-state tunneling splitting of 24.5 cm⁻¹ is in excellent agreement with the benchmark value of 23.8 cm⁻¹ computed with the full-dimensional, multi-configurational time-dependent Hartree approach on the same potential energy surface, and we estimate that our reported value has an uncertainty of less than 0.5 cm⁻¹. Moreover, the role of various vibrational modes strongly coupled to the hydrogen transfer process is revealed.
All about Reading and Technology.
ERIC Educational Resources Information Center
Karbal, Harold, Ed.
1985-01-01
The central theme in this journal issue is the use of the computer in teaching reading. The following articles are included: "The Use of Computers in the Reading Program: A District Approach" by Nora Forester; "Reading and Computers: A Partnership" by Dr. Martha Irwin; "Rom, Ram and Reason" by Candice Carlile; "Word Processing: Practical Ideas and…
2010-01-01
Background Simulation of sophisticated biological models requires considerable computational power. These models typically integrate together numerous biological phenomena such as spatially-explicit heterogeneous cells, cell-cell interactions, cell-environment interactions and intracellular gene networks. The recent advent of programming for graphical processing units (GPU) opens up the possibility of developing more integrative, detailed and predictive biological models while at the same time decreasing the computational cost to simulate those models. Results We construct a 3D model of epidermal development and provide a set of GPU algorithms that executes significantly faster than sequential central processing unit (CPU) code. We provide a parallel implementation of the subcellular element method for individual cells residing in a lattice-free spatial environment. Each cell in our epidermal model includes an internal gene network, which integrates cellular interaction of Notch signaling together with environmental interaction of basement membrane adhesion, to specify cellular state and behaviors such as growth and division. We take a pedagogical approach to describing how modeling methods are efficiently implemented on the GPU including memory layout of data structures and functional decomposition. We discuss various programmatic issues and provide a set of design guidelines for GPU programming that are instructive to avoid common pitfalls as well as to extract performance from the GPU architecture. Conclusions We demonstrate that GPU algorithms represent a significant technological advance for the simulation of complex biological models. We further demonstrate with our epidermal model that the integration of multiple complex modeling methods for heterogeneous multicellular biological processes is both feasible and computationally tractable using this new technology. We hope that the provided algorithms and source code will be a starting point for modelers to develop their own GPU implementations, and encourage others to implement their modeling methods on the GPU and to make that code available to the wider community. PMID:20696053
Platform Architecture for Decentralized Positioning Systems
Kasmi, Zakaria; Norrdine, Abdelmoumen; Blankenbach, Jörg
2017-01-01
A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. The decentralized processing of a position enables pushing the application-level knowledge into a mobile station and avoids the communication with a central unit such as a server or a base station. In addition, the calculation of the position on low-cost and resource-constrained devices presents a challenge due to the limited computing, storage capacity, as well as power supply. Therefore, we propose a platform architecture that enables the design of a system with the reusability of the components, extensibility (e.g., with other positioning technologies) and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system. PMID:28445414
NASA Astrophysics Data System (ADS)
Lu, Bin; Cheng, Xiaomin; Feng, Jinlong; Guan, Xiawei; Miao, Xiangshui
2016-07-01
Nonvolatile memory devices or circuits that can implement both storage and computation are a crucial requirement for improving the efficiency of modern computers. In this work, we realize logic functions by using [GeTe/Sb2Te3]n super lattice phase change memory (PCM) cells, in which a higher threshold voltage is needed for phase change when a magnetic field is applied. First, the [GeTe/Sb2Te3]n super lattice cells were fabricated and the R-V curve was measured. Then we designed the logic circuits with the super lattice PCM cell, verified by HSPICE simulation and experiments. Seven basic logic functions are first demonstrated in this letter; then several multi-input logic gates are presented. The proposed logic devices offer the advantages of simple structure and low power consumption, indicating that the super lattice PCM has potential for future nonvolatile central processing unit design, facilitating the development of massively parallel computing architectures.
Monte Carlo MP2 on Many Graphical Processing Units.
Doran, Alexander E; Hirata, So
2016-10-11
In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n 3 ) or better with system size n, which may be compared with the O(n 5 ) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.
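The numerical core is ordinary Monte Carlo integration with a statistical error bar that shrinks as the square root of the number of samples. The sketch below uses a generic integrand and sampler as stand-ins for the MP2 pair integrand; nothing here reflects the actual MC-MP2 or redundant-walker implementation:

```python
import numpy as np

def mc_integrate(f, sampler, n_samples=200_000, batch=50_000, seed=0):
    """Monte Carlo estimate of E[f(x)] with a statistical uncertainty,
    mirroring how MC-MP2 replaces a long sum-of-products expression by
    sampling electron-pair coordinates (the integrand here is generic)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(0, n_samples, batch):
        x = sampler(rng, batch)            # one batch of sampled coordinates
        estimates.append(f(x).mean())      # batch-mean estimator
    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(estimates.size)

# Toy usage: a 6-D Gaussian-weighted integrand standing in for a pair integral.
sampler = lambda rng, n: rng.standard_normal((n, 6))
f = lambda x: np.exp(-0.1 * (x ** 2).sum(axis=1))
value, err = mc_integrate(f, sampler)
print(f"{value:.4f} +/- {err:.4f}")
```

On GPUs, each batch maps naturally onto many threads (fine-grain parallelism within a batch, coarse-grain parallelism across independent batches), which is the structure the abstract exploits.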
Computing at DESY — current setup, trends and strategic directions
NASA Astrophysics Data System (ADS)
Ernst, Michael
1998-05-01
Since the HERA experiments H1 and ZEUS started data taking in '92, the computing environment at DESY has changed dramatically. Having run mainframe-centred computing for more than 20 years, DESY switched to a heterogeneous, fully distributed computing environment within only about two years in almost every corner where computing has its applications. The computing strategy was highly influenced by the needs of the user community. The collaborations are usually limited by current technology, and their ever increasing demands are the driving force for central computing to always move close to the technology edge. While DESY's central computing has a multidecade experience in running Central Data Recording/Central Data Processing for HEP experiments, the most challenging task today is to provide for clear and homogeneous concepts in the desktop area. Given that lowest-level commodity hardware draws more and more attention, combined with the financial constraints we already face today, we quickly need concepts for integrated support of a versatile device which has the potential to move into basically any computing area in HEP. Though commercial solutions, especially addressing the PC management/support issues, are expected to come to market in the next 2-3 years, we need to provide for suitable solutions now. Buying PCs at DESY currently at a rate of about 30/month will otherwise absorb any available manpower in central computing and still leave hundreds of people unhappy. Though certainly not the only area, the desktop issue is one of the most important ones where we need HEP-wide collaboration to a large extent, and right now. Taking into account that there is traditionally no room for R&D at DESY, collaboration, meaning sharing experience and development resources within the HEP community, is a predominant factor for us.
Numericware i: Identical by State Matrix Calculator
Kim, Bongsong; Beavis, William D
2017-01-01
We introduce software, Numericware i, to compute identical by state (IBS) matrix based on genotypic data. Calculating an IBS matrix with a large dataset requires large computer memory and takes lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. The multithreading allows computational routines to concurrently run on multiple central processing unit (CPU) processors. The forward chopping addresses memory limitation by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset using a laptop or a desktop computer. For comparison with different software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at .9972, whereas SPAGeDi showed low correlation with Numericware i (.0505) and TASSEL (.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i spent 382 minutes using 19 CPU threads and 64 GB memory by dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db. PMID:28469375
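A minimal sketch of an IBS computation with chunked ("forward chopping"-style) accumulation, assuming genotypes are coded as 0/1/2 allele counts so that per-SNP IBS is 2 − |g_i − g_j| and the averaged values fall between 0 and 2, matching the range reported above; the coding convention and chunking scheme are assumptions, not Numericware i's implementation:

```python
import numpy as np

def ibs_matrix(genotype_chunks):
    """Identical-by-state matrix from genotype chunks.

    Each chunk is an (n_individuals, n_snps_in_chunk) array of 0/1/2 allele
    counts.  Per SNP, IBS(i, j) = 2 - |g_i - g_j|; averaging over all SNPs
    yields values between 0 and 2.  Processing SNPs chunk by chunk keeps the
    memory footprint bounded, analogous to the 'forward chopping' idea.
    """
    total, n_snps = None, 0
    for chunk in genotype_chunks:
        g = np.asarray(chunk, dtype=np.int16)
        diff = np.abs(g[:, None, :] - g[None, :, :])     # pairwise |g_i - g_j|
        acc = (2 - diff).sum(axis=2, dtype=np.int64)
        total = acc if total is None else total + acc
        n_snps += g.shape[1]
    return total / n_snps

# Toy usage: 4 individuals, 6 SNPs split into two chunks of 3.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(4, 6))
print(ibs_matrix([geno[:, :3], geno[:, 3:]]))   # diagonal entries are exactly 2
```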
Managing internode data communications for an uninitialized process in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-05-20
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
Vigmond, Edward J.; Boyle, Patrick M.; Leon, L. Joshua; Plank, Gernot
2014-01-01
Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on traditional CPU and compared to GPU performance, for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time. PMID:19964295
Tsiper, E V
2006-08-18
The concept of fractional charge is central to the theory of the fractional quantum Hall effect. Here I use exact diagonalization as well as configuration space renormalization to study finite clusters which are large enough to contain two independent edges. I analyze the conditions of resonant tunneling between the two edges. The "computer experiment" reveals a periodic sequence of resonant tunneling events consistent with the experimentally observed fractional quantization of electric charge in units of e/3 and e/5.
gpuPOM: a GPU-based Princeton Ocean Model
NASA Astrophysics Data System (ADS)
Xu, S.; Huang, X.; Zhang, Y.; Fu, H.; Oey, L.-Y.; Xu, F.; Yang, G.
2014-11-01
Rapid advances in the performance of the graphics processing unit (GPU) have made the GPU a compelling solution for a series of scientific applications. However, most existing GPU acceleration works for climate models are doing partial code porting for certain hot spots, and can only achieve limited speedup for the entire model. In this work, we take the mpiPOM (a parallel version of the Princeton Ocean Model) as our starting point, design and implement a GPU-based Princeton Ocean Model. By carefully considering the architectural features of the state-of-the-art GPU devices, we rewrite the full mpiPOM model from the original Fortran version into a new Compute Unified Device Architecture C (CUDA-C) version. We take several accelerating methods to further improve the performance of gpuPOM, including optimizing memory access in a single GPU, overlapping communication and boundary operations among multiple GPUs, and overlapping input/output (I/O) between the hybrid Central Processing Unit (CPU) and the GPU. Our experimental results indicate that the performance of the gpuPOM on a workstation containing 4 GPUs is comparable to a powerful cluster with 408 CPU cores and it reduces the energy consumption by 6.8 times.
A Distributed GPU-Based Framework for Real-Time 3D Volume Rendering of Large Astronomical Data Cubes
NASA Astrophysics Data System (ADS)
Hassan, A. H.; Fluke, C. J.; Barnes, D. G.
2012-05-01
We present a framework to volume-render three-dimensional data cubes interactively using distributed ray-casting and volume-bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core central processing unit (CPU). The main design target for this framework is to provide an in-core visualization solution able to provide three-dimensional interactive views of terabyte-sized data cubes. We tested the presented framework using a computing cluster comprising 64 nodes with a total of 128 GPUs. The framework proved to be scalable to render a 204 GB data cube with an average of 30 frames per second. Our performance analyses also compare the use of NVIDIA Tesla 1060 and 2050 GPU architectures and the effect of increasing the visualization output resolution on the rendering performance. Although our initial focus, as shown in the examples presented in this work, is volume rendering of spectral data cubes from radio astronomy, we contend that our approach has applicability to other disciplines where close to real-time volume rendering of terabyte-order three-dimensional data sets is a requirement.
Navier-Stokes Simulation of the Air-Conditioning Facility of a Large Modern Computer Room
NASA Technical Reports Server (NTRS)
2005-01-01
NASA recently assembled one of the world's fastest operational supercomputers to meet the agency's new high performance computing needs. This large-scale system, named Columbia, consists of 20 interconnected SGI Altix 512-processor systems, for a total of 10,240 Intel Itanium-2 processors. High-fidelity CFD simulations were performed for the NASA Advanced Supercomputing (NAS) computer room at Ames Research Center. The purpose of the simulations was to assess the adequacy of the existing air handling and conditioning system and make recommendations for changes in the design of the system if needed. The simulations were performed with NASA's OVERFLOW-2 CFD code which utilizes overset structured grids. A new set of boundary conditions was developed and added to the flow solver for modeling the room's air-conditioning and proper cooling of the equipment. Boundary condition parameters for the flow solver are based on cooler CFM (flow rate) ratings and some reasonable assumptions of flow and heat transfer data for the floor and central processing units (CPUs). The geometry modeling from blueprints and grid generation were handled by the NASA Ames software package Chimera Grid Tools (CGT). This geometric model was developed as a CGT-scripted template, which can be easily modified to accommodate any changes in shape and size of the room, locations and dimensions of the CPU racks, disk racks, coolers, power distribution units, and mass-storage system. The compute nodes are grouped in pairs of racks with an aisle in the middle. High-speed connection cables connect the racks with overhead cable trays. The cool air from the cooling units is pumped into the computer room from a sub-floor through perforated floor tiles. The CPU cooling fans draw cool air from the floor tiles, which run along the outside length of each rack, and eject warm air into the center aisle between the racks. This warm air is eventually drawn into the cooling units located near the walls of the room. One major concern is that the hot air ejected into the middle aisle might recirculate back into the cool rack side and cause thermal short-cycling. The simulations analyzed and addressed the following important elements of the computer room: 1) High-temperature build-up in certain regions of the room; 2) Areas of low air circulation in the room; 3) Potential short-cycling of the computer rack cooling system; 4) Effectiveness of the perforated cooling floor tiles; 5) Effect of changes in various aspects of the cooling units. Detailed flow visualization is performed to show temperature distribution, air-flow streamlines and velocities in the computer room.
GPU computing in medical physics: a review.
Pratx, Guillem; Xing, Lei
2011-05-01
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
Correlating Computer Database Programs with Social Studies Instruction.
ERIC Educational Resources Information Center
Northwest Regional Educational Lab., Portland, OR.
This unit emphasizes the integration of software in a focus on the classroom instruction process. Student activities are based on plans and ideas for instructional units presented by a teacher who describes and demonstrates the activities. Integration has occurred when computer applications are included in an instructional activity. This guide…
An ultrasound wearable system for the monitoring and acceleration of fracture healing in long bones.
Protopappas, Vasilios C; Baga, Dina A; Fotiadis, Dimitrios I; Likas, Aristidis C; Papachristos, Athanasios A; Malizos, Konstantinos N
2005-09-01
An ultrasound wearable system for remote monitoring and acceleration of the healing process in fractured long bones is presented. The so-called USBone system consists of a pair of ultrasound transducers implanted into the fracture region, a wearable device, and a centralized unit. The wearable device is responsible for carrying out ultrasound measurements using the axial-transmission technique and for initiating therapy sessions of low-intensity pulsed ultrasound. The acquired measurements and other data are wirelessly transferred from the patient site to the centralized unit, which is located in a clinical setting. The evaluation of the system on an animal tibial osteotomy model is also presented. A dataset was constructed for monitoring purposes consisting of serial ultrasound measurements, follow-up radiographs, quantitative computed tomography-based densitometry, and biomechanical data. The animal study demonstrated the ability of the system to collect ultrasound measurements in an effective and reliable fashion, and the participating orthopaedic surgeons accepted the system for future clinical application. Analysis of the acquired measurements showed that the pattern of evolution of the ultrasound velocity through the healing bones over the postoperative period tracks a dynamic healing process. Furthermore, the ultrasound velocity of radiographically healed bones returns to 80% of the intact bone value, whereas the correlation coefficient of the velocity with the material and mechanical properties of the healing bone ranges from 0.699 to 0.814. The USBone system constitutes the first telemedicine system for the out-of-hospital management of patients who sustained open fractures and were treated with external fixation devices.
Computer program developed for flowsheet calculations and process data reduction
NASA Technical Reports Server (NTRS)
Alfredson, P. G.; Anastasia, L. J.; Knudsen, I. E.; Koppel, L. B.; Vogel, G. J.
1969-01-01
Computer program PACER-65 is used for flowsheet calculations and is easily adapted to process data reduction. Each unit, vessel, meter, and processing operation in the overall flowsheet is represented by a separate subroutine, which the program calls in the order required to complete an overall flowsheet calculation.
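A minimal sketch of that structure, with hypothetical unit names and stream variables (and in Python rather than the original implementation): each flowsheet element is a separate routine, and a driver calls them in the order needed to close the flowsheet.

```python
# Each unit operation is its own routine; the driver calls them in flowsheet order.
def feed_tank(stream):
    stream["flow_kg_h"] = 100.0            # assumed feed rate
    return stream

def reactor(stream):
    stream["conversion"] = 0.85            # assumed conversion
    return stream

def flow_meter(stream):
    print(f"meter reading: {stream['flow_kg_h']:.1f} kg/h")
    return stream

FLOWSHEET = [feed_tank, reactor, flow_meter]   # calling order defines the flowsheet

def run_flowsheet(units, stream=None):
    stream = {} if stream is None else stream
    for unit in units:
        stream = unit(stream)
    return stream

print(run_flowsheet(FLOWSHEET))
```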
Misdiagnosis of acute peripheral vestibulopathy in central nervous ischemic infarction.
Braun, Eva Maria; Tomazic, Peter Valentin; Ropposch, Thorsten; Nemetz, Ulrike; Lackner, Andreas; Walch, Christian
2011-12-01
Vertigo is a very common symptom at otorhinolaryngology (ENT), neurological, and emergency units, but it is often difficult to distinguish between vertigo of peripheral and central origin. We conducted a retrospective analysis of a hospital database, including all patients admitted to the ENT University Hospital Graz after neurological examination with a diagnosis of peripheral vestibular vertigo and a subsequent diagnosis of central nervous infarction as the actual cause of the vertigo. Twelve patients were included in this study. All patients with acute spinning vertigo after a thorough neurological examination and with unremarkable computed tomographic scans were referred to our ENT department. Nine of them presented with horizontal nystagmus. Only 1 woman experienced additional hearing loss. The mean diagnostic delay to the definitive diagnosis of a central infarction through magnetic resonance imaging was 4 days (SD, 2.3 d). A careful otologic and neurological examination, including the head impulse test and caloric testing, is mandatory. Because ischemic events cannot be diagnosed on computed tomographic scans at an early stage, we strongly recommend performing cranial magnetic resonance imaging within 48 hours of admission if vertigo has not improved under conservative treatment.
Different CAD/CAM-processing routes for zirconia restorations: influence on fitting accuracy.
Kohorst, Philipp; Junghanns, Janet; Dittmer, Marc P; Borchers, Lothar; Stiesch, Meike
2011-08-01
The aim of the present in vitro study was to evaluate the influence of different processing routes on the fitting accuracy of four-unit zirconia fixed dental prostheses (FDPs) fabricated by computer-aided design/computer-aided manufacturing (CAD/CAM). Three groups of zirconia frameworks with ten specimens each were fabricated. Frameworks of one group (CerconCAM) were produced by means of a laboratory CAM-only system. The other frameworks were made with different CAD/CAM systems; on the one hand by in-laboratory production (CerconCAD/CAM) and on the other hand by centralized production in a milling center (Compartis) after forwarding geometrical data. Frameworks were then veneered with the recommended ceramics, and marginal accuracy was determined using a replica technique. Horizontal marginal discrepancy, vertical marginal discrepancy, absolute marginal discrepancy, and marginal gap were evaluated. Statistical analyses were performed by one-way analysis of variance (ANOVA), with the level of significance chosen at 0.05. Mean horizontal discrepancies ranged between 22 μm (CerconCAM) and 58 μm (Compartis), vertical discrepancies ranged between 63 μm (CerconCAD/CAM) and 162 μm (CerconCAM), and absolute marginal discrepancies ranged between 94 μm (CerconCAD/CAM) and 181 μm (CerconCAM). The marginal gap varied between 72 μm (CerconCAD/CAM) and 112 μm (CerconCAM, Compartis). Statistical analysis revealed that, with all measurements, the marginal accuracy of the zirconia FDPs was significantly influenced by the processing route used (p < 0.05). Within the limitations of this study, all restorations showed a clinically acceptable marginal accuracy; however, the results suggest that the CAD/CAM systems are more precise than the CAM-only system for the manufacture of four-unit FDPs.
NASA Astrophysics Data System (ADS)
Hua, H.; Owen, S. E.; Yun, S. H.; Agram, P. S.; Manipon, G.; Starch, M.; Sacco, G. F.; Bue, B. D.; Dang, L. B.; Linick, J. P.; Malarout, N.; Rosen, P. A.; Fielding, E. J.; Lundgren, P.; Moore, A. W.; Liu, Z.; Farr, T.; Webb, F.; Simons, M.; Gurrola, E. M.
2017-12-01
With the increased availability of open SAR data (e.g., Sentinel-1 A/B), new challenges are being faced in processing and analyzing the voluminous SAR datasets to make geodetic measurements. Upcoming SAR missions such as NISAR are expected to generate close to 100 TB per day. The Advanced Rapid Imaging and Analysis (ARIA) project can now generate geocoded unwrapped phase and coherence products from Sentinel-1 TOPS mode data in an automated fashion, using the ISCE software. This capability is currently being exercised on various study sites across the United States and around the globe, including Hawaii, Central California, Iceland and South America. The automated and large-scale SAR data processing and analysis capabilities use cloud computing techniques to speed the computations and provide scalable processing power and storage. Aspects being explored include how to process these voluminous SLCs and interferograms at global scales, how to keep up with the large daily SAR data volumes, and how to handle the high data rates. Scene-partitioning approaches in the processing pipeline help in handling global-scale processing up to unwrapped interferograms, with stitching done at a late stage. We have built an advanced science data system with rapid search functions to enable access to the derived data products. Rapid image processing of Sentinel-1 data to interferograms and time series is already being applied to natural hazards including earthquakes, floods, volcanic eruptions, and land subsidence due to fluid withdrawal. We will present the status of the ARIA science data system for generating science-ready data products and the challenges that arise in processing SAR datasets into derived time series data products at large scales. For example, how do we perform large-scale data quality screening on interferograms? What approaches can be used to minimize compute, storage, and data movement costs for time series analysis in the cloud? We will also present some of our findings from applying machine learning and data analytics to the processed SAR data streams, as well as lessons learned on how to ease the SAR community onto interfacing with these cloud-based SAR science data systems.
ERIC Educational Resources Information Center
Rothbaum, Fred; Pott, Martha; Azuma, Hiroshi; Miyake, Kazuo; Weisz, John
2000-01-01
Compares paths of development in Japan (symbiotic harmony) and the United States (generative tension) of parent-child and adult mate relationships, challenging assumptions that certain processes are central in all relationships or that U.S. relationships are less valued or weaker than Japan's. Suggests need to investigate processes underlying, and…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, J.; Mowrey, J.
1995-12-01
This report describes the design, development and testing of process controls for selected system operations in the Browns Ferry Nuclear Plant (BFNP) Reactor Water Cleanup System (RWCU) using a Computer Simulation Platform which simulates the RWCU System and the BFNP Integrated Computer System (ICS). This system was designed to demonstrate the feasibility of the soft control (video touch screen) of nuclear plant systems through an operator console. The BFNP Integrated Computer System, which has recently been installed at BFNP Unit 2, was simulated to allow for operator control functions of the modeled RWCU system. The BFNP Unit 2 RWCU system was simulated using the RELAP5 Thermal/Hydraulic Simulation Model, which provided the steady-state and transient RWCU process variables and simulated the response of the system to control system inputs. Descriptions of the hardware and software developed are also included in this report. The testing and acceptance program and results are also detailed in this report. A discussion of potential installation of an actual RWCU process control system in BFNP Unit 2 is included. Finally, this report contains a section on industry issues associated with installation of process control systems in nuclear power plants.
Karimi, Davood; Ward, Rabab K
2016-10-01
Image models are central to all image processing tasks. The great advancements in digital image processing would not have been made possible without powerful models which, themselves, have evolved over time. In the past decade, "patch-based" models have emerged as one of the most effective models for natural images. Patch-based methods have outperformed other competing methods in many image processing tasks. These developments have come at a time when the greater availability of powerful computational resources and growing concerns over the health risks of ionizing radiation encourage research on image processing algorithms for computed tomography (CT). The goal of this paper is to explain the principles of patch-based methods and to review some of their recent applications in CT. We first review the central concepts in patch-based image processing and explain some of the state-of-the-art algorithms, with a focus on aspects that are more relevant to CT. Then, we review some of the recent applications of patch-based methods in CT. Patch-based methods have already transformed the field of image processing, leading to state-of-the-art results in many applications. More recently, several studies have proposed patch-based algorithms for various image processing tasks in CT, from denoising and restoration to iterative reconstruction. Although these studies have reported good results, the true potential of patch-based methods for CT has not yet been appreciated. Patch-based methods can play a central role in image reconstruction and processing for CT. They have the potential to lead to substantial improvements in the current state of the art.
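To make the patch-based idea concrete, here is a deliberately crude sketch (not a specific algorithm from the review): every pixel is represented by the small patch around it, and a denoised value is formed from the patch contents.

```python
import numpy as np

def extract_patches(img, size=5):
    """Gather the size x size neighbourhood around every pixel."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    patches = np.empty((h, w, size * size), dtype=img.dtype)
    for i in range(h):
        for j in range(w):
            patches[i, j] = padded[i:i + size, j:j + size].ravel()
    return patches

def patch_mean_denoise(img, size=5):
    """Replace each pixel by the mean of its patch (a crude patch-based smoother)."""
    return extract_patches(img, size).mean(axis=-1)

noisy = np.random.default_rng(0).random((64, 64)).astype(np.float32)
print(patch_mean_denoise(noisy).shape)  # (64, 64)
```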
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large-scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tiling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework for scheduling the I/O processes and computing units. Herein, each computing unit encapsulated the same copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the experiments demonstrated that our stream tiling estimation can efficiently alleviate the heavy pressure from I/O-bound work, and the measured speedups after optimization greatly outperformed the directly parallel versions in shared memory systems with multi-core processors.
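A compact sketch of the streaming idea under stated assumptions (the tile contents, the per-tile estimator, and the worker count are placeholders): tiles are produced one at a time, so the whole data set never has to fit in memory, and a pool of identical computing units each runs the same estimation routine.

```python
from concurrent.futures import ProcessPoolExecutor

def tile_stream(n_tiles):
    """Stand-in for reading tiles from disk one at a time."""
    for tile_id in range(n_tiles):
        yield tile_id, [float(tile_id + k) for k in range(1000)]  # fake elevations

def estimate_tile_area(tile):
    tile_id, values = tile
    return sum(abs(v) for v in values) * 1e-3  # placeholder per-tile estimate

if __name__ == "__main__":
    # Each worker encapsulates the same estimator; tiles are streamed to the pool.
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(estimate_tile_area, tile_stream(32), chunksize=4))
    print(f"estimated surface area: {total:.1f}")
```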
Code of Federal Regulations, 2010 CFR
2010-07-01
... Computer Reader finalization costs, cost per image, and Remote Bar Code Sorter leakage; (8) Percentage of... processing units costs for Carrier Route, High Density, and Saturation mail; (j) Mail processing unit costs...
Code of Federal Regulations, 2011 CFR
2011-07-01
... Computer Reader finalization costs, cost per image, and Remote Bar Code Sorter leakage; (8) Percentage of... processing units costs for Carrier Route, High Density, and Saturation mail; (j) Mail processing unit costs...
Code of Federal Regulations, 2013 CFR
2013-07-01
... Computer Reader finalization costs, cost per image, and Remote Bar Code Sorter leakage; (8) Percentage of... processing units costs for Carrier Route, High Density, and Saturation mail; (j) Mail processing unit costs...
Code of Federal Regulations, 2014 CFR
2014-07-01
... Computer Reader finalization costs, cost per image, and Remote Bar Code Sorter leakage; (8) Percentage of... processing units costs for Carrier Route, High Density, and Saturation mail; (j) Mail processing unit costs...
Code of Federal Regulations, 2012 CFR
2012-07-01
... Computer Reader finalization costs, cost per image, and Remote Bar Code Sorter leakage; (8) Percentage of... processing units costs for Carrier Route, High Density, and Saturation mail; (j) Mail processing unit costs...
SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moriya, S; Sato, M; Tachibana, H
Purpose: The calculation time is a trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on graphics processing units (GPU). Methods: The calculation was performed on AMD Dual FirePro D700 graphics hardware, and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The dose calculation process was separated into TERMA and KERMA steps, and the dose deposited at each coordinate (x, y, z) was determined in the process. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed for all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU and multi-thread computation was done. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid), and the calculation speed on the GPU was compared with that on the CPU along with the accuracy of the PDD. Results: The calculation times for the GPU and the CPU were 3.3 sec and 4.4 hours, respectively. The calculation speed for the GPU was 4800 times faster than that for the CPU. The PDD curve for the GPU was perfectly matched to that for the CPU. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in terms of calculation time and may be more accurate in inhomogeneous regions. Intensity modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical for the kernel to use a coarse spacing technique if the calculation is faster while keeping accuracy similar to that of a current treatment planning system.
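For orientation, here is a serial CPU sketch of the convolution step being accelerated (not the authors' Aparapi/OpenCL code; the TERMA distribution and kernel shape are invented): the dose grid is obtained by convolving the TERMA grid with an energy-deposition kernel over the voxels.

```python
import numpy as np
from scipy.signal import fftconvolve

# Toy TERMA distribution: a pencil beam along z in a 64^3 voxel phantom.
terma = np.zeros((64, 64, 64), dtype=np.float32)
terma[32, 32, 5:40] = 1.0

# Hypothetical isotropic kernel falling off with distance from the interaction point.
z, y, x = np.mgrid[-8:9, -8:9, -8:9]
r = np.sqrt(x**2 + y**2 + z**2) + 0.5
kernel = np.exp(-r) / r**2
kernel /= kernel.sum()

dose = fftconvolve(terma, kernel, mode="same")  # each voxel sums kernel contributions
print(dose.shape, float(dose.max()))
```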
A Method for Growing Bio-memristors from Slime Mold
Miranda, Eduardo Reck; Braund, Edward
2017-01-01
Our research is aimed at gaining a better understanding of the electronic properties of organisms in order to engineer novel bioelectronic systems and computing architectures based on biology. This specific paper focuses on harnessing the unicellular slime mold Physarum polycephalum to develop bio-memristors (or biological memristors) and bio-computing devices. The memristor is a resistor that possesses memory. It is the 4th fundamental passive circuit element (the other three are the resistor, the capacitor, and the inductor), which is paving the way for the design of new kinds of computing systems; e.g., computers that might relinquish the distinction between storage and a central processing unit. When applied with an AC voltage, the current vs. voltage characteristic of a memristor is a pinched hysteresis loop. It has been shown that P. polycephalum produces pinched hysteresis loops under AC voltages and displays adaptive behavior that is comparable with the functioning of a memristor. This paper presents the method that we developed for implementing bio-memristors with P. polycephalum and introduces the development of a receptacle to culture the organism, which facilitates its deployment as an electronic circuit component. Our method has proven to decrease growth time, increase component lifespan, and standardize electrical observations. PMID:29155754
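The pinched hysteresis loop mentioned above can be reproduced with a generic memristor model. The sketch below uses an idealized HP-style model with assumed parameter values, not a model of the Physarum device, to show why the current-voltage trace passes through the origin on both the up and down sweeps.

```python
import numpy as np

R_ON, R_OFF = 100.0, 16_000.0        # assumed limiting resistances (ohms)
MOBILITY = 2e4                       # assumed state-update constant
t = np.linspace(0.0, 2.0, 4000)      # two periods of a 1 Hz sinusoidal drive
v = np.sin(2 * np.pi * t)
dt = t[1] - t[0]

w = 0.5                              # internal state variable in [0, 1]
i = np.zeros_like(t)
for k in range(t.size):
    resistance = R_ON * w + R_OFF * (1.0 - w)
    i[k] = v[k] / resistance
    w = float(np.clip(w + MOBILITY * i[k] * dt, 0.0, 1.0))  # state driven by charge

# At v = 0 the current is also ~0 on both branches, so the i-v loop is "pinched".
print(f"current at the final zero crossing of v: {i[-1]:.3e} A")
```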
2011-01-01
Background The cleaning stage of the instrument decontamination process has come under increased scrutiny due to the increasing complexity of surgical instruments and the adverse effects of residual protein contamination on surgical instruments. Instruments used in the podiatry field have a complex surface topography and are exposed to a wide range of biological contamination. Currently, podiatry instruments are reprocessed locally within surgeries while national strategies are favouring a move toward reprocessing in central facilities. The aim of this study was to determine the efficacy of local and central reprocessing on podiatry instruments by measuring residual protein contamination of instruments reprocessed by both methods. Methods The residual protein of 189 instruments reprocessed centrally and 189 instruments reprocessed locally was determined using a fluorescent assay based on the reaction of proteins with o-phthaldialdehyde/sodium 2-mercaptoethanesulfonate. Results Residual protein was detected on 72% (n = 136) of instruments reprocessed centrally and 90% (n = 170) of instruments reprocessed locally. Significantly less protein (p < 0.001) was recovered from instruments reprocessed centrally (median 20.62 μg, range 0 - 5705 μg) than from local reprocessing (median 111.9 μg, range 0 - 6344 μg). Conclusions Overall, the results show the superiority of central reprocessing for complex podiatry instruments when protein contamination is considered, though no significant difference was found in residual protein between local decontamination unit and central decontamination unit processes for Blacks files. Further research is needed to undertake qualitative identification of protein contamination to identify any cross contamination risks and a standard for acceptable residual protein contamination applicable to different instruments and specialities should be considered as a matter of urgency. PMID:21219613
Smith, Gordon Wg; Goldie, Frank; Long, Steven; Lappin, David F; Ramage, Gordon; Smith, Andrew J
2011-01-10
The cleaning stage of the instrument decontamination process has come under increased scrutiny due to the increasing complexity of surgical instruments and the adverse effects of residual protein contamination on surgical instruments. Instruments used in the podiatry field have a complex surface topography and are exposed to a wide range of biological contamination. Currently, podiatry instruments are reprocessed locally within surgeries while national strategies are favouring a move toward reprocessing in central facilities. The aim of this study was to determine the efficacy of local and central reprocessing on podiatry instruments by measuring residual protein contamination of instruments reprocessed by both methods. The residual protein of 189 instruments reprocessed centrally and 189 instruments reprocessed locally was determined using a fluorescent assay based on the reaction of proteins with o-phthaldialdehyde/sodium 2-mercaptoethanesulfonate. Residual protein was detected on 72% (n = 136) of instruments reprocessed centrally and 90% (n = 170) of instruments reprocessed locally. Significantly less protein (p < 0.001) was recovered from instruments reprocessed centrally (median 20.62 μg, range 0 - 5705 μg) than from local reprocessing (median 111.9 μg, range 0 - 6344 μg). Overall, the results show the superiority of central reprocessing for complex podiatry instruments when protein contamination is considered, though no significant difference was found in residual protein between local decontamination unit and central decontamination unit processes for Blacks files. Further research is needed to undertake qualitative identification of protein contamination to identify any cross contamination risks and a standard for acceptable residual protein contamination applicable to different instruments and specialities should be considered as a matter of urgency.
Computer and photogrammetric general land use study of central north Alabama
NASA Technical Reports Server (NTRS)
Jayroe, R. R.; Larsen, P. A.; Campbell, C. W.
1974-01-01
The object of this report is to acquaint potential users with two computer programs developed at the NASA Marshall Space Flight Center. They were used in producing a land use survey and maps of central north Alabama from Earth Resources Technology Satellite (ERTS) digital data. The report describes in detail the thought processes and analysis procedures used from the initiation of the land use study to its completion, as well as a photogrammetric study that was used in conjunction with the computer analysis to produce similar land use maps. The results of the land use demonstration indicate that, with respect to computer time and cost, such a study may be economically and realistically feasible on a statewide basis.
CFD Extraction Tool for TecPlot From DPLR Solutions
NASA Technical Reports Server (NTRS)
Norman, David
2013-01-01
This invention is a macro, written in the TecPlot macro programming language, that processes data from DPLR solutions in TecPlot format. DPLR (Data-Parallel Line Relaxation) is a NASA computational fluid dynamics (CFD) code, and TecPlot is a commercial CFD post-processing tool. The TecPlot data is in SI units (the same as DPLR output), and the invention converts the SI units into British units. The macro modifies the TecPlot data with unit conversions and adds some extra calculations. After unit conversions, the macro cuts a slice and adds vectors to the current plot for output. The macro can also process surface solutions. Existing solutions use manual conversion and superposition. The conversion is complicated because it must be applied to a range of inter-related scalars and vectors to describe a 2D or 3D flow field. The macro processes the CFD solution to create superposition/comparison of scalars and vectors. The existing manual solution is cumbersome, open to errors, slow, and cannot be inserted into an automated process. This invention is quick and easy to use, and can be inserted into an automated data-processing algorithm.
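A small sketch of why the conversion is tedious by hand (the field names and the particular conversion factors are illustrative assumptions, not those used by the macro): each scalar and each vector component needs its own consistent SI-to-British factor.

```python
# Assumed SI-to-British conversion factors for a few common CFD quantities.
SI_TO_BRITISH = {
    "pressure": 0.000145038,                         # Pa      -> psi
    "density": 0.062428,                             # kg/m^3  -> lbm/ft^3
    "velocity": 3.28084,                             # m/s     -> ft/s
    "temperature": lambda t_kelvin: t_kelvin * 1.8,  # K       -> degrees Rankine
}

def convert_point(si_values):
    """si_values: dict of scalars plus a velocity vector, all in SI units."""
    converted = {}
    for name, value in si_values.items():
        factor = SI_TO_BRITISH[name]
        if callable(factor):
            converted[name] = factor(value)
        elif isinstance(value, (list, tuple)):        # vector quantity
            converted[name] = [component * factor for component in value]
        else:
            converted[name] = value * factor
    return converted

print(convert_point({"pressure": 101325.0, "density": 1.225,
                     "velocity": [100.0, 0.0, -5.0], "temperature": 300.0}))
```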
Preliminary development of digital signal processing in microwave radiometers
NASA Technical Reports Server (NTRS)
Stanley, W. D.
1980-01-01
Topics covered involve a number of closely related tasks including: the development of several control loop and dynamic noise model computer programs for simulating microwave radiometer measurements; computer modeling of an existing stepped frequency radiometer in an effort to determine its optimum operational characteristics; investigation of the classical second order analog control loop to determine its ability to reduce the estimation error in a microwave radiometer; investigation of several digital signal processing unit designs; initiation of efforts to develop required hardware and software for implementation of the digital signal processing unit; and investigation of the general characteristics and peculiarities of digital processing noiselike microwave radiometer signals.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-08
... Communication Devices, Portable Music and Data Processing Devices, Computers and Components Thereof; Notice of... within the United States after importation of certain wireless communication devices, portable music and... music and data processing devices, computers and components thereof that infringe one or more of claim...
DOD Hotline Allegations on Army Use of A Computer Contract
1993-10-29
Army, the Navy, and the Defense Logistics Agency central order processing offices and reviewed delivery orders issued on the EDS contract. ... The contracting officers used the EDS contract line item number and the description when completing a delivery order. The central order processing offices used an automated database system to match contract line item numbers from the delivery orders to the EDS contract. EDS verified that
General purpose molecular dynamics simulations fully implemented on graphics processing units
NASA Astrophysics Data System (ADS)
Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.
2008-05-01
Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides performance equivalent to that of a fast 30-processor-core distributed memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters, and we discuss the implications for the future.
Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for practical applications critically depends on the scalability of centralized control performance. In this paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for a proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high network capacity with reduced computation cost, which is a significant attribute of a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the order in which requests are processed and in the procedures for routing table updates. Combined with a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest network throughput and acceptable computation scalability. We further investigate the trade-off between network throughput and computation complexity in the routing table update procedure through a simulation study.
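The following is a rough sketch, under assumed data structures, of a hottest-request-first RWA policy of the kind described above (it is not the authors' algorithm): requests are ordered by demand intensity and path length, routed on shortest paths, and assigned the first free wavelength.

```python
import heapq

def dijkstra(graph, src, dst):
    """graph: {node: {neighbour: length}}; returns (distance, shortest node path)."""
    queue, seen = [(0.0, src, [src])], set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, length in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return float("inf"), []

def serve_requests(graph, requests, n_wavelengths=4):
    """requests: list of (src, dst, intensity). Hotter/longer requests go first."""
    used = {}                                   # (u, v, wavelength) -> occupied
    order = sorted(requests,
                   key=lambda r: (r[2], dijkstra(graph, r[0], r[1])[0]),
                   reverse=True)
    assigned = []
    for src, dst, _ in order:
        _, path = dijkstra(graph, src, dst)
        links = list(zip(path, path[1:]))
        for wl in range(n_wavelengths):         # first-fit wavelength assignment
            if all((u, v, wl) not in used for u, v in links):
                used.update({(u, v, wl): True for u, v in links})
                assigned.append((src, dst, wl))
                break
    return assigned

net = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
print(serve_requests(net, [("A", "C", 5), ("A", "B", 2)]))
```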
ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)
ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...
Using Mathematica to Teach Process Units: A Distillation Case Study
ERIC Educational Resources Information Center
Rasteiro, Maria G.; Bernardo, Fernando P.; Saraiva, Pedro M.
2005-01-01
The question addressed here is how to integrate computational tools, namely interactive general-purpose platforms, in the teaching of process units. Mathematica has been selected as a complementary tool to teach distillation processes, with the main objective of leading students to achieve a better understanding of the physical phenomena involved…
NASA Astrophysics Data System (ADS)
Trifonov, N. N.; Sukhorukov, Yu. G.; Ermolov, V. F.; Svyatkin, F. A.; Nikolaenkova, E. K.; Sintsova, T. G.; Grigor'eva, E. B.; Esin, S. B.; Ukhanova, M. G.; Golubev, E. A.; Bik, S. P.; Tren'kin, V. B.
2014-06-01
The equipment of the Kalinin NPP Unit 4 regeneration, intermediate separation, and steam reheating (ISSR) systems is described and the results of their static and dynamic tests are presented. It was shown from an analysis of the test results that the equipment of the regeneration and ISSR systems produces the design thermal and hydraulic characteristics in static and dynamic modes of operation. Specialists of the Central Boiler-Turbine Institute Research and Production Association have developed procedures and computer programs for calculating the system of direct-contact horizontal low-pressure heaters (connected according to the gravity circuit arrangement jointly with the second-stage electrically driven condensate pumps) and the ISSR system, the results of which are in satisfactory agreement with experimental data. The drawbacks of the layout solutions due to which cavitation failure of the pumps may occur are considered. Technical solutions aimed at securing stable operation of the equipment of the regeneration and ISSR systems are proposed. The process arrangement for heating the chamber-type high-pressure heaters adopted at the Kalinin NPP is analyzed. The version of this circuit developed at the Central Boiler-Turbine Institute Research and Production Association that allows a heating rate of 1°C/min to be obtained is proposed.
Graphics Processing Units for HEP trigger systems
NASA Astrophysics Data System (ADS)
Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.; Neri, I.; Paolucci, P. S.; Piandani, R.; Pontisso, L.; Rescigno, M.; Simula, F.; Sozzi, M.; Vicini, P.
2016-07-01
General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPUs for a synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.
Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.
Kussmann, Jörg; Ochsenfeld, Christian
2017-06-13
We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cantu Cantu, David; McGrail, B. Peter; Glezakou, Vassiliki Alexandra
2014-09-18
Based on density functional theory calculations and simulation, a detailed mechanism is presented for the formation of the secondary building unit (SBU) of MIL-101, a chromium terephthalate metal-organic framework (MOF). SBU formation is key to MOF nucleation, the rate-limiting step in the formation process of many MOFs. A series of reactions that lead to the formation of the SBU of MIL-101 is proposed in this work. Initial rate-limiting reactions form the metal cluster with three chromium (III) atoms linked to a central bridging oxygen. Terephthalate linkers play a key role, as chromium (III) atoms are joined to linker carboxylate groups prior to the placement of the central bridging oxygen. Multiple linker addition reactions, which follow different paths due to structural isomers, are limited by the removal of water molecules in the first chromium coordination shell. The least-energy path is identified wherein all linkers on one face of the metal center plane are added first. A simple kinetic model based on transition state theory shows that the rate of secondary building unit formation is similar to the rate of metal-organic framework nucleation. The authors are thankful to Dr. R. Rousseau for a critical reading of the manuscript. This research would not have been possible without the support of the Office of Fossil Energy, U.S. Department of Energy. This research was performed using EMSL, a national scientific user facility sponsored by the Department of Energy's Office of Biological and Environmental Research, and the PNNL Institutional Computing (PIC) program located at Pacific Northwest National Laboratory.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; McDougal, Matthew; Russell, Sam
2013-01-01
Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer-level results at individual workstations. As data sets grow, the methods to work with them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism; that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area where GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.
The Radio Frequency Health Node Wireless Sensor System
NASA Technical Reports Server (NTRS)
Valencia, J. Emilio; Stanley, Priscilla C.; Mackey, Paul J.
2009-01-01
The Radio Frequency Health Node (RFHN) wireless sensor system differs from other wireless sensor systems in ways originally intended to enhance utility as an instrumentation system for a spacecraft. The RFHN can also be adapted for use in terrestrial applications in which there are requirements for operational flexibility and integrability into higher-level instrumentation and data acquisition systems. As shown in the figure, the heart of the system is the RFHN, which is a unit that passes commands and data between (1) one or more commercially available wireless sensor units (optionally, also including wired sensor units) and (2) command and data interfaces with a local control computer that may be part of the spacecraft or other engineering system in which the wireless sensor system is installed. In turn, the local control computer can be in radio or wire communication with a remote control computer that may be part of a higher-level system. The remote control computer, acting via the local control computer and the RFHN, can not only monitor readout data from the sensor units but also remotely configure (program or reprogram) the RFHN and the sensor units during operation. In a spacecraft application, the RFHN and the sensor units can also be configured more directly, prior to launch, via a serial interface that includes an umbilical cable between the spacecraft and ground support equipment. In either case, the RFHN wireless sensor system has the flexibility to be configured, as required, with different numbers and types of sensors for different applications. The RFHN can be used to effect realtime transfer of data from, and commands to, the wireless sensor units. It can also store data for later retrieval by an external computer. The RFHN communicates with the wireless sensor units via a radio transceiver module. The modular design of the RFHN makes it possible to add radio transceiver modules as needed to accommodate additional sets of wireless sensor units. The RFHN includes a core module that performs generic computer functions, including management of power and input, output, processing, and storage of data. In a typical application, the processing capabilities in the RFHN are utilized to perform preprocessing, trending, and fusion of sensor data. The core module also serves as the unit through which the remote control computer configures the sensor units and the rest of the RFHN.
Souris, Kevin; Lee, John Aldo; Sterpin, Edmond
2016-04-01
Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the latest generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable for MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually, while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked against the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%/1 mm. In spite of the limited memory bandwidth of the coprocessor, the simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
Low-profile heliostat design for solar central receiver systems
NASA Technical Reports Server (NTRS)
Fourakis, E.; Severson, A. M.
1977-01-01
Heliostat designs intended to reduce costs and the effect of adverse wind loads on the devices were developed. Included was the low-profile heliostat consisting of a stiff frame with sectional focusing reflectors coupled together to turn as a unit. The entire frame is arranged to turn angularly about a center point. The ability of the heliostat to rotate about both the vertical and horizontal axes permits a central computer control system to continuously aim the sun's reflection onto a selected target. An engineering model of the basic device was built and is being tested. Control and mirror parameters, such as roughness and need for fine aiming, are being studied. The fabrication of these prototypes is in process. The model was also designed to test mirror focusing techniques, heliostat geometry, mechanical functioning, and tracking control. The model can be easily relocated to test mirror imaging on a tower from various directions. In addition to steering and aiming studies, the tests include the effects of temperature changes, wind gusting and weathering. The results of economic studies on this heliostat are also presented.
Recalculation of dose for each fraction of treatment on TomoTherapy.
Thomas, Simon J; Romanchikova, Marina; Harrison, Karl; Parker, Michael A; Bates, Amy M; Scaife, Jessica E; Sutcliffe, Michael P F; Burnet, Neil G
2016-01-01
The VoxTox study, linking delivered dose to toxicity, requires recalculation of typically 20-37 fractions per patient for nearly 2000 patients. This requires a non-interactive interface permitting batch calculation with multiple computers. Data are extracted from the TomoTherapy® archive and processed using the computational task-management system GANGA. Doses are calculated for each fraction of radiotherapy using the daily megavoltage (MV) CT images. The calculated dose cube is saved as a Digital Imaging and Communications in Medicine (DICOM) RTDOSE object, which can then be read by utilities that calculate dose-volume histograms or dose surface maps. The rectum is delineated on daily MV images using an implementation of the Chan-Vese algorithm. On a cluster of up to 117 central processing units, dose cubes for all fractions of 151 patients took 12 days to calculate. Outlining the rectum on all slices and fractions of 151 patients took 7 h. We also present results of the Hounsfield unit (HU) calibration of TomoTherapy MV images, measured over an 8-year period, showing that the HU calibration has become less variable over time, with no large changes observed after 2011. We have developed a system for automatic recalculation of TomoTherapy dose distributions. This does not tie up the clinically needed planning system but can be run on a cluster of independent machines, enabling recalculation of delivered dose without user intervention. The use of a task management system for automation of dose calculation and outlining enables work to be scaled up to the level required for large studies.
Persoon, Lucas C G G; Podesta, Mark; van Elmpt, Wouter J C; Nijsten, Sebastiaan M J J G; Verhaegen, Frank
2011-07-01
A widely accepted method to quantify differences in dose distributions is the gamma (γ) evaluation. Currently, almost all gamma implementations utilize the central processing unit (CPU). Recently, the graphics processing unit (GPU) has become a powerful platform for specific computing tasks. In this study, we describe the implementation of a 3D gamma evaluation using a GPU to improve calculation time. The gamma evaluation algorithm was implemented on an NVIDIA Tesla C2050 GPU using the compute unified device architecture (CUDA). First, several cubic virtual phantoms were simulated. These phantoms were tested with varying dose cube sizes and set-ups, introducing artificial dose differences. Second, to show applicability in clinical practice, five patient cases have been evaluated using the 3D dose distribution from a treatment planning system as the reference and the delivered dose determined during treatment as the comparison. A calculation time comparison between the CPU and GPU was made with varying thread-block sizes including the option of using texture or global memory. A GPU over CPU speed-up of 66 ± 12 was achieved for the virtual phantoms. For the patient cases, a speed-up of 57 ± 15 using the GPU was obtained. A thread-block size of 16 × 16 performed best in all cases. The use of texture memory improved the total calculation time, especially when interpolation was applied. Differences between the CPU and GPU gammas were negligible.
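For readers unfamiliar with the metric, here is a brute-force 1D sketch of the gamma index (a plain CPU illustration, not the authors' CUDA kernels; the 3%/3 mm criteria and the test profiles are assumed).

```python
import numpy as np

def gamma_1d(dose_ref, dose_eval, spacing_mm=1.0, dd_pct=3.0, dta_mm=3.0):
    """Gamma index per reference point for two 1D dose profiles."""
    n = dose_ref.size
    dd_abs = dd_pct / 100.0 * dose_ref.max()
    search = int(np.ceil(3 * dta_mm / spacing_mm))   # limit the spatial search window
    gamma = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search), min(n, i + search + 1)
        j = np.arange(lo, hi)
        dist2 = ((j - i) * spacing_mm) ** 2 / dta_mm ** 2
        dose2 = (dose_eval[j] - dose_ref[i]) ** 2 / dd_abs ** 2
        gamma[i] = np.sqrt(np.min(dist2 + dose2))    # minimum combined criterion
    return gamma

ref = np.exp(-np.linspace(-3, 3, 200) ** 2)          # toy reference profile
ev = ref * 1.02                                      # 2% global scaling error
g = gamma_1d(ref, ev)
print(f"pass rate (gamma <= 1): {100.0 * np.mean(g <= 1.0):.1f}%")
```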
NASA Technical Reports Server (NTRS)
Matthews, Christine G.; Posenau, Mary-Anne; Leonard, Desiree M.; Avis, Elizabeth L.; Debure, Kelly R.; Stacy, Kathryn; Vonofenheim, Bill
1992-01-01
The intent is to provide an introduction to the image processing capabilities available at the Langley Research Center (LaRC) Central Scientific Computing Complex (CSCC). Various image processing software components are described. Information is given concerning the use of these components in the Data Visualization and Animation Laboratory at LaRC.
Automated Power Systems Management (APSM)
NASA Technical Reports Server (NTRS)
Bridgeforth, A. O.
1981-01-01
A breadboard power system incorporating autonomous functions of monitoring, fault detection and recovery, command and control was developed, tested and evaluated to demonstrate technology feasibility. Autonomous functions including switching of redundant power processing elements, individual load fault removal, and battery charge/discharge control were implemented by means of a distributed microcomputer system within the power subsystem. Three local microcomputers provide the monitoring, control and command function interfaces between the central power subsystem microcomputer and the power sources, power processing and power distribution elements. The central microcomputer is the interface between the local microcomputers and the spacecraft central computer or ground test equipment.
Kritchevsky, S. B.; Braun, B. I.; Wong, E. S.; Solomon, S. L.; Steele, L.; Richards, C.; Simmons, B. P.
2001-01-01
The Evaluation of Processes and Indicators in Infection Control (EPIC) study assesses the relationship between hospital care and rates of central venous catheter-associated primary bacteremia in 54 intensive-care units (ICUs) in the United States and 14 other countries. Using ICU rather than the patient as the primary unit of statistical analysis permits evaluation of factors that vary at the ICU level. The design of EPIC can serve as a template for studies investigating the relationship between process and event rates across health-care institutions. PMID:11294704
NASA Astrophysics Data System (ADS)
Chung, Shin Kee; Wen, Linqing; Blair, David; Cannon, Kipp; Datta, Amitava
2010-07-01
We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
Frame-dragging effect in the field of non rotating body due to unit gravimagnetic moment
NASA Astrophysics Data System (ADS)
Deriglazov, Alexei A.; Ramírez, Walberto Guzmán
2018-04-01
Nonminimal spin-gravity interaction through a unit gravimagnetic moment leads to modified Mathisson-Papapetrou-Tulczyjew-Dixon equations with improved behavior in the ultrarelativistic limit. We present the exact Hamiltonian of the resulting theory and compute an effective 1/c²-Hamiltonian and the leading post-Newtonian corrections to the trajectory and spin. The gravimagnetic moment causes the same precession of the spin S as a fictitious rotation of the central body with angular momentum J = (M/m) S. So the modified equations imply a number of qualitatively new effects that could be used to test experimentally whether a rotating body in general relativity has null or unit gravimagnetic moment.
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on graphics processing units (GPUs) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing ray intersections with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. A generalized procedure for 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The average acceleration achieved by the GPU and by the multithreaded CPU implementation is about 7.5 and 1.6 times, respectively. Studying the influence of the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimates can be significantly improved by new GPUs with a larger number of processing cores and multiprocessors, as well as an optimized configuration of the computing CUDA network.
Evolution of Embedded Processing for Wide Area Surveillance
2014-01-01
... future vision. Subject terms: embedded processing; high performance computing; general-purpose graphical processing units (GPGPUs). ... reconnaissance (ISR) mission capabilities. The capabilities these advancements are achieving include the ability to provide persistent all... fighters to support and positively affect their mission. Significant improvements in high-performance computing (HPC) technology make it possible to
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments
Yim, Won Cheol; Cushman, John C.
2017-07-22
Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
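A minimal sketch of the divide-and-conquer idea (this is not the DCBLAST code itself): split the multi-FASTA query into chunks and launch one BLAST+ process per chunk. It assumes the NCBI BLAST+ blastn binary is on the PATH, a formatted database named mydb, and an input file queries.fasta; none of these names come from the paper.

```python
import subprocess
from pathlib import Path

def split_fasta(path, n_chunks):
    """Crudely split a multi-FASTA file into n_chunks smaller query files."""
    records = Path(path).read_text().split(">")[1:]
    chunk_files = []
    for k in range(n_chunks):
        out = Path(f"query_chunk_{k}.fasta")
        out.write_text("".join(">" + record for record in records[k::n_chunks]))
        chunk_files.append(out)
    return chunk_files

def run_blast_chunks(chunk_files, db="mydb"):
    """Launch one blastn process per query chunk, then wait for all of them."""
    procs = [subprocess.Popen(["blastn", "-query", str(f), "-db", db,
                               "-outfmt", "6", "-out", f"{f.stem}.tsv"])
             for f in chunk_files]
    for proc in procs:
        proc.wait()

# Example (commented out because it needs blastn, mydb, and queries.fasta):
# run_blast_chunks(split_fasta("queries.fasta", n_chunks=8))
```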
Computer-Based Training Starter Kit.
ERIC Educational Resources Information Center
Federal Interagency Group for Computer-Based Training, Washington, DC.
Intended for use by training professionals with little or no background in the application of automated data processing (ADP) systems, processes, or procurement requirements, this reference manual provides guidelines for establishing a computer based training (CBT) program within a federal agency of the United States government. The manual covers:…
Implementation of ADI: Schemes on MIMD parallel computers
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1993-01-01
In order to simulate the effects of the impingement of hot exhaust jets of high performance aircraft on landing surfaces, a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPUs) by having many off-the-shelf CPUs work independently on their own data and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI algorithm for the three-dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPUs in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations are starting to adopt cloud computing to better utilize computing resources by taking advantage of its scalability, cost reduction, and easy-to-access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting the geosciences. This paper provides a comprehensive study of three open-source cloud solutions: OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances, including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources, as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) there are no significant performance differences in the central processing unit (CPU), memory, and I/O of virtual machines created and managed by the different solutions; (2) OpenNebula has the fastest internal network, while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies; (3) CloudStack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula; and (4) the selected cloud computing solutions are capable of supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
NASA Astrophysics Data System (ADS)
Mallakpour, Iman; Villarini, Gabriele; Jones, Michael P.; Smith, James A.
2017-08-01
The central United States is plagued by frequent catastrophic flooding, such as the flood events of 1993, 2008, 2011, 2013, 2014 and 2016. The goal of this study is to examine whether it is possible to describe the occurrence of flood and heavy precipitation events at the sub-seasonal scale in terms of variations in the climate system. Daily streamflow and precipitation time series over the central United States (defined here to include North Dakota, South Dakota, Nebraska, Kansas, Missouri, Iowa, Minnesota, Wisconsin, Illinois, West Virginia, Kentucky, Ohio, Indiana, and Michigan) are used in this study. We model the occurrence/non-occurrence of a flood and heavy precipitation event over time using regression models based on Cox processes, which can be viewed as a generalization of Poisson processes. Rather than assuming that an event (i.e., flooding or precipitation) occurs independently of the occurrence of the previous one (as in Poisson processes), Cox processes allow us to account for the potential presence of temporal clustering, which manifests itself in an alternation of quiet and active periods. Here we model the occurrence/non-occurrence of flood and heavy precipitation events using two climate indices as time-varying covariates: the Arctic Oscillation (AO) and the Pacific-North American pattern (PNA). We find that AO and/or PNA are important predictors in explaining the temporal clustering in flood occurrences in over 78% of the stream gages we considered. Similar results are obtained when working with heavy precipitation events. Analyses of the sensitivity of the results to different thresholds used to identify events lead to the same conclusions. The findings of this work highlight that variations in the climate system play a critical role in explaining the occurrence of flood and heavy precipitation events at the sub-seasonal scale over the central United States.
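As an illustration of the kind of occurrence/non-occurrence regression described above, the following is a minimal sketch that approximates the Cox-process regression with a discrete-time binomial GLM on synthetic daily data; the AO/PNA series, coefficients, and event rates are illustrative assumptions, not the study's data or exact model.

```python
# Minimal sketch: daily flood occurrence regressed on time-varying climate
# covariates (AO, PNA), approximating the Cox-process model with a
# discrete-time logistic (binomial GLM). All data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_days = 5000
df = pd.DataFrame({
    "ao": rng.normal(size=n_days),    # Arctic Oscillation index (synthetic)
    "pna": rng.normal(size=n_days),   # Pacific-North American index (synthetic)
})
# synthetic occurrence process whose intensity depends on the covariates
logit = -4.0 + 0.6 * df["ao"] - 0.4 * df["pna"]
df["event"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# regress daily occurrence/non-occurrence on the climate indices
X = sm.add_constant(df[["ao", "pna"]])
fit = sm.GLM(df["event"], X, family=sm.families.Binomial()).fit()
print(fit.summary())   # significant AO/PNA coefficients indicate climate-driven clustering
```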
Efficiency and economic benefits of skipjack pole and line (huhate) in central Moluccas, Indonesia
NASA Astrophysics Data System (ADS)
Siahainenia, Stevanus M.; Hiariey, Johanis; Baskoro, Mulyono S.; Waeleruny, Wellem
2017-10-01
Excess fishing capacity is a crucial problem in marine capture fisheries. This phenomenon needs to be investigated with regard to the sustainability and development of the fishery. This research was aimed at analyzing technical efficiency (TE) and computing financial aspects of the skipjack pole and line. Primary data were collected from the owners of fishing units of different sizes of gross boat tonnage (GT), while secondary data were gathered from official publications relating to this research. A data envelopment analysis (DEA) approach was applied to estimate technical efficiency, whereas a selected financial analysis was utilized to calculate the economic benefits of the skipjack pole and line business. The fishing units with a size of 26-30 GT provided a higher TE value and also achieved larger economic benefit values than the other fishing units. The empirical results indicate that the skipjack pole and line in the size of 26-30 GT is a good fishing gear for business development in central Moluccas.
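The following is a minimal sketch of the input-oriented CCR formulation of data envelopment analysis behind such technical-efficiency estimates, solved with scipy.optimize.linprog; the input/output data (tonnage, crew, catch) are illustrative placeholders, not the study's data.

```python
# Minimal sketch: input-oriented CCR DEA technical-efficiency scores via linear
# programming. Rows of X are inputs, rows of Y are outputs, columns are DMUs
# (fishing units); all numbers below are illustrative.
import numpy as np
from scipy.optimize import linprog

X = np.array([[26.0, 30.0, 20.0, 15.0],      # input 1: gross tonnage
              [12.0, 14.0, 10.0,  9.0]])     # input 2: crew size
Y = np.array([[110.0, 140.0, 70.0, 50.0]])   # output: catch (tonnes)
n_dmu = X.shape[1]

def ccr_efficiency(k):
    """min theta s.t. X@lam <= theta*x_k, Y@lam >= y_k, lam >= 0."""
    c = np.r_[1.0, np.zeros(n_dmu)]                      # minimize theta
    A_in = np.hstack([-X[:, [k]], X])                    # X@lam - theta*x_k <= 0
    A_out = np.hstack([np.zeros((Y.shape[0], 1)), -Y])   # -Y@lam <= -y_k
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(X.shape[0]), -Y[:, k]]
    bounds = [(0, None)] * (n_dmu + 1)                   # theta >= 0, lambda >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

for k in range(n_dmu):
    print(f"DMU {k}: TE = {ccr_efficiency(k):.3f}")      # TE = 1.0 marks an efficient unit
```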
Design of a modular digital computer system, CDRL no. D001, final design plan
NASA Technical Reports Server (NTRS)
Easton, R. A.
1975-01-01
The engineering breadboard implementation for the CDRL no. D001 modular digital computer system developed during design of the logic system was documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed 32 bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.
Fiber Optic Communication System For Medical Images
NASA Astrophysics Data System (ADS)
Arenson, Ronald L.; Morton, Dan E.; London, Jack W.
1982-01-01
This paper discusses a fiber optic communication system linking ultrasound devices, computerized tomography scanners, a nuclear medicine computer system, and a digital fluorographic system to a central radiology research computer. These centrally archived images are available for near-instantaneous recall at various display consoles. When a suitable laser optical disk is available for mass storage, more extensive image archiving will be added to the network, including digitized images of standard radiographs for comparison purposes and for remote display in such areas as the intensive care units, the operating room, and selected outpatient departments. This fiber optic system allows for the transfer of high resolution images in less than a second over distances exceeding 2,000 feet. The advantages of using fiber optic cables instead of typical parallel or serial communication techniques will be described. The switching methodology and communication protocols will also be discussed.
Visualization and recommendation of large image collections toward effective sensemaking
NASA Astrophysics Data System (ADS)
Gu, Yi; Wang, Chaoli; Nemiroff, Robert; Kao, David; Parra, Denis
2016-03-01
In our daily lives, images are among the most commonly found data which we need to handle. We present iGraph, a graph-based approach for visual analytics of large image collections and their associated text information. Given such a collection, we compute the similarity between images, the distance between texts, and the connection between image and text to construct iGraph, a compound graph representation which encodes the underlying relationships among these images and texts. To enable effective visual navigation and comprehension of iGraph with tens of thousands of nodes and hundreds of millions of edges, we present a progressive solution that offers collection overview, node comparison, and visual recommendation. Our solution not only allows users to explore the entire collection with representative images and keywords but also supports detailed comparison for understanding and intuitive guidance for navigation. The visual exploration of iGraph is further enhanced with the implementation of bubble sets to highlight group memberships of nodes, suggestion of abnormal keywords or time periods based on text outlier detection, and comparison of four different recommendation solutions. For performance speedup, multiple graphics processing units and central processing units are utilized for processing and visualization in parallel. We experiment with two image collections and leverage a cluster driving a display wall of nearly 50 million pixels. We show the effectiveness of our approach by demonstrating experimental results and conducting a user study.
Goldstone (GDSCC) administrative computing
NASA Technical Reports Server (NTRS)
Martin, H.
1981-01-01
The GDSCC Data Processing Unit provides various administrative computing services for Goldstone. Those activities, including finance, manpower and station utilization, deep-space station scheduling and engineering change order (ECO) control are discussed.
NASA Technical Reports Server (NTRS)
Reynolds, L.; Tweed, H.
1972-01-01
The work performed entailed the design, development, construction and testing of a 4000-word by 18-bit random access, NDRO plated wire memory for use in conjunction with a spacecraft input/output unit and central processing unit. The primary design parameters, in order of importance, were high reliability, low power, volume and weight. A single memory unit, referred to as a qualification model, was delivered.
ERIC Educational Resources Information Center
Baytak, Ahmet; Land, Susan M.
2011-01-01
This study employed a case study design (Yin, "Case study research, design and methods," 2009) to investigate the processes used by 5th graders to design and develop computer games within the context of their environmental science unit, using the theoretical framework of "constructionism." Ten fifth graders designed computer games using "Scratch"…
GPU-based efficient realistic techniques for bleeding and smoke generation in surgical simulators.
Halic, Tansel; Sankaranarayanan, Ganesh; De, Suvranu
2010-12-01
In actual surgery, smoke and bleeding due to cauterization processes provide important visual cues to the surgeon, which have been proposed as factors in surgical skill assessment. While several virtual reality (VR)-based surgical simulators have incorporated the effects of bleeding and smoke generation, they are not realistic due to the requirement of real-time performance. To be interactive, the visual display must be updated at a rate of at least 30 Hz and haptic (touch) information must be refreshed at 1 kHz. Simulation of smoke and bleeding is, therefore, either ignored or simulated using highly simplified techniques, since other computationally intensive processes compete for the available central processing unit (CPU) resources. In this study we developed a novel low-cost method to generate realistic bleeding and smoke in VR-based surgical simulators, which outsources the computations to the graphical processing unit (GPU), thus freeing up the CPU for other time-critical tasks. This method is independent of the complexity of the organ models in the virtual environment. User studies were performed using 20 subjects to determine the visual quality of the simulations compared to real surgical videos. The smoke and bleeding simulations were implemented as part of a laparoscopic adjustable gastric banding (LAGB) simulator. For the bleeding simulation, the original implementation using the shader did not incur noticeable overhead. However, for smoke generation, an input/output (I/O) bottleneck was observed and two different methods were developed to overcome this limitation. Based on our benchmark results, a buffered approach performed better than a pipelined approach and could support up to 15 video streams in real time. Human subject studies showed that the visual realism of the simulations was as good as in real surgery (median rating of 4 on a 5-point Likert scale). Based on the performance results and the subject study, both the bleeding and smoke simulations were concluded to be efficient, highly realistic and well suited to VR-based surgical simulators. Copyright © 2010 John Wiley & Sons, Ltd.
Hardware Implementation of a Bilateral Subtraction Filter
NASA Technical Reports Server (NTRS)
Huertas, Andres; Watson, Robert; Villalpando, Carlos; Goldberg, Steven
2009-01-01
A bilateral subtraction filter has been implemented as a hardware module in the form of a field-programmable gate array (FPGA). In general, a bilateral subtraction filter is a key subsystem of a high-quality stereoscopic machine vision system that utilizes images that are large and/or dense. Bilateral subtraction filters have been implemented in software on general-purpose computers, but the processing speeds attainable in this way even on computers containing the fastest processors are insufficient for real-time applications. The present FPGA bilateral subtraction filter is intended to accelerate processing to real-time speed and to be a prototype of a link in a stereoscopic-machine-vision processing chain, now under development, that would process large and/or dense images in real time and would be implemented in an FPGA. In terms that are necessarily oversimplified for the sake of brevity, a bilateral subtraction filter is a smoothing, edge-preserving filter for suppressing low-frequency noise. The filter operation amounts to replacing the value for each pixel with a weighted average of the values of that pixel and the neighboring pixels in a predefined neighborhood or window (e.g., a 9×9 window). The filter weights depend partly on pixel values and partly on the window size. The present FPGA implementation of a bilateral subtraction filter utilizes a 9×9 window. This implementation was designed to take advantage of the ability to do many of the component computations in parallel pipelines to enable processing of image data at the rate at which they are generated. The filter can be considered to be divided into the following parts (see figure): a) An image pixel pipeline with a 9×9-pixel window generator; b) An array of processing elements; c) An adder tree; d) A smoothing-and-delaying unit; and e) A subtraction unit. After each 9×9 window is created, the affected pixel data are fed to the processing elements. Each processing element is fed the pixel value for its position in the window as well as the pixel value for the central pixel of the window. The absolute difference between these two pixel values is calculated and used as an address in a lookup table. Each processing element has a lookup table, unique for its position in the window, containing the weight coefficients for the Gaussian function for that position. The pixel value is multiplied by the weight, and the outputs of the processing element are the weight and the product of the pixel value and the weight. The products and weights are fed to the adder tree. The sum of the products and the sum of the weights are fed to the divider, which computes the sum of the products divided by the sum of the weights. The output of the divider is denoted the bilateral smoothed image. The smoothing function is a simple weighted average computed over a 3×3 subwindow centered in the 9×9 window. After smoothing, the image is delayed by an additional amount of time needed to match the processing time for computing the bilateral smoothed image. The bilateral smoothed image is then subtracted from the 3×3 smoothed image to produce the final output. The prototype filter as implemented in a commercially available FPGA processes one pixel per clock cycle. Operation at a clock speed of 66 MHz has been demonstrated, and results of a static timing analysis have been interpreted as suggesting that the clock speed could be increased to as much as 100 MHz.
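The following is a minimal software sketch of the bilateral-subtraction idea described above, written in NumPy rather than FPGA hardware: bilateral smoothing over a 9×9 window, a 3×3 smoothing, and the final subtraction. The sigma values and the use of a simple box average for the 3×3 stage are illustrative assumptions.

```python
# Minimal sketch: bilateral smoothing (win x win), 3x3 box smoothing, and the
# subtraction of the former from the latter, mirroring the FPGA data path in
# plain NumPy. Parameters are illustrative.
import numpy as np

def bilateral_subtraction(img, win=9, sigma_s=3.0, sigma_r=10.0):
    img = img.astype(float)
    h, w = img.shape
    r = win // 2
    pad = np.pad(img, r, mode="edge")
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))     # per-position weight table

    bilateral = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            window = pad[i:i + win, j:j + win]
            center = pad[i + r, j + r]                        # central pixel of the window
            range_w = np.exp(-(window - center) ** 2 / (2 * sigma_r**2))
            weights = spatial * range_w
            bilateral[i, j] = (weights * window).sum() / weights.sum()

    # 3x3 smoothing of the original image (simple box average here)
    pad3 = np.pad(img, 1, mode="edge")
    box = sum(pad3[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)) / 9.0

    # final output: bilateral smoothed image subtracted from the 3x3 smoothed image
    return box - bilateral

out = bilateral_subtraction(np.random.rand(32, 32) * 255.0)
print(out.shape, float(out.mean()))
```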
NASA Technical Reports Server (NTRS)
Belcastro, C. M.
1984-01-01
A methodology was developed to assess the upset susceptibility/reliability of a computer system onboard an aircraft flying through a lightning environment. Upset error modes in a general purpose microprocessor were studied. The upset tests involved the random input of analog transients, which model lightning induced signals, onto interface lines of an 8080-based microcomputer, from which upset error data were recorded. The program code executed on the microprocessor during the tests is designed to exercise all of the machine cycles and memory addressing techniques implemented in the 8080 central processing unit. A statistical analysis is presented in which possible correlations are established between the probability of upset occurrence and transient signal inputs during specific processing states and operations. A stochastic upset susceptibility model for the 8080 microprocessor is presented. The susceptibility of this microprocessor to upset, once analog transients have entered the system, is determined analytically by calculating the state probabilities of the stochastic model.
Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long
2016-04-01
Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as for improvements in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on the GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD requires the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on the GPU challenging due to memory constraints. The DMTC method enabled reduced computation and memory storage of variables with minimal communication between the GPU and the central processing unit (CPU), due to the ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. The best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration, using an Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration runtime was reduced to 2.9 min with the GPU version, compared to 12.8 min for the twelve-threaded CPU version and 112.5 min for a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use with other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Evolution in a centralized transfusion service.
AuBuchon, James P; Linauts, Sandra; Vaughan, Mimi; Wagner, Jeffrey; Delaney, Meghan; Nester, Theresa
2011-12-01
The metropolitan Seattle area has utilized a centralized transfusion service model throughout the modern era of blood banking. This approach has used four laboratories to serve over 20 hospitals and clinics, providing greater capabilities for all at a lower consumption of resources than if each depended on its own laboratory and staff for these functions. In addition, this centralized model has facilitated wider use of the medical capabilities of the blood center's physicians, and a county-wide network of transfusion safety officers is now being developed to increase the impact of the blood center's transfusion expertise at the patient's bedside. Medical expectations and traffic have led the blood center to evolve the centralized model to include on-site laboratories at facilities with complex transfusion requirements (e.g., a children's hospital) and to implement in all the others a system of remote allocation. This new capability places a refrigerator stocked with uncrossmatched units in the hospital but retains control over the dispensing of these through the blood center's computer system; the correct unit can be electronically cross-matched and released on demand, obviating the need for transportation to the hospital and thus speeding transfusion. This centralized transfusion model has withstood the test of time and continues to evolve to meet new situations and ensure optimal patient care. © 2011 American Association of Blood Banks.
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2006-01-01
A computer program automatically builds large, full-resolution mosaics of multispectral images of Earth landmasses from images acquired by Landsat 7, complete with matching of colors and blending between adjacent scenes. While the code has been used extensively for Landsat, it could also be used for other data sources. A single mosaic of as many as 8,000 scenes, represented by more than 5 terabytes of data and the largest set produced in this work, demonstrated what the code could do to provide global coverage. The program first statistically analyzes input images to determine areas of coverage and data-value distributions. It then transforms the input images from their original universal transverse Mercator coordinates to other geographical coordinates, with scaling. It applies a first-order polynomial brightness correction to each band in each scene. It uses a data-mask image for selecting data and blending of input scenes. Under control by a user, the program can be made to operate on small parts of the output image space, with check-point and restart capabilities. The program runs on SGI IRIX computers. It is capable of parallel processing using shared-memory code, large memories, and tens of central processing units. It can retrieve input data and store output data at locations remote from the processors on which it is executed.
An efficient photogrammetric stereo matching method for high-resolution images
NASA Astrophysics Data System (ADS)
Li, Yingsong; Zheng, Shunyi; Wang, Xiaonan; Ma, Hao
2016-12-01
Stereo matching of high-resolution images is a great challenge in photogrammetry. The main difficulty is the enormous processing workload, which involves substantial computing time and memory consumption. In recent years, the semi-global matching (SGM) method has been a promising approach for solving stereo problems on different data sets. However, the time complexity and memory demand of SGM are proportional to the scale of the images involved, which leads to very high consumption when dealing with large images. To address this, this paper presents an efficient hierarchical matching strategy based on the SGM algorithm using single instruction multiple data instructions and structured parallelism in the central processing unit. The proposed method can significantly reduce the computational time and memory required for large-scale stereo matching. The three-dimensional (3D) surface is reconstructed by triangulating and fusing redundant reconstruction information from multi-view matching results. Finally, three high-resolution aerial data sets are used to evaluate our improvement. Furthermore, precise airborne laser scanner data for one data set are used to measure the accuracy of our reconstruction. Experimental results demonstrate that our method achieves remarkable time and memory savings while maintaining the density and precision of the derived 3D point cloud.
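The following is a minimal sketch of the core semi-global matching recurrence that such an implementation parallelizes, here aggregated along a single left-to-right path in NumPy; the penalties P1/P2 and the random cost volume are illustrative assumptions, not the paper's hierarchical, SIMD-optimized implementation.

```python
# Minimal sketch: SGM cost aggregation along one (left-to-right) path.
# L(p,d) = C(p,d) + min(L(p-1,d), L(p-1,d+/-1)+P1, min_k L(p-1,k)+P2) - min_k L(p-1,k)
import numpy as np

def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
    """cost: (H, W, D) matching-cost volume; returns aggregated costs of the same shape."""
    H, W, D = cost.shape
    L = np.empty_like(cost, dtype=float)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                          # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)     # (H, 1)
        same = prev                                    # keep the same disparity
        up = np.full_like(prev, np.inf);   up[:, 1:] = prev[:, :-1] + P1    # disparity +1
        down = np.full_like(prev, np.inf); down[:, :-1] = prev[:, 1:] + P1  # disparity -1
        jump = prev_min + P2                           # arbitrary disparity jump
        best = np.minimum(np.minimum(same, up), np.minimum(down, jump))
        L[:, x, :] = cost[:, x, :] + best - prev_min   # subtract to keep values bounded
    return L

cost_volume = np.random.rand(32, 64, 16)               # toy H x W x D cost volume
print(aggregate_left_to_right(cost_volume).shape)
```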
Distributed performance counters
Davis, Kristan D; Evans, Kahn C; Gara, Alan; Satterfield, David L
2013-11-26
A plurality of first performance counter modules is coupled to a plurality of processing cores. The plurality of first performance counter modules is operable to collect performance data associated with the plurality of processing cores, respectively. A plurality of second performance counter modules is coupled to a plurality of L2 cache units, and the plurality of second performance counter modules is operable to collect performance data associated with the plurality of L2 cache units, respectively. A central performance counter module may be operable to coordinate counter data from the plurality of first performance counter modules and the plurality of second performance counter modules, with the central performance counter module, the plurality of first performance counter modules, and the plurality of second performance counter modules connected by a daisy chain connection.
Quality Improvement Process in a Large Intensive Care Unit: Structure and Outcomes.
Reddy, Anita J; Guzman, Jorge A
2016-11-01
Quality improvement in the health care setting is a complex process, and even more so in the critical care environment. The development of intensive care unit process measures and quality improvement strategies are associated with improved outcomes, but should be individualized to each medical center as structure and culture can differ from institution to institution. The purpose of this report is to describe the structure of quality improvement processes within a large medical intensive care unit while using examples of the study institution's successes and challenges in the areas of stat antibiotic administration, reduction in blood product waste, central line-associated bloodstream infections, and medication errors. © The Author(s) 2015.
Synthetic Biology: Tools to Design, Build, and Optimize Cellular Processes
Young, Eric; Alper, Hal
2010-01-01
The general central dogma frames the emergent properties of life, which make biology both necessary and difficult to engineer. In a process engineering paradigm, each biological process stream and process unit is heavily influenced by regulatory interactions and interactions with the surrounding environment. Synthetic biology is developing the tools and methods that will increase control over these interactions, eventually resulting in an integrative synthetic biology that will allow ground-up cellular optimization. In this review, we attempt to contextualize the areas of synthetic biology into three tiers: (1) the process units and associated streams of the central dogma, (2) the intrinsic regulatory mechanisms, and (3) the extrinsic physical and chemical environment. Efforts at each of these three tiers attempt to control cellular systems and take advantage of emerging tools and approaches. Ultimately, it will be possible to integrate these approaches and realize the vision of integrative synthetic biology when cells are completely rewired for biotechnological goals. This review will highlight progress towards this goal as well as areas requiring further research. PMID:20150964
Buniatian, A A; Sablin, I N; Flerov, E V; Mierbekov, E M; Broĭtman, O G; Shevchenko, V V; Shitikov, I I
1995-01-01
Creation of computer monitoring systems (CMS) for operating rooms is one of the most important spheres of personal computer employment in anesthesiology. The authors developed a PC RS/AT-based CMS and have used it effectively for more than 2 years. This system permits comprehensive monitoring in cardiosurgical operations by real-time processing of the values of arterial and central venous pressure, pressure in the pulmonary artery, bioelectrical activity of the brain, and two temperature values. Use of this CMS helped appreciably improve patients' safety during surgery. The possibility of assessing brain function by computer monitoring of the EEG simultaneously with central hemodynamics and body temperature permits the anesthesiologist to objectively assess the depth of anesthesia and to diagnose cerebral hypoxia. The automated anesthesiological chart issued by the CMS after surgery reliably reflects the patient's status and the measures taken by the anesthesiologist.
FPGA-accelerated algorithm for the regular expression matching system
NASA Astrophysics Data System (ADS)
Russek, P.; Wiatr, K.
2015-01-01
This article describes an algorithm to support a regular expression matching system. The goal was to achieve an attractive performance system with low energy consumption. The basic idea of the algorithm comes from the concept of the Bloom filter. It starts from the extraction of static sub-strings from the strings of the regular expressions. The algorithm is devised to gain from its decomposition into parts that are intended to be executed by custom hardware and by the central processing unit (CPU). A pipelined custom processor architecture is proposed and the software algorithm is explained accordingly. The software part of the algorithm was coded in C and runs on a processor from the ARM family. The hardware architecture was described in VHDL and implemented in a field programmable gate array (FPGA). The performance results and required resources of the above experiments are given. An example of a target application for the presented solution is computer and network security systems. The idea was tested on nearly 100,000 body-based viruses from the ClamAV virus database. The solution is intended for the emerging technology of clusters of low-energy computing nodes.
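The following is a minimal sketch of the Bloom-filter prefiltering idea: fragments of the static sub-strings extracted from the regular expressions are hashed into a bit array (the part intended for custom hardware), a sliding window over the scanned data queries it, and only candidate hits are handed to the full regex engine (the CPU part). The filter size, hash count, fragment length, and example patterns are illustrative assumptions.

```python
# Minimal sketch: Bloom-filter prefilter in front of a full regex engine.
import re
import hashlib

Q = 3                        # fragment (q-gram) length
M = 1 << 16                  # number of bits in the filter
K = 3                        # number of hash functions

def hashes(s):
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{s}".encode()).digest()
        yield int.from_bytes(digest[:4], "little") % M

class BloomPrefilter:
    def __init__(self, fragments):
        self.bits = bytearray(M // 8)
        for frag in fragments:
            for h in hashes(frag):
                self.bits[h // 8] |= 1 << (h % 8)

    def query(self, frag):
        return all(self.bits[h // 8] & (1 << (h % 8)) for h in hashes(frag))

# regexes and hand-extracted static fragments (illustrative)
patterns = [r"evil[0-9]+payload", r"bad(cafe|beef)"]
fragments = ["evi", "bad"]
prefilter = BloomPrefilter(fragments)

def scan(data):
    """Cheap stage: bit tests on sliding q-grams; expensive stage: full regex search."""
    for i in range(len(data) - Q + 1):
        if prefilter.query(data[i:i + Q]):            # candidate window -> verify
            matches = [rx for rx in patterns if re.search(rx, data)]
            if matches:
                return matches
    return []

print(scan("xxxx badcafe xxxx"))    # -> ['bad(cafe|beef)']
```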
Tool for Analysis and Reduction of Scientific Data
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
The Automated Scheduling and Planning Environment (ASPEN) computer program has been updated to version 3.0. ASPEN as a whole (up to version 2.0) has been summarized, and selected aspects of ASPEN have been discussed in several previous NASA Tech Briefs articles. Restated briefly, ASPEN is a modular, reconfigurable, application software framework for solving batch problems that involve reasoning about time, activities, states, and resources. Applications of ASPEN can include planning spacecraft missions, scheduling of personnel, and managing supply chains, inventories, and production lines. ASPEN 3.0 can be customized for a wide range of applications and for a variety of computing environments that include various central processing units and random-access memories. Domain-specific reasoning modules (e.g., modules for determining orbits for spacecraft) can easily be plugged into ASPEN 3.0. Improvements over other, similar software that have been incorporated into ASPEN 3.0 include a provision for more expressive time-line values, new parsing capabilities afforded by an ASPEN language based on Extensible Markup Language, improved search capabilities, and improved interfaces to other, utility-type software (notably including MATLAB).
ERIC Educational Resources Information Center
Morrison, Robert G.; Doumas, Leonidas A. A.; Richland, Lindsey E.
2011-01-01
Theories accounting for the development of analogical reasoning tend to emphasize either the centrality of relational knowledge accretion or changes in information processing capability. Simulations in LISA (Hummel & Holyoak, 1997, 2003), a neurally inspired computer model of analogical reasoning, allow us to explore how these factors may…
An overview of selected information storage and retrieval issues in computerized document processing
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Ihebuzor, Valentine U.
1984-01-01
The rapid development of computerized information storage and retrieval techniques has introduced the possibility of extending the word processing concept to document processing. A major advantage of computerized document processing is that the immense speed and storage capacity of computers relieve publishers of the tedious task of manual editing and composition encountered in traditional publishing. Furthermore, computerized document processing provides an author with centralized control, the lack of which is a handicap of the traditional publishing operation. A survey of some computerized document processing techniques is presented, with emphasis on related information storage and retrieval issues. String matching algorithms are considered central to document information storage and retrieval and are also discussed.
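As an example of the string matching algorithms the survey treats as central to retrieval, the following is a minimal Knuth-Morris-Pratt implementation that returns every position at which a query term occurs in a document; the example strings are illustrative.

```python
# Minimal sketch: Knuth-Morris-Pratt substring search returning all match positions.
def kmp_search(text, pattern):
    # failure function: length of the longest proper prefix that is also a suffix
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)    # match ends at position i
            k = fail[k - 1]
    return hits

print(kmp_search("computerized document processing and document retrieval", "document"))
```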
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.
Han, Bing; Taha, Tarek M
2010-04-01
There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, rather than the more popular integrate-and-fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
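The following is a minimal sketch of the Izhikevich neuron update evaluated for every neuron at each time step; the per-neuron independence of this update is what maps well onto GPUs. The parameters are the standard regular-spiking values and the constant input current is illustrative; this is not the paper's GPU kernel.

```python
# Minimal sketch: Izhikevich neuron model updated in lockstep for a population,
# the same data-parallel pattern a GPU kernel would use.
import numpy as np

def izhikevich_step(v, u, I, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Advance membrane potential v (mV) and recovery variable u by one step (ms)."""
    v_new = v + dt * (0.04 * v**2 + 5.0 * v + 140.0 - u + I)
    u_new = u + dt * a * (b * v - u)
    fired = v_new >= 30.0
    v_new = np.where(fired, c, v_new)          # reset after a spike
    u_new = np.where(fired, u_new + d, u_new)
    return v_new, u_new, fired

n = 1024                                       # small population (illustrative)
v = np.full(n, -65.0)
u = 0.2 * v
total_spikes = 0
for t in range(200):
    v, u, fired = izhikevich_step(v, u, I=10.0)
    total_spikes += int(fired.sum())
print("total spikes:", total_spikes)
```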
Cross stratum resources protection in fog-computing-based radio over fiber networks for 5G services
NASA Astrophysics Data System (ADS)
Guo, Shaoyong; Shao, Sujie; Wang, Yao; Yang, Hui
2017-09-01
In order to meet the requirements of the internet of things (IoT) and 5G, the cloud radio access network is a paradigm that converges all base stations' computational resources into a cloud baseband unit (BBU) pool, while the distributed radio frequency signals are collected by remote radio heads (RRH). A precondition for centralized processing in the BBU pool is an interconnection fronthaul network with high capacity and low delay. However, the interaction between RRH and BBU and the resource scheduling among BBUs in the cloud have become more complex and frequent. A cloud radio over fiber network has already been proposed in our previous work. In order to overcome the complexity and latency, in this paper we first present a novel cross stratum resources protection (CSRP) architecture in fog-computing-based radio over fiber networks (F-RoFN) for 5G services. Additionally, a cross stratum protection (CSP) scheme considering the network survivability is introduced in the proposed architecture. The CSRP with the CSP scheme can effectively pull the remote processing resources locally to implement cooperative radio resource management, enhance the responsiveness and resilience to dynamic end-to-end 5G service demands, and globally optimize optical network, wireless and fog resources. The feasibility and efficiency of the proposed architecture with the CSP scheme are verified on our software defined networking testbed in terms of service latency, transmission success rate, resource occupation rate and blocking probability.
NASA Astrophysics Data System (ADS)
Mallakpour, Iman; Villarini, Gabriele; Jones, Michael; Smith, James
2016-04-01
The central United States is a region of the country that has been plagued by frequent catastrophic flooding (e.g., flood events of 1993, 2008, 2013, and 2014), with large economic and social repercussions (e.g., fatalities, agricultural losses, flood losses, water quality issues). The goal of this study is to examine whether it is possible to describe the occurrence of flood events at the sub-seasonal scale in terms of variations in the climate system. Daily streamflow time series from 774 USGS stream gage stations over the central United States (defined here to include North Dakota, South Dakota, Nebraska, Kansas, Missouri, Iowa, Minnesota, Wisconsin, Illinois, West Virginia, Kentucky, Ohio, Indiana, and Michigan) with a record of at least 50 years and ending no earlier than 2011 are used for this study. We use a peak-over-threshold (POT) approach to identify flood peaks so that we have, on average two events per year. We model the occurrence/non-occurrence of a flood event over time using regression models based on Cox processes. Cox processes are widely used in biostatistics and can be viewed as a generalization of Poisson processes. Rather than assuming that flood events occur independently of the occurrence of previous events (as in Poisson processes), Cox processes allow us to account for the potential presence of temporal clustering, which manifests itself in an alternation of quiet and active periods. Here we model the occurrence/non-occurrence of flood events using two climate indices as climate time-varying covariates: the North Atlantic Oscillation (NAO) and the Pacific-North American pattern (PNA). The results of this study show that NAO and/or PNA can explain the temporal clustering in flood occurrences in over 90% of the stream gage stations we considered. Analyses of the sensitivity of the results to different average numbers of flood events per year (from one to five) are also performed and lead to the same conclusions. The findings of this work highlight that variations in the climate system play a critical role in explaining the occurrence of flood events at the sub-seasonal scale over the central United States.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Graham, Richard L; Ladd, Joshua S
This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully offloads collective operations and employs only user-supplied buffers. For a 64-rank communicator, the latency of the CORE-Direct based hierarchical algorithm is better than that of production-grade Message Passing Interface (MPI) implementations: 150% better than the default Open MPI algorithm and 115% better than the shared memory optimized MVAPICH implementation for a one kilobyte (KB) message, and for eight megabytes (MB) it is 48% and 64% better, respectively. The flat-topology broadcast achieves 99.9% overlap in a polling based communication-computation test, and 95.1% overlap for a wait based test, compared with 92.4% and 17.0%, respectively, for a similar central processing unit (CPU) based implementation.
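The following is a minimal sketch, assuming mpi4py on top of an MPI-3 library, of the polling-style overlap measurement described above using a nonblocking broadcast; the message size and the dummy compute loop are illustrative, and this does not reproduce the CORE-Direct offloaded implementation.

```python
# Minimal sketch: overlap computation with a nonblocking broadcast by polling
# the request. Run with e.g.: mpiexec -n 4 python ibcast_overlap.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
buf = np.zeros(1 << 20, dtype=np.uint8)        # 1 MB payload
if comm.rank == 0:
    buf[:] = 42

req = comm.Ibcast(buf, root=0)                 # start the nonblocking broadcast
work = 0.0
while not req.Test():                          # poll while overlapping computation
    work += np.dot(np.ones(1024), np.ones(1024))
req.Wait()                                     # no-op if the request already completed

ok = bool((buf == 42).all())
print(f"rank {comm.rank}: payload ok={ok}, overlapped work units={work / 1024:.0f}")
```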
An electric propulsion long term test facility
NASA Technical Reports Server (NTRS)
Trump, G.; James, E.; Vetrone, R.; Bechtel, R.
1979-01-01
An existing test facility was modified to provide for extended testing of multiple electric propulsion thruster subsystems. A program to document thruster subsystem characteristics as a function of time is currently in progress. The facility is capable of simultaneously operating three 2.7-kW, 30-cm mercury ion thrusters and their power processing units. Each thruster is installed via a separate air lock so that it can be extended into the 7m x 10m main chamber without violating vacuum integrity. The thrusters exhaust into a 3m x 5m frozen mercury target. An array of cryopanels collect sputtered target material. Power processor units are tested in an adjacent 1.5m x 2m vacuum chamber or accompanying forced convection enclosure. The thruster subsystems and the test facility are designed for automatic unattended operation with thruster operation computer controlled. Test data are recorded by a central data collection system scanning 200 channels of data a second every two minutes. Results of the Systems Demonstration Test, a short shakedown test of 500 hours, and facility performance during the first year of testing are presented.
Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung
2012-10-08
Speed enhancement of integral imaging based incoherent Fourier hologram capture using a graphics processing unit is reported. The integral imaging based method enables exact hologram capture of real-existing three-dimensional objects under regular incoherent illumination. In our implementation, we apply a parallel computation scheme using the graphics processing unit, accelerating the processing speed. Using the enhanced speed of hologram capture, we also implement a pseudo real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.
Conceptual design of distillation-based hybrid separation processes.
Skiborowski, Mirko; Harwardt, Andreas; Marquardt, Wolfgang
2013-01-01
Hybrid separation processes combine different separation principles and constitute a promising design option for the separation of complex mixtures. Particularly, the integration of distillation with other unit operations can significantly improve the separation of close-boiling or azeotropic mixtures. Although the design of single-unit operations is well understood and supported by computational methods, the optimal design of flowsheets of hybrid separation processes is still a challenging task. The large number of operational and design degrees of freedom requires a systematic and optimization-based design approach. To this end, a structured approach, the so-called process synthesis framework, is proposed. This article reviews available computational methods for the conceptual design of distillation-based hybrid processes for the separation of liquid mixtures. Open problems are identified that must be addressed to finally establish a structured process synthesis framework for such processes.
Pham, Julius Cuong; Goeschel, Christine A; Berenholtz, Sean M; Demski, Renee; Lubomski, Lisa H; Rosen, Michael A; Sawyer, Melinda D; Thompson, David A; Trexler, Polly; Weaver, Sallie J; Weeks, Kristina R; Pronovost, Peter J
2016-01-01
A national collaborative helped many hospitals dramatically reduce central line-associated bloodstream infections (CLABSIs), but some hospitals struggled to reduce infection rates. This article describes the development of a peer-to-peer assessment process (CLABSI Conversations) and the practical, actionable practices we discovered that helped intensive care unit teams achieve a CLABSI rate of less than 1 infection per 1000 catheter-days for at least 1 year. CLABSI Conversations was designed as a learning-oriented process, in which a team of peers visited hospitals to surface barriers to infection prevention and to share best practices and insights from successful intensive care units. Common practices led to 10 recommendations: executive and board leaders communicate the goal of zero CLABSI throughout the hospital; senior and unit-level leaders hold themselves accountable for CLABSI rates; unit physicians and nurse leaders own the problem; clinical leaders and infection preventionists build infection prevention training and simulation programs; infection preventionists participate in unit-based CLABSI reduction efforts; hospital managers make compliance with best practices easy; clinical leaders standardize the hospital's catheter insertion and maintenance practices and empower nurses to stop any potentially harmful acts; unit leaders and infection preventionists investigate CLABSIs to identify root causes; and unit nurses and staff audit catheter maintenance policies and practices.
System and method for motor speed estimation of an electric motor
Lu, Bin [Kenosha, WI; Yan, Ting [Brookfield, WI; Luebke, Charles John [Sussex, WI; Sharma, Santosh Kumar [Viman Nagar, IN
2012-06-19
A system and method for a motor management system includes a computer readable storage medium and a processing unit. The processing unit is configured to determine a voltage value of a voltage input to an alternating current (AC) motor, determine a frequency value of at least one of a voltage input and a current input to the AC motor, determine a load value from the AC motor, and access a set of motor nameplate data, where the set of motor nameplate data includes a rated power, a rated speed, a rated frequency, and a rated voltage of the AC motor. The processing unit is also configured to estimate a motor speed based on the voltage value, the frequency value, the load value, and the set of nameplate data, and to store the motor speed on the computer readable storage medium.
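The following is a minimal sketch of one common nameplate-based estimate (rated slip scaled by load fraction and inversely by voltage squared), showing how the measured voltage, frequency, and load can combine with the nameplate data listed above; this is a textbook approximation with illustrative numbers, not necessarily the patented method.

```python
# Minimal sketch: slip-scaling speed estimate for an AC induction motor from
# nameplate data plus measured voltage, frequency, and load.
def estimate_motor_speed(v_meas, f_meas, load_meas,
                         rated_power, rated_speed, rated_freq, rated_volt):
    # infer the pole count from rated frequency and rated speed
    poles = round(120.0 * rated_freq / rated_speed / 2) * 2
    n_sync_rated = 120.0 * rated_freq / poles
    slip_rated = (n_sync_rated - rated_speed) / n_sync_rated

    # slip scales roughly with load fraction and inversely with voltage squared
    slip = slip_rated * (load_meas / rated_power) * (rated_volt / v_meas) ** 2
    n_sync = 120.0 * f_meas / poles
    return n_sync * (1.0 - slip)

# 10 kW, 4-pole, 60 Hz, 460 V motor with 1760 rpm nameplate speed, at 80% load
speed = estimate_motor_speed(455.0, 59.9, 8000.0,
                             10000.0, 1760.0, 60.0, 460.0)
print(round(speed, 1), "rpm")
```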
DeCourcy, Kelly; Hostnik, Eric T; Lorbach, Josh; Knoblaugh, Sue
2016-12-01
An adult leopard gecko ( Eublepharis macularius ) presented for lethargy, hyporexia, weight loss, decreased passage of waste, and a palpable caudal coelomic mass. Computed tomography showed a heterogeneous hyperattenuating (∼143 Hounsfield units) structure within the right caudal coelom. The distal colon-coprodeum lumen or urinary bladder was hypothesized as the most likely location for the heterogeneous structure. Medical support consisted of warm water and lubricant enema, as well as a heated environment. Medical intervention aided the passage of a plug comprised centrally of cholesterol and urates with peripheral stratified layers of fibrin, macrophages, heterophils, and bacteria. Within 24 hr, a follow-up computed tomography scan showed resolution of the pelvic canal plug.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCormick, B.H.; Narasimhan, R.
1963-01-01
The overall computer system contains three main parts: an input device, a pattern recognition unit (PRU), and a control computer. The bubble chamber picture is divided into a grid of 1-mm squares on the film. It is then processed in parallel in a two-dimensional array of 1024 identical processing modules (stalactites) of the PRU. The array can function as a two-dimensional shift register in which the results of successive shifting operations can be accumulated. The pattern recognition process is generally controlled by a conventional arithmetic computer. (A.G.W.)
Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying
2013-12-01
Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphics-processing-unit parallel framework named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced, by a factor of 38.9 with a GTX 580 graphics card, using the improved method.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; McDougal, Matthew; Russell, Sam
2012-01-01
Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work with them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area where GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.
2010-01-01
Background The robust storage, updating and utilization of information are necessary for the maintenance and perpetuation of dynamic systems. These systems can exist as constructs of metal-oxide semiconductors and silicon, as in a digital computer, or in the "wetware" of organic compounds, proteins and nucleic acids that make up biological organisms. We propose that there are essential functional properties of centralized information-processing systems; for digital computers these properties reside in the computer's hard drive, and for eukaryotic cells they are manifest in the DNA and associated structures. Methods Presented herein is a descriptive framework that compares DNA and its associated proteins and sub-nuclear structure with the structure and function of the computer hard drive. We identify four essential properties of information for a centralized storage and processing system: (1) orthogonal uniqueness, (2) low level formatting, (3) high level formatting and (4) translation of stored to usable form. The corresponding aspects of the DNA complex and a computer hard drive are categorized using this classification. This is intended to demonstrate a functional equivalence between the components of the two systems, and thus the systems themselves. Results Both the DNA complex and the computer hard drive contain components that fulfill the essential properties of a centralized information storage and processing system. The functional equivalence of these components provides insight into both the design process of engineered systems and the evolved solutions addressing similar system requirements. However, there are points where the comparison breaks down, particularly when there are externally imposed information-organizing structures on the computer hard drive. A specific example of this is the imposition of the File Allocation Table (FAT) during high level formatting of the computer hard drive and the subsequent loading of an operating system (OS). Biological systems do not have an external source for a map of their stored information or for an operational instruction set; rather, they must contain an organizational template conserved within their intra-nuclear architecture that "manipulates" the laws of chemistry and physics into a highly robust instruction set. We propose that the epigenetic structure of the intra-nuclear environment and the non-coding RNA may play the roles of a Biological File Allocation Table (BFAT) and biological operating system (Bio-OS) in eukaryotic cells. Conclusions The comparison of functional and structural characteristics of the DNA complex and the computer hard drive leads to a new descriptive paradigm that identifies the DNA as a dynamic storage system of biological information. This system is embodied in an autonomous operating system that inductively follows organizational structures, data hierarchy and executable operations that are well understood in the computer science industry. Characterizing the "DNA hard drive" in this fashion can lead to insights arising from discrepancies in the descriptive framework, particularly with respect to positing the role of epigenetic processes in an information-processing context. Further expansions arising from this comparison include the view of cells as parallel computing machines and a new approach towards characterizing cellular control systems. PMID:20092652
D'Onofrio, David J; An, Gary
2010-01-21
The robust storage, updating and utilization of information are necessary for the maintenance and perpetuation of dynamic systems. These systems can exist as constructs of metal-oxide semiconductors and silicon, as in a digital computer, or in the "wetware" of organic compounds, proteins and nucleic acids that make up biological organisms. We propose that there are essential functional properties of centralized information-processing systems; for digital computers these properties reside in the computer's hard drive, and for eukaryotic cells they are manifest in the DNA and associated structures. Presented herein is a descriptive framework that compares DNA and its associated proteins and sub-nuclear structure with the structure and function of the computer hard drive. We identify four essential properties of information for a centralized storage and processing system: (1) orthogonal uniqueness, (2) low level formatting, (3) high level formatting and (4) translation of stored to usable form. The corresponding aspects of the DNA complex and a computer hard drive are categorized using this classification. This is intended to demonstrate a functional equivalence between the components of the two systems, and thus the systems themselves. Both the DNA complex and the computer hard drive contain components that fulfill the essential properties of a centralized information storage and processing system. The functional equivalence of these components provides insight into both the design process of engineered systems and the evolved solutions addressing similar system requirements. However, there are points where the comparison breaks down, particularly when there are externally imposed information-organizing structures on the computer hard drive. A specific example of this is the imposition of the File Allocation Table (FAT) during high level formatting of the computer hard drive and the subsequent loading of an operating system (OS). Biological systems do not have an external source for a map of their stored information or for an operational instruction set; rather, they must contain an organizational template conserved within their intra-nuclear architecture that "manipulates" the laws of chemistry and physics into a highly robust instruction set. We propose that the epigenetic structure of the intra-nuclear environment and the non-coding RNA may play the roles of a Biological File Allocation Table (BFAT) and biological operating system (Bio-OS) in eukaryotic cells. The comparison of functional and structural characteristics of the DNA complex and the computer hard drive leads to a new descriptive paradigm that identifies the DNA as a dynamic storage system of biological information. This system is embodied in an autonomous operating system that inductively follows organizational structures, data hierarchy and executable operations that are well understood in the computer science industry. Characterizing the "DNA hard drive" in this fashion can lead to insights arising from discrepancies in the descriptive framework, particularly with respect to positing the role of epigenetic processes in an information-processing context. Further expansions arising from this comparison include the view of cells as parallel computing machines and a new approach towards characterizing cellular control systems.
ERIC Educational Resources Information Center
Paul, Sandra K.; Kranberg, Susan
The third report from a comprehensive Unesco study, this document traces the history of the application of computer-based technology to the book distribution process in the United States and indicates functional areas currently showing the effects of using this technology. Ways in which computer use is altering book distribution management…
[Concept of an interdisciplinary emergency department at the Schwarzwald-Baar Hospital].
Kumle, B; Merz, S; Geiger, M; Kugel, K; Fink, U
2014-10-01
Years ago, numerous hospitals in the Schwarzwald-Baar region were combined into a new central hospital for cost reasons. This also suggested the idea of a large central emergency department. The concept of a central emergency department is an organizational challenge, since it engages directly with the organizational structures of all medical departments involved in emergency treatment. Such a concept can only be implemented if it is supported by hospital management and all parties are willing to accept interdisciplinary and interprofessional work. In this paper, the concept of a central emergency department in a tertiary care hospital, rebuilt as an organizationally independent unit, is described. Collaborations with various departments, emergency services, and local physicians are highlighted. The processes of a central emergency department with an integrated admission department, as well as its personnel structures, are described. The analysis of the concept after almost a year has shown that the integration into the clinic has been successful: the central emergency department has proven itself as a central hub and has been accepted as a unit within the hospital.
Parallel approach in RDF query processing
NASA Astrophysics Data System (ADS)
Vajgl, Marek; Parenica, Jan
2017-07-01
The parallel approach is nowadays an inexpensive way to increase computational power, owing to the availability of multithreaded computational units. Such hardware has become a typical part of today's personal computers and notebooks and is widely spread. This contribution presents experiments on how the evaluation of a computationally complex inference algorithm over RDF data can be parallelized on graphics cards to decrease computation time.
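The following is a minimal sketch of the decomposition idea: a simple forward-chaining inference rule (RDFS-style subClassOf transitivity) applied to chunks of triples in parallel worker processes; the triples and the single rule are illustrative, and CPU processes stand in for the graphics-card parallelism discussed in the paper.

```python
# Minimal sketch: one inference rule applied in parallel over chunks of RDF triples.
# (Single pass; repeat until a fixpoint for the full transitive closure.)
from multiprocessing import Pool
from itertools import chain

SUBCLASS = "rdfs:subClassOf"
triples = [("A", SUBCLASS, "B"), ("B", SUBCLASS, "C"), ("C", SUBCLASS, "D")]

def infer_chunk(args):
    chunk, all_triples = args
    # rule: (x subClassOf y) & (y subClassOf z) => (x subClassOf z)
    derived = set()
    for (x, p, y) in chunk:
        if p != SUBCLASS:
            continue
        for (y2, p2, z) in all_triples:
            if p2 == SUBCLASS and y2 == y:
                derived.add((x, SUBCLASS, z))
    return derived

if __name__ == "__main__":
    n_workers = 2
    chunks = [triples[i::n_workers] for i in range(n_workers)]   # split the work
    with Pool(n_workers) as pool:
        results = pool.map(infer_chunk, [(c, triples) for c in chunks])
    inferred = set(chain.from_iterable(results)) - set(triples)
    print(inferred)   # expected: (A subClassOf C) and (B subClassOf D)
```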
ERIC Educational Resources Information Center
Mokhemar, Mary Ann
This kit for assessing central auditory processing disorders (CAPD), in children in grades 1 through 8 includes 3 books, 14 full-color cards with picture scenes, and a card depicting a phone key pad, all contained in a sturdy carrying case. The units in each of the three books correspond with auditory skill areas most commonly addressed in…
Computer hardware for radiologists: Part I
Indrajit, IK; Alam, A
2010-01-01
Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology Information System (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processing unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called "buses". The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute "programs". A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and the interaction of buses between the motherboard and the CPU. Memory (RAM) fundamentally consists of semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration. PMID:21042437
Acceleration of GPU-based Krylov solvers via data transfer reduction
Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...
2015-04-08
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of using the generic BLAS kernels, e.g., as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as the sparse matrix-vector product, are crucial for the subsequent development of high-performance, graphics-processing-unit-accelerated Krylov subspace iterative methods.
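As a rough illustration of the data-transfer argument (not the paper's CUDA kernels or the cuBLAS API), the sketch below contrasts a BiCGSTAB-style update written as separate BLAS-like passes with a fused single-pass version; on a GPU the fused form reads each vector from global memory once instead of several times. The update formulas shown are a simplified fragment, and the plain Python loops are only there to make the two memory-access patterns explicit.

```python
# Illustrative sketch only: why fusing vector operations reduces data movement.
# Generic BLAS-style routines each stream the vectors through memory once;
# a fused, application-specific kernel does several updates in a single pass.
def bicgstab_step_generic(x, r, p, v, alpha, omega):
    # Three separate passes over the data, as with chained generic calls.
    s = [ri - alpha * vi for ri, vi in zip(r, v)]          # pass 1
    x = [xi + alpha * pi + omega * si
         for xi, pi, si in zip(x, p, s)]                   # pass 2
    res = sum(si * si for si in s)                         # pass 3 (norm)
    return x, s, res

def bicgstab_step_fused(x, r, p, v, alpha, omega):
    # One pass: each element of x, r, p, v is read once, and the partial
    # residual norm is accumulated on the fly.
    s, x_new, res = [], [], 0.0
    for xi, ri, pi, vi in zip(x, r, p, v):
        si = ri - alpha * vi
        s.append(si)
        x_new.append(xi + alpha * pi + omega * si)
        res += si * si
    return x_new, s, res
```

Both functions produce identical results; the fused version simply trades three memory sweeps for one, which is the kind of saving the paper exploits with application-specific GPU kernels.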
Teaching Psychology Students Computer Applications.
ERIC Educational Resources Information Center
Atnip, Gilbert W.
This paper describes an undergraduate-level course designed to teach the applications of computers that are most relevant in the social sciences, especially psychology. After an introduction to the basic concepts and terminology of computing, separate units were devoted to word processing, data analysis, data acquisition, artificial intelligence,…
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G̲ for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G → G̲, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.
Leccese, Fabio; Cagnetti, Marco; Trinca, Daniele
2014-01-01
A smart city application has been realized and tested. It is a fully remote controlled isle of lamp posts based on new technologies. It has been designed and organized in different hierarchical layers, which perform local activities to physically control the lamp posts and transmit information to one another for remote control. Locally, each lamp post uses an electronic card for management, and a ZigBee telecommunication network transmits data to a central control unit, which manages the whole isle. The central unit is realized with a Raspberry-Pi control card due to its good computing performance at very low price. Finally, a WiMAX connection was tested and used to remotely control the smart grid, thus overcoming the distance limitations of commercial Wi-Fi networks. The isle has been realized and tested for some months in the field. PMID:25529206
NASA Technical Reports Server (NTRS)
Baumgardner, M. F. (Principal Investigator)
1974-01-01
The author has identified the following significant results. Multispectral scanner data obtained by ERTS-1 over six test sites in the Central United States were analyzed and interpreted. ERTS-1 data for some of the test sites were geometrically corrected and temporally overlayed. Computer-implemented pattern recognition techniques were used in the analysis of all multispectral data. These techniques were used to evaluate ERTS-1 data as a tool for soil survey. Geology maps and land use inventories were prepared by digital analysis of multispectral data. Identification and mapping of crop species and rangelands were achieved through the analysis of 1972 and 1973 ERTS-1 data. Multiple dates of ERTS-1 data were examined to determine the variation with time of the areal extent of surface water resources on the Southern Great Plains.
NASA Astrophysics Data System (ADS)
Ohene-Kwofie, Daniel; Otoo, Ekow
2015-10-01
The ATLAS detector, operated at the Large Hadron Collider (LHC), records proton-proton collisions at CERN every 50 ns, resulting in a sustained data flow of up to petabytes per second. The upgraded Tile Calorimeter of the ATLAS experiment will sustain about 5 PB/s of digital throughput. These massive data rates require extremely fast data capture and processing. Although there has been a steady increase in the processing speed of CPUs/GPGPUs assembled for high performance computing, the rate of data input and output, even under parallel I/O, has not kept up with the general increase in computing speeds. The problem then is whether one can implement an I/O subsystem infrastructure capable of meeting the computational speeds of the advanced computing systems at the petascale and exascale level. We propose a system architecture that leverages the Partitioned Global Address Space (PGAS) model of computing to maintain an in-memory data store for the Processing Unit (PU) of the upgraded electronics of the Tile Calorimeter, which is proposed to be used as a high-throughput general-purpose co-processor to the sROD of the upgraded Tile Calorimeter. The physical memory of the PUs is aggregated into a large global logical address space using RDMA-capable interconnects such as PCI-Express to enhance data processing throughput.
Assessing pine regeneration for the South Central United States
William H. McWilliams
1990-01-01
Poor regeneration of pine following harvest on nonindustrial timberland has been identified as a major cause for loss of pine forests and slowdown of softwood growth in the Southern United States. Developing a strategy for regeneration assessment requires clear definition of sampling objectives, sampling design, and analytical processes. It is important that...
Igarashi, Jun; Shouno, Osamu; Fukai, Tomoki; Tsujino, Hiroshi
2011-11-01
Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic neurons. In order to address these problems, mainly the latter, we chose to use general purpose computing on graphics processing units (GPGPUs) for simulation of such a neural network, taking advantage of the powerful computational capability of a graphics processing unit (GPU). As a target for real-time simulation, we used a model of the basal ganglia that has been developed according to electrophysiological and anatomical knowledge. The model consists of heterogeneous populations of 370 spiking model neurons, including computationally heavy conductance-based models, connected by 11,002 synapses. Simulation of the model has not yet been performed in real-time using a general computing server. By parallelization of the model on the NVIDIA Geforce GTX 280 GPU in data-parallel and task-parallel fashion, faster-than-real-time simulation was robustly realized with only one-third of the GPU's total computational resources. Furthermore, we used the GPU's full computational resources to perform faster-than-real-time simulation of three instances of the basal ganglia model; these instances consisted of 1100 neurons and 33,006 synapses and were synchronized at each calculation step. Finally, we developed software for simultaneous visualization of faster-than-real-time simulation output. These results suggest the potential power of GPGPU techniques in real-time simulation of realistic neural networks. Copyright © 2011 Elsevier Ltd. All rights reserved.
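A hedged sketch of the data-parallel part of such a simulation is given below. The paper's neurons are conductance-based and run under CUDA; here a deliberately simplified leaky integrate-and-fire update over a whole population is shown with NumPy, purely to illustrate how one update rule applied to every neuron at once maps naturally onto a GPU.

```python
# Hedged sketch: the paper runs conductance-based neurons on a GPU with CUDA;
# this simplified leaky integrate-and-fire update is NOT the paper's neuron
# model, it only shows the data-parallel structure of the state update.
import numpy as np

def step_population(v, i_syn, dt=0.1, tau=20.0, v_rest=-65.0,
                    v_thresh=-50.0, v_reset=-70.0):
    """Advance all membrane potentials one time step in parallel."""
    v = v + dt * ((v_rest - v) / tau + i_syn)   # vectorized over all neurons
    spiked = v >= v_thresh
    v[spiked] = v_reset
    return v, spiked

rng = np.random.default_rng(0)
v = np.full(370, -65.0)                          # population size from the paper
for _ in range(1000):
    v, spiked = step_population(v, i_syn=rng.normal(0.5, 0.2, v.size))
```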
Deana D. Pennington
2007-01-01
Exploratory modeling is an approach used when process and/or parameter uncertainties are such that modeling attempts at realistic prediction are not appropriate. Exploratory modeling makes use of computational experimentation to test how varying model scenarios drive model outcome. The goal of exploratory modeling is to better understand the system of interest through...
Treating Estimation and Mental Computation as Situated Mathematical Processes.
ERIC Educational Resources Information Center
Silver, Edward A.
This paper discusses the central thesis that new research on estimation and mental computation will benefit from more focused attention on the situations in which they are used. In the first section of the paper, a brief discussion of cognitive theory, with special attention to the emerging notion of situated cognition is presented. Three sources…
Organization of the secure distributed computing based on multi-agent system
NASA Astrophysics Data System (ADS)
Khovanskov, Sergey; Rumyantsev, Konstantin; Khovanskova, Vera
2018-04-01
Methods for distributed computing currently receive much attention, and one such method is the use of multi-agent systems. Distributed computing based on conventional networked computers can be exposed to security threats posed by the computational processes themselves. The authors have developed a unified agent algorithm for a control system that manages the operation of computing network nodes, with networked PCs used as the computing nodes. The proposed multi-agent control system makes it possible, in a short time, to harness the processing power of the computers of any existing network to solve large tasks by creating a distributed computing system. Agents deployed on a computer network can configure the distributed computing system, distribute the computational load among the computers they operate, and optimize the distributed computing system according to the computing power of the computers on the network. The number of computers connected to the network can be increased by adding computers to the system, which raises the overall processing power. Adding a central agent to the multi-agent system increases the security of the distributed computation. This organization of the distributed computing system reduces problem-solving time and increases the fault tolerance (vitality) of computing processes in a changing computing environment, i.e., under dynamic changes in the number of computers on the network. The developed multi-agent system detects cases of falsification of the results of the distributed system, which could otherwise lead to wrong decisions; in addition, the system checks and corrects wrong results.
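The abstract does not spell out the falsification-detection algorithm, so the sketch below shows one common scheme that fits the description, stated as an assumption: each task is replicated on several nodes and a result is accepted only when a majority agrees, which also points at the suspect nodes.

```python
# Sketch under assumptions, not necessarily the authors' algorithm: dispatch
# each task to k nodes and accept a result only when a majority of the
# returned values agree; disagreeing nodes become suspects.
from collections import Counter
import random

def run_on_node(node_id, task):
    # Stand-in for remote execution; node 2 is simulated as dishonest.
    result = sum(task)
    return result + 1 if node_id == 2 else result

def replicated_compute(task, nodes, replicas=3):
    chosen = random.sample(nodes, replicas)
    results = [run_on_node(n, task) for n in chosen]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority - possible falsification: %r" % results)
    suspects = [n for n, r in zip(chosen, results) if r != value]
    return value, suspects

value, suspects = replicated_compute(task=[1, 2, 3], nodes=[1, 2, 3, 4, 5])
print(value, suspects)   # 6 and, if node 2 was sampled, [2]
```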
Transformation of personal computers and mobile phones into genetic diagnostic systems.
Walker, Faye M; Ahmad, Kareem M; Eisenstein, Michael; Soh, H Tom
2014-09-16
Molecular diagnostics based on the polymerase chain reaction (PCR) offer rapid and sensitive means for detecting infectious disease, but prohibitive costs have impeded their use in resource-limited settings where such diseases are endemic. In this work, we report an innovative method for transforming a desktop computer and a mobile camera phone--devices that have become readily accessible in developing countries--into a highly sensitive DNA detection system. This transformation was achieved by converting a desktop computer into a de facto thermal cycler with software that controls the temperature of the central processing unit (CPU), allowing for highly efficient PCR. Next, we reconfigured the mobile phone into a fluorescence imager by adding a low-cost filter, which enabled us to quantitatively measure the resulting PCR amplicons. Our system is highly sensitive, achieving quantitative detection of as little as 9.6 attograms of target DNA, and we show that its performance is comparable to advanced laboratory instruments at approximately 1/500th of the cost. Finally, in order to demonstrate clinical utility, we have used our platform for the successful detection of genomic DNA from the parasite that causes Chagas disease, Trypanosoma cruzi, directly in whole, unprocessed human blood at concentrations 4-fold below the clinical titer of the parasite.
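A minimal sketch of the thermal-cycling idea is given below, under explicit assumptions: the simulated CPU thermal model, the busy-loop heater, and the shortened hold times are placeholders, not the authors' software. The point is only that loading the CPU raises its temperature toward a set point and idling lets it fall, which is what turns the computer into a de facto thermal cycler.

```python
# Minimal control-loop sketch, not the authors' software. The temperature
# model and set points are illustrative assumptions; a real implementation
# would read the hardware temperature sensor and heat by running real load.
import time

class SimulatedCpu:
    """Crude stand-in for the CPU: heats up under load, cools when idle."""
    def __init__(self, ambient=40.0):
        self.temp = ambient
        self.ambient = ambient

    def busy(self, seconds):        # load the CPU -> temperature rises
        time.sleep(seconds)
        self.temp += 80.0 * seconds

    def idle(self, seconds):        # let the CPU idle -> temperature falls
        time.sleep(seconds)
        self.temp -= 0.1 * (self.temp - self.ambient)

def hold(cpu, target_c, duration_s, tolerance=0.5):
    """Bang-bang control: keep the sensed temperature near the set point."""
    end = time.time() + duration_s
    while time.time() < end:
        if cpu.temp < target_c - tolerance:
            cpu.busy(0.01)
        else:
            cpu.idle(0.01)

def pcr_cycle(cpu, n_cycles=3):
    for _ in range(n_cycles):
        hold(cpu, 95.0, 0.5)        # denaturation
        hold(cpu, 55.0, 0.5)        # annealing
        hold(cpu, 72.0, 0.5)        # extension

pcr_cycle(SimulatedCpu())
```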
Robust Representation of Integrated Surface-subsurface Hydrology at Watershed Scales
NASA Astrophysics Data System (ADS)
Painter, S. L.; Tang, G.; Collier, N.; Jan, A.; Karra, S.
2015-12-01
A representation of integrated surface-subsurface hydrology is the central component of process-rich watershed models that are emerging as alternatives to traditional reduced complexity models. These physically based systems are important for assessing potential impacts of climate change and human activities on groundwater-dependent ecosystems and water supply and quality. Integrated surface-subsurface models typically couple three-dimensional solutions for variably saturated flow in the subsurface with the kinematic- or diffusion-wave equation for surface flows. The computational scheme for coupling the surface and subsurface systems is key to the robustness, computational performance, and ease of implementation of the integrated system. A new, robust approach for coupling the subsurface and surface systems is developed from the assumption that the vertical gradient in head is negligible at the surface. This tight-coupling assumption allows the surface flow system to be incorporated directly into the subsurface system; effects of surface flow and surface water accumulation are represented as modifications to the subsurface flow and accumulation terms but are not triggered until the subsurface pressure reaches a threshold value corresponding to the appearance of water on the surface. The new approach has been implemented in the highly parallel PFLOTRAN (www.pflotran.org) code. Several synthetic examples and three-dimensional examples from the Walker Branch Watershed in Oak Ridge, TN, demonstrate the utility and robustness of the new approach using unstructured computational meshes. Representation of solute transport in the new approach is also discussed. Notice: This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes.
Computer-Aided Process and Tools for Mobile Software Acquisition
2013-07-30
Computer-Aided Process and Tools for Mobile Software Acquisition. 30 July 2013. LT Christopher Bonine, USN, Dr... Christopher Bonine is a lieutenant in the United States Navy. He is currently assigned to the Navy Cyber Defense Operations Command in Norfolk, VA. He has... interests are in development and implementation of cyber security policy. Bonine has a master's in computer science from the Naval Postgraduate School.
On Roles of Models in Information Systems
NASA Astrophysics Data System (ADS)
Sølvberg, Arne
The increasing penetration of computers into all aspects of human activity makes it desirable that the interplay among software, data and the domains where computers are applied is made more transparent. An approach to this end is to explicitly relate the modeling concepts of the domains, e.g., natural science, technology and business, to the modeling concepts of software and data. This may make it simpler to build comprehensible integrated models of the interactions between computers and non-computers, e.g., interaction among computers, people, physical processes, biological processes, and administrative processes. This chapter contains an analysis of various facets of the modeling environment for information systems engineering. The lack of satisfactory conceptual modeling tools seems to be central to the unsatisfactory state-of-the-art in establishing information systems. The chapter contains a proposal for defining a concept of information that is relevant to information systems engineering.
Transfer Entropy and Transient Limits of Computation
Prokopenko, Mikhail; Lizier, Joseph T.
2014-01-01
Transfer entropy is a recently introduced information-theoretic measure quantifying directed statistical coherence between spatiotemporal processes, and is widely used in diverse fields ranging from finance to neuroscience. However, its relationships to fundamental limits of computation, such as Landauer's limit, remain unknown. Here we show that in order to increase transfer entropy (predictability) by one bit, heat flow must match or exceed Landauer's limit. Importantly, we generalise Landauer's limit to bi-directional information dynamics for non-equilibrium processes, revealing that the limit applies to prediction, in addition to retrodiction (information erasure). Furthermore, the results are related to negentropy, and to Bremermann's limit and the Bekenstein bound, producing, perhaps surprisingly, lower bounds on the computational deceleration and information loss incurred during an increase in predictability about the process. The identified relationships set new computational limits in terms of fundamental physical quantities, and establish transfer entropy as a central measure connecting information theory, thermodynamics and theory of computation. PMID:24953547
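An illustrative way to state the bound described above (with assumed notation, not the paper's exact derivation) is that raising the transfer entropy T_{Y→X} by ΔT bits requires a heat flow of at least Landauer's cost per bit at temperature T:

```latex
% Illustrative form of the bound described in the abstract (notation assumed):
% increasing the transfer entropy T_{Y \to X} by \Delta T_{Y \to X} bits
% requires heat flow of at least Landauer's cost per bit at temperature T.
\Delta Q \;\ge\; k_B \, T \ln 2 \;\cdot\; \Delta T_{Y \to X}
```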
NETL - Chemical Looping Reactor
None
2018-02-14
NETL's Chemical Looping Reactor unit is a high-temperature integrated CLC process with extensive instrumentation to improve computational simulations. A non-reacting test unit is also used to study solids flow at ambient temperature. The CLR unit circulates approximately 1,000 pounds per hour at temperatures around 1,800 degrees Fahrenheit.
19 CFR 10.592 - CBP processing procedures.
Code of Federal Regulations, 2010 CFR
2010-04-01
... Customs Duties U.S. CUSTOMS AND BORDER PROTECTION, DEPARTMENT OF HOMELAND SECURITY; DEPARTMENT OF THE TREASURY ARTICLES CONDITIONALLY FREE, SUBJECT TO A REDUCED RATE, ETC. Dominican Republic-Central America-United States Free Trade Agreement Post-Importation Duty Refund Claims § 10.592 CBP processing procedures...
NASA Astrophysics Data System (ADS)
Negrut, Dan; Lamb, David; Gorsich, David
2011-06-01
This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so-called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. The components at the core of HCT that enable the envisioned distributed computing approach to large-scale dynamical system simulation are: (a) the ability to partition the problem according to the one-to-one mapping, i.e., the spatial subdivision discussed above (pre-processing); (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution the components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above) is shown to benefit from a two-order-of-magnitude gain in efficiency on the GPU when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not imply its endorsement, recommendation, or favoring by the United States Army. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Army, and shall not be used for advertising or product endorsement purposes.
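Component (a), the spatial subdivision, can be pictured with the short sketch below. It is an illustrative assumption, not the HCT code: bodies are binned along one axis into equal-width subdomains, each of which would then be handed to its own core/accelerator pair.

```python
# Illustrative sketch of HCT component (a), spatial subdivision: bodies are
# binned into subdomains along one axis. The uniform 1-D split and the data
# layout are assumptions made for brevity; the actual template is more general.
import numpy as np

def partition_bodies(positions, n_subdomains, x_min, x_max):
    """Return, for each subdomain, the indices of the bodies it owns."""
    width = (x_max - x_min) / n_subdomains
    bins = np.clip(((positions[:, 0] - x_min) // width).astype(int),
                   0, n_subdomains - 1)
    return [np.where(bins == k)[0] for k in range(n_subdomains)]

positions = np.random.default_rng(1).uniform(0.0, 10.0, size=(1000, 3))
owned = partition_bodies(positions, n_subdomains=4, x_min=0.0, x_max=10.0)
print([len(idx) for idx in owned])   # roughly 250 bodies per subdomain
```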
Computer hardware for radiologists: Part 2
Indrajit, IK; Alam, A
2010-01-01
Computers are an integral part of modern radiology equipment. In the first half of this two-part article, we dwelt upon some fundamental concepts regarding computer hardware, covering components like motherboard, central processing unit (CPU), chipset, random access memory (RAM), and memory modules. In this article, we describe the remaining computer hardware components that are of relevance to radiology. “Storage drive” is a term describing a “memory” hardware used to store data for later retrieval. Commonly used storage drives are hard drives, floppy drives, optical drives, flash drives, and network drives. The capacity of a hard drive is dependent on many factors, including the number of disk sides, number of tracks per side, number of sectors on each track, and the amount of data that can be stored in each sector. “Drive interfaces” connect hard drives and optical drives to a computer. The connections of such drives require both a power cable and a data cable. The four most popular “input/output devices” used commonly with computers are the printer, monitor, mouse, and keyboard. The “bus” is a built-in electronic signal pathway in the motherboard to permit efficient and uninterrupted data transfer. A motherboard can have several buses, including the system bus, the PCI express bus, the PCI bus, the AGP bus, and the (outdated) ISA bus. “Ports” are the location at which external devices are connected to a computer motherboard. All commonly used peripheral devices, such as printers, scanners, and portable drives, need ports. A working knowledge of computers is necessary for the radiologist if the workflow is to realize its full potential and, besides, this knowledge will prepare the radiologist for the coming innovations in the ‘ever increasing’ digital future. PMID:21423895
Digital image processing using parallel computing based on CUDA technology
NASA Astrophysics Data System (ADS)
Skirnevskiy, I. P.; Pustovit, A. V.; Abdrashitova, M. O.
2017-01-01
This article discusses the expediency of using a graphics processing unit (GPU) for big-data processing in the context of digital image processing. It provides a short description of parallel computing technology and its usage in different areas, a definition of image noise, and a brief overview of some noise removal algorithms. It also describes some basic requirements that a noise removal algorithm should meet in application to computed tomography. Finally, it compares the performance with and without a GPU, as well as with different percentages of CPU and GPU usage.
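The article's algorithms and CUDA code are not reproduced in the abstract, so the sketch below only shows the kind of per-pixel, data-parallel kernel that is typically offloaded to a GPU: a 3x3 mean filter written as whole-array operations. NumPy on the CPU is used for illustration; running the same expression on a GPU array library is an assumption, not the article's implementation.

```python
# Hedged sketch: a 3x3 mean filter as whole-array operations. This is the
# kind of per-pixel, data-parallel kernel that maps well to a GPU; NumPy on
# the CPU is used here only to show the structure.
import numpy as np

def mean_filter3x3(img):
    """Average each interior pixel with its 8 neighbours."""
    out = np.zeros_like(img, dtype=float)
    acc = np.zeros_like(out)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc[1:-1, 1:-1] += img[1 + dy:img.shape[0] - 1 + dy,
                                   1 + dx:img.shape[1] - 1 + dx]
    out[1:-1, 1:-1] = acc[1:-1, 1:-1] / 9.0
    return out

noisy = np.random.default_rng(2).normal(100.0, 10.0, size=(512, 512))
denoised = mean_filter3x3(noisy)
```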
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feldman, S.C.; Taranik, J.V.
1986-05-01
Selected areas were mapped at a scale of 1:6000 in the southern Hot Creek Range (south-central Nevada), which is underlain by Paleozoic autochthonous limestone, shale, and sandstone, Paleozoic allochthonous chert and siltstone, and Tertiary rhyolitic to dacitic ash flow tuff. The mapping was compared with computer-processed Airborne Imaging Spectrometer (AIS) data and Landsat Thematic Mapper (TM) imagery. The AIS imagery of the Hot Creek Range was acquired in 1984 by a NASA C-130 aircraft; it has a spatial resolution of 12 m and a swath width of 380 m. The sensor was developed by the Jet Propulsion Laboratory and is the first in a series of NASA imaging spectrometers. The AIS collects 128 spectral bands, having a bandwidth of approximately 9 nm, in the short-wave infrared between 1.2 and 2.4 μm. This part of the spectrum contains important narrow spectral absorption features for the carbonate ion, hydroxyl ion, and water of hydration. Using computer-processed AIS imagery, therefore, the authors can separate calcite from dolomite, and kaolinite from illite and montmorillonite, as well as differentiate geologic units containing these minerals. On the AIS imagery, the Upper Mississippian Tripon Pass Limestone shows a distinctive calcite absorption feature at 2.34 μm; this feature is not as pronounced in Cambrian and Ordovician limestones. The dolomitized Nevada Formation exhibits the dolomite absorption feature at 2.32 μm. Clay mineral absorption features near 2.2 μm can be distinguished in altered volcanics. Mineralogic identification was confirmed with field and laboratory spectroradiometer measurements, thin-section examination, and x-ray analysis. AIS results and field mapping were also compared to computer-processed Landsat TM imagery, the highest spectral and spatial resolution worldwide data set currently available.
Central mechanisms for force and motion--towards computational synthesis of human movement.
Hemami, Hooshang; Dariush, Behzad
2012-12-01
Anatomical, physiological and experimental research on the human body can be supplemented by computational synthesis of the human body for all movement: routine daily activities, sports, dancing, and artistic and exploratory involvements. The synthesis requires thorough knowledge of all subsystems of the human body and their interactions, and allows existing knowledge to be integrated into working modules. It also affords confirmation and/or verification of scientific hypotheses about the workings of the central nervous system (CNS). A simple step in this direction is explored here for controlling the forces of constraint. It requires co-activation of agonist-antagonist musculature. The desired trajectories of motion and the force of contact have to be provided by the CNS. The spinal control involves projection onto a muscular subset that induces the force of contact. The projection of force in the sensory motor cortex is implemented via a well-defined neural population unit, and is executed in the spinal cord by a standard integral controller requiring input from tendon organs. The sensory motor cortex structure is extended to the case of directing motion via two neural population units with vision input and spindle efferents. Digital computer simulations show the feasibility of the system. The formulation is modular and can be extended to multi-link limbs, robot and humanoid systems with many pairs of actuators or muscles. It can be expanded to include reticular activating structures and learning. Copyright © 2012 Elsevier Ltd. All rights reserved.
BIO-Plex Information System Concept
NASA Technical Reports Server (NTRS)
Jones, Harry; Boulanger, Richard; Arnold, James O. (Technical Monitor)
1999-01-01
This paper describes a suggested design for an integrated information system for the proposed BIO-Plex (Bioregenerative Planetary Life Support Systems Test Complex) at Johnson Space Center (JSC), including distributed control systems, central control, networks, database servers, personal computers and workstations, applications software, and external communications. The system will have an open commercial computing and networking architecture. The network will provide automatic real-time transfer of information to database server computers which perform data collection and validation. This information system will support integrated, data-sharing applications for everything from system alarms to management summaries. Most existing complex process control systems have information gaps between the different real-time subsystems, between these subsystems and the central controller, between the central controller and system-level planning and analysis application software, and between the system-level applications and management overview reporting. An integrated information system is vitally necessary as the basis for the integration of planning, scheduling, modeling, monitoring, and control, which will allow improved monitoring and control based on timely, accurate and complete data. Data describing the system configuration and the real-time processes can be collected, checked and reconciled, analyzed, and stored in database servers that can be accessed by all applications. The required technology is available. The only opportunity to design a distributed, nonredundant, integrated system is before it is built. Retrofit is extremely difficult and costly.
Campos, Marcelino; Llorens, Carlos; Sempere, José M; Futami, Ricardo; Rodriguez, Irene; Carrasco, Purificación; Capilla, Rafael; Latorre, Amparo; Coque, Teresa M; Moya, Andres; Baquero, Fernando
2015-08-05
Antibiotic resistance is a major biomedical problem upon which public health systems demand solutions to construe the dynamics and epidemiological risk of resistant bacteria in anthropogenically-altered environments. The implementation of computable models with reciprocity within and between levels of biological organization (i.e. essential nesting) is central for studying antibiotic resistances. Antibiotic resistance is not just the result of antibiotic-driven selection but more properly the consequence of a complex hierarchy of processes shaping the ecology and evolution of the distinct subcellular, cellular and supra-cellular vehicles involved in the dissemination of resistance genes. Such a complex background motivated us to explore the P-system standards of membrane computing, an innovative natural computing formalism that abstracts the notion of movement across membranes, to simulate antibiotic resistance evolution processes across nested levels of micro- and macro-environmental organization in a given ecosystem. In this article, we introduce ARES (Antibiotic Resistance Evolution Simulator), a software device that simulates P-system model scenarios with five types of nested computing membranes oriented to emulate a hierarchy of eco-biological compartments, i.e. a) peripheral ecosystem; b) local environment; c) reservoir of supplies; d) animal host; and e) host's associated bacterial organisms (microbiome). Computational objects emulating molecular entities such as plasmids, antibiotic resistance genes, antimicrobials, and/or other substances can be introduced into this framework and may interact and evolve together with the membranes, according to a set of pre-established rules and specifications. ARES has been implemented as an online server and offers additional tools for storage and model editing and downstream analysis. The stochastic nature of the P-system model implemented in ARES explicitly links within- and between-host dynamics into a simulation, with feedback reciprocity among the different units of selection influenced by antibiotic exposure at various ecological levels. ARES offers the possibility of modeling predictive multilevel scenarios of antibiotic resistance evolution that can be interrogated, edited and re-simulated if necessary, with different parameters, until a correct model description of the process in the real world is convincingly approached. ARES can be accessed at http://gydb.org/ares.
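A toy illustration of the nested-membrane idea (not the ARES rule set, and with made-up names) is sketched below: each compartment holds a multiset of objects, and a rule moves an object one level inward per step, e.g. an antibiotic travelling from the peripheral ecosystem toward the microbiome.

```python
# Toy sketch of the nested-membrane idea behind ARES (not its actual rules):
# each compartment holds a multiset of objects, and each rule moves an object
# one level inward per step. Names and rules are illustrative assumptions.
from collections import Counter

LEVELS = ["ecosystem", "environment", "host", "microbiome"]

def step(state, rules):
    """Apply each 'move one level inward' rule to the objects present at the
    start of the step."""
    new_state = {lvl: Counter(objs) for lvl, objs in state.items()}
    for obj, src, dst in rules:
        n = state[src][obj]          # count before this step
        new_state[src][obj] -= n
        new_state[dst][obj] += n
    return new_state

state = {lvl: Counter() for lvl in LEVELS}
state["ecosystem"]["antibiotic"] = 5
rules = [("antibiotic", "ecosystem", "environment"),
         ("antibiotic", "environment", "host"),
         ("antibiotic", "host", "microbiome")]
for _ in range(3):
    state = step(state, rules)
print(state["microbiome"]["antibiotic"])   # 5 after three steps
```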
Processing Polarity: How the Ungrammatical Intrudes on the Grammatical
ERIC Educational Resources Information Center
Vasishth, Shravan; Brussow, Sven; Lewis, Richard L.; Drenhaus, Heiner
2008-01-01
A central question in online human sentence comprehension is, "How are linguistic relations established between different parts of a sentence?" Previous work has shown that this dependency resolution process can be computationally expensive, but the underlying reasons for this are still unclear. This article argues that dependency…
Development and Evaluation of Stereographic Display for Lung Cancer Screening
2008-12-01
burden. Application of GPUs: With the evolution of commodity graphics processing units (GPUs) for accelerating games on personal computers, over the... units, which are designed for rendering computer games, are readily available and can be programmed to perform the kinds of real-time calculations...
Particle-In-Cell simulations of high pressure plasmas using graphics processing units
NASA Astrophysics Data System (ADS)
Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter
2009-10-01
Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming, since certain stability conditions have to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take a very long time to run on standard computers and require the help of computer clusters or supercomputers. Recent advances in the field of graphics processing units (GPUs) provide every personal computer with a highly parallel multiprocessor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
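The group's collisional, high-pressure PIC code is not shown in the abstract; the sketch below only illustrates why the method maps well to GPUs: the particle push applies the same leapfrog rule to every particle, so the per-particle loop becomes one array operation. The toy 1-D field, time step, and particle count are assumptions for illustration.

```python
# Data-parallel sketch of the particle push at the heart of PIC (not the
# group's collisional, high-pressure code): every particle is advanced by the
# same leapfrog rule, so the loop over particles becomes one array operation.
import numpy as np

QE, ME = -1.602e-19, 9.109e-31          # electron charge [C] and mass [kg]

def push(x, v, e_field, dt):
    """Leapfrog update for all particles at once; E is sampled at x."""
    v = v + (QE / ME) * e_field(x) * dt
    x = x + v * dt
    return x, v

e_field = lambda x: 1.0e3 * np.sin(2 * np.pi * x / 0.01)   # toy 1-D field [V/m]
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 0.01, 100_000)
v = rng.normal(0.0, 1.0e5, 100_000)
for _ in range(100):
    x, v = push(x, v, e_field, dt=1.0e-12)
```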
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code
NASA Astrophysics Data System (ADS)
Payne, J.; McCune, D.; Prater, R.
2010-11-01
NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.
Cloud Computing for radiologists.
Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit
2012-07-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software and hardware on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It frees radiology from the confines of the hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future.
Rapid Parallel Calculation of shell Element Based On GPU
NASA Astrophysics Data System (ADS)
Wang, Jian Hua; Li, Guang Yao; Li, Sheng; Li, Guang Yao
2010-06-01
Long computation times have bottlenecked the application of the finite element method. In this paper, an effective method to speed up FEM calculation using existing modern graphics processing units and programmable color-rendering tools is put forward: the representation of element information is devised in accordance with the features of the GPU, all element calculations are converted into a rendering process, the internal-force calculation for all elements is carried out in this way, and the low degree of parallelism previously obtained on a single computer is overcome. Studies show that this method can greatly improve efficiency and shorten calculation time. The results of simulations of an elasticity problem with a large number of elements in sheet metal prove that GPU-based parallel simulation is faster than the CPU-based approach. This approach is useful and efficient for solving engineering problems.
ERIC Educational Resources Information Center
Jung, Sehoon
2017-01-01
One of the central questions in recent second language processing research is whether the types of parsing heuristics and linguistic resources adult L2 learners compute during online processing are qualitatively similar or different from those used by native speakers of the target language. While the current L2 processing literature provides…
NASA Technical Reports Server (NTRS)
1974-01-01
The specifications and functions of the Central Data Processing Facility (CDPF) which supports the Earth Observatory Satellite (EOS) are discussed. The CDPF will receive the EOS sensor data and spacecraft data through the Spaceflight Tracking and Data Network (STDN) and the Operations Control Center (OCC). The CDPF will process the data and produce high density digital tapes, computer compatible tapes, film and paper print images, and other data products. The specific aspects of data inputs and data processing are identified. A block diagram of the CDPF is provided to show the data flow and the interfaces of the subsystems.
The research of laser marking control technology
NASA Astrophysics Data System (ADS)
Zhang, Qiue; Zhang, Rong
2009-08-01
In laser marking, the usual control method is to insert a control card into the computer's motherboard; this does not support hot swapping, and the card is difficult to assemble and replace. Moreover, each marking system must be equipped with its own computer, and during marking the computer can do nothing other than transmit the marking data, otherwise marking precision is affected. Because of these problems with traditional control methods, a scheme is introduced in which marking-graphic editing and data processing are performed by the computer while a high-speed digital signal processor (DSP) controls the whole marking process. The laser marking controller mainly comprises a DSP2812, digital memory, a digital-to-analog converter (DAC) unit circuit, a USB interface control circuit, a man-machine interface circuit, and other logic control circuits. The marking information processed by the computer is downloaded to a USB flash disk; the DSP reads the information through the USB interface when needed, processes it, uses its internal timer to control the marking time sequence, and outputs the scanner control signals through the D/A parts. Applying this technology enables offline marking, thereby reducing product cost and increasing production efficiency. The system performed well in actual marking of units, with a marking speed about 20 percent faster than that of the PCI control-card solution. It has application value in practice.
Urban land use monitoring from computer-implemented processing of airborne multispectral data
NASA Technical Reports Server (NTRS)
Todd, W. J.; Mausel, P. W.; Baumgardner, M. F.
1976-01-01
Machine processing techniques were applied to multispectral data obtained from airborne scanners at an elevation of 600 meters over central Indianapolis in August, 1972. Computer analysis of these spectral data indicates that roads (two types), roof tops (three types), dense grass (two types), sparse grass (two types), trees, bare soil, and water (two types) can be accurately identified. Using computers, it is possible to determine land uses from analysis of the type, size, shape, and spatial associations of earth surface images identified from multispectral data. Land use data developed through machine processing techniques can be programmed to monitor land use changes, simulate land use conditions, and provide impact statistics that are required to analyze stresses placed on spatial systems.
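A hedged sketch of per-pixel multispectral classification in this spirit is given below. A minimum-distance-to-class-mean rule is used purely for brevity, and the band values and class means are invented; the historical machine-processing software is not reproduced here.

```python
# Hedged sketch of per-pixel multispectral classification. A minimum-distance
# rule is used for brevity; the class means and band values are made up.
import numpy as np

def classify(pixels, class_means):
    """pixels: (n, bands); class_means: (k, bands) -> class index per pixel."""
    # Squared Euclidean distance from every pixel to every class mean.
    d2 = ((pixels[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

class_means = np.array([[30.0, 40.0, 90.0, 120.0],   # e.g. roads
                        [25.0, 60.0, 50.0,  45.0],   # e.g. grass
                        [10.0, 15.0, 20.0,  18.0]])  # e.g. water
pixels = np.random.default_rng(4).uniform(0.0, 130.0, size=(10_000, 4))
labels = classify(pixels, class_means)
```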
NASA Technical Reports Server (NTRS)
Bryant, W. H.; Morrell, F. R.
1981-01-01
An experimental redundant strapdown inertial measurement unit (RSDIMU) is developed as a link to satisfying safety and reliability considerations in the integrated avionics concept. The unit includes four two-degree-of-freedom tuned rotor gyros and four accelerometers in a skewed and separable semioctahedral array. These sensors are coupled to four microprocessors which compensate for sensor errors. These microprocessors are interfaced with two flight computers which process failure detection, isolation, redundancy management, and general flight control/navigation algorithms. Since the RSDIMU is a developmental unit, it is imperative that the flight computers provide special visibility and facility in algorithm modification.
Storage peak gas-turbine power unit
NASA Technical Reports Server (NTRS)
Tsinkotski, B.
1980-01-01
A storage gas-turbine power plant using a two-cylinder compressor with intermediate cooling is studied. On the basis of the measured characteristics of a 0.25 MW compressor, computer calculations of the parameters of the loading process of a constant-capacity storage unit (05.3 million cu m) were carried out. The required compressor power as a function of time, with and without final cooling, was computed. Parameters of maximum loading and discharging of the storage unit were calculated, and it was found that for the complete loading of a fully unloaded storage unit, a capacity of 1 to 1.5 million cubic meters is required, depending on the final cooling.
NASA Astrophysics Data System (ADS)
Svatos, Adam Ladislav
This thesis describes the author's contributions to three separate projects. The bus of the NORSAT-2 satellite was developed by the Space Flight Laboratory (SFL) for the Norwegian Space Centre (NSC) and Space Norway. The author's contributions to the mission were performing unit tests for the components of all the spacecraft subsystems as well as designing and assembling the flatsat from flight spares. Gedex's Vector Gravimeter for Asteroids (VEGA) is an accelerometer for spacecraft. The author's contributions to this payload were modifying the instrument computer board schematic, designing the printed circuit board, developing and applying test software, and performing thermal acceptance testing of two instrument computer boards. The SFL's cylindrical Hall effect thruster combines the cylindrical Hall-thruster configuration with permanent magnets to achieve miniaturization and low power consumption, respectively. The author's contributions were to design, build, and test an engineering model power processing unit.
Mukumoto, Nobutaka; Tsujii, Katsutomo; Saito, Susumu; Yasunaga, Masayoshi; Takegawa, Hideki; Yamamoto, Tokihiro; Numasaki, Hodaka; Teshima, Teruki
2009-10-01
The purpose of this work was to develop an infrastructure for the integrated Monte Carlo verification system (MCVS) to verify the accuracy of conventional dose calculations, which often fail to accurately predict dose distributions, mainly due to inhomogeneities in the patient's anatomy, for example, in lung and bone. The MCVS consists of a graphical user interface (GUI) based on a computational environment for radiotherapy research (CERR) with MATLAB language. The MCVS GUI acts as an interface between the MCVS and a commercial treatment planning system to import the treatment plan, create MC input files, and analyze MC output dose files. The MCVS uses the EGSnrc MC codes, which include EGSnrc/BEAMnrc to simulate the treatment head and EGSnrc/DOSXYZnrc to calculate the dose distributions in the patient/phantom. In order to improve computation time without approximations, an in-house cluster system was constructed. The phase-space data of a 6-MV photon beam from a Varian Clinac unit were developed and used to establish several benchmarks under homogeneous conditions. The MC results agreed with the ionization chamber measurements to within 1%. The MCVS GUI could import and display radiotherapy treatment plans created by the MC method and by various treatment planning systems, in formats such as RTOG and DICOM-RT. Dose distributions could be analyzed using dose profiles and dose volume histograms and compared on the same platform. With the cluster system, calculation time improved in line with the increase in the number of central processing units (CPUs), at a computation efficiency of more than 98%. Development of the MCVS was successful for performing MC simulations and analyzing dose distributions.
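As an illustration of one of the analysis steps mentioned above, the sketch below computes a cumulative dose-volume histogram from a dose grid and a structure mask. It is a minimal stand-in with made-up data, not the MCVS or CERR code.

```python
# Minimal sketch (not the MCVS/CERR code): a cumulative dose-volume histogram
# for one structure, i.e. for each dose level the fraction of the structure's
# voxels receiving at least that dose. Grid shape and dose values are made up.
import numpy as np

def cumulative_dvh(dose, mask, n_bins=100):
    """dose: 3-D dose grid [Gy]; mask: boolean grid selecting the structure."""
    d = dose[mask]
    edges = np.linspace(0.0, d.max(), n_bins + 1)
    volume_fraction = [(d >= e).mean() for e in edges]
    return edges, np.array(volume_fraction)

rng = np.random.default_rng(5)
dose = rng.gamma(shape=9.0, scale=0.5, size=(64, 64, 32))   # toy dose grid
mask = np.zeros_like(dose, dtype=bool)
mask[20:40, 20:40, 10:20] = True                            # toy structure
edges, vf = cumulative_dvh(dose, mask)
```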
A "Black-and-White Box" Approach to User Empowerment with Component Computing
ERIC Educational Resources Information Center
Kynigos, C.
2004-01-01
The paper discusses three aspects central to the 10 year-old process of design, development and use of E-slate, a construction kit for educational software. These are: (1) the design of computational media for user empowerment, (2) the socially-grounded approach to the building of user communities and (3) the issue of long-term sustainability as…
ERIC Educational Resources Information Center
Wiediger, Susan D.
2009-01-01
The periodic table and the periodic system are central to chemistry and thus to many introductory chemistry courses. A number of existing activities use various data sets to model the development process for the periodic table. This paper describes an image arrangement computer program developed to mimic a paper-based card sorting periodic table…
NASA Astrophysics Data System (ADS)
Varela Rodriguez, F.
2011-12-01
The control system of each of the four major Experiments at the CERN Large Hadron Collider (LHC) is distributed over up to 160 computers running either Linux or Microsoft Windows. A quick response to abnormal situations of the computer infrastructure is crucial to maximize the physics usage. For this reason, a tool was developed to supervise such a large system, identify errors, and troubleshoot them. Although the monitoring of the performance of the Linux computers and their processes has been available since the first versions of the tool, it is only recently that the software package has been extended to provide similar functionality for the nodes running Microsoft Windows, as this platform is the most commonly used in the LHC detector control systems. In this paper, the architecture and the functionality of the Windows Management Instrumentation (WMI) client developed to provide centralized monitoring of the nodes running different flavours of the Microsoft platform, as well as the interface to the SCADA software of the control systems, are presented. The tool is currently being commissioned by the Experiments and it has already proven to be very effective in optimizing the running systems and in detecting misbehaving processes or nodes.
Mendel-GPU: haplotyping and genotype imputation on graphics processing units
Chen, Gary K.; Wang, Kai; Stram, Alex H.; Sobel, Eric M.; Lange, Kenneth
2012-01-01
Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22954633
Time and number of displays impact critical signal detection in fetal heart rate tracings.
Anderson, Brittany L; Scerbo, Mark W; Belfore, Lee A; Abuhamad, Alfred Z
2011-06-01
Interest in centralized monitoring in labor and delivery units is growing because it affords the opportunity to monitor multiple patients simultaneously. However, a long history of research on sustained attention reveals these types of monitoring tasks can be problematic. The goal of the present experiment was to examine the ability of individuals to detect critical signals in fetal heart rate (FHR) tracings in one or more displays over an extended period of time. Seventy-two participants monitored one, two, or four computer-simulated FHR tracings on a computer display for the appearance of late decelerations over a 48-minute vigil. Measures of subjective stress and workload were also obtained before and after the vigil. The results showed that detection accuracy decreased over time and also declined as the number of displays increased. The subjective reports indicated that participants found the task to be stressful and mentally demanding, effortful, and frustrating. The results suggest that centralized monitoring that allows many patients to be monitored simultaneously may impose a detrimental attentional burden on the observer. Furthermore, this seemingly benign task may impose an additional source of stress and mental workload above what is commonly found in labor and delivery units. © Thieme Medical Publishers.
The revolution in data gathering systems
NASA Technical Reports Server (NTRS)
Cambra, J. M.; Trover, W. F.
1975-01-01
Data acquisition systems used in NASA's wind tunnels from the 1950s through the present time are summarized as a baseline for assessing the impact of minicomputers and microcomputers on data acquisition and data processing. Emphasis is placed on the cyclic evolution in computer technology, which first produced the central computer system and, finally, the distributed computer system. Other developments discussed include medium-scale integration, large-scale integration, the combining of the functions of data acquisition and control, and micro- and minicomputers.
Cache-Aware Asymptotically-Optimal Sampling-Based Motion Planning
Ichnowski, Jeffrey; Prins, Jan F.; Alterovitz, Ron
2014-01-01
We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU’s cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot’s configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot. PMID:25419474
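CARRT*'s subdivision of the configuration space is adaptive and tied to the CPU's cache size; the sketch below shows only the general principle it exploits, namely that bucketing samples by region keeps each nearest-neighbour query inside a small working set. The fixed grid, cell size, and two-dimensional setting are illustrative assumptions, not the planner's actual data structure.

```python
# Simplified sketch of region-based nearest-neighbour search: bucket 2-D
# samples into grid cells so a query touches only a few small lists (a
# cache-friendly working set) instead of scanning every sample.
# This is NOT CARRT*'s adaptive subdivision, just the general principle.
import math
import random
from collections import defaultdict

class GridNN:
    def __init__(self, cell_size: float):
        self.cell = cell_size
        self.buckets = defaultdict(list)          # (ix, iy) -> list of points

    def _key(self, p):
        return (int(p[0] // self.cell), int(p[1] // self.cell))

    def insert(self, p):
        self.buckets[self._key(p)].append(p)

    def nearest(self, q):
        assert self.buckets, "insert points before querying"
        ix, iy = self._key(q)
        best, best_d = None, math.inf
        r = 0
        # expand ring by ring; once a candidate is found, only rings that
        # could still contain a closer point need to be examined
        while best is None or (r - 1) * self.cell <= best_d:
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    if max(abs(dx), abs(dy)) != r:   # only cells on the ring at radius r
                        continue
                    for p in self.buckets.get((ix + dx, iy + dy), []):
                        d = math.dist(p, q)
                        if d < best_d:
                            best, best_d = p, d
            r += 1
        return best

if __name__ == "__main__":
    random.seed(0)
    nn = GridNN(cell_size=0.1)
    for _ in range(10_000):
        nn.insert((random.random(), random.random()))
    print(nn.nearest((0.5, 0.5)))
```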
Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat.
Aasebø, Ida E J; Lepperød, Mikkel E; Stavrinou, Maria; Nøkkevangen, Sandra; Einevoll, Gaute; Hafting, Torkel; Fyhn, Marianne
2017-01-01
The activity pattern and temporal dynamics within and between neuron ensembles are essential features of information processing and believed to be profoundly affected by anesthesia. Much of our general understanding of sensory information processing, including computational models aimed at mathematically simulating sensory information processing, relies on parameters derived from recordings conducted on animals under anesthesia. Due to the high variety of neuronal subtypes in the brain, population-based estimates of the impact of anesthesia may conceal unit- or ensemble-specific effects of the transition between states. Using tetrodes chronically implanted into primary visual cortex (V1) of rats, we conducted extracellular recordings of single units and followed the same cell ensembles in the awake and anesthetized states. We found that the transition from wakefulness to anesthesia involves unpredictable changes in temporal response characteristics. The latency of single-unit responses to visual stimulation was delayed in anesthesia, with large individual variations between units. Pair-wise correlations between units increased under anesthesia, indicating more synchronized activity. Further, the units within an ensemble show reproducible temporal activity patterns in response to visual stimuli that are changed between states, suggesting state-dependent sequences of activity. The current dataset, with recordings from the same neural ensembles across states, is well suited for validating and testing computational network models. This can lead to testable predictions, bring a deeper understanding of the experimental findings and improve models of neural information processing. Here, we exemplify such a workflow using a Brunel network model.
Biomorphic Multi-Agent Architecture for Persistent Computing
NASA Technical Reports Server (NTRS)
Lodding, Kenneth N.; Brewster, Paul
2009-01-01
A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.
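As a purely illustrative sketch (not the proposed NASA design), the per-cell state described above can be summarized in a few fields: local memory, an independent clock, an optional specialist payload, and a copy of the shared software genome, with a broadcast channel to the rest of the swarm.

```python
# Purely illustrative sketch (not the NASA design): the per-cell state the
# abstract describes -- local memory, an independent clock, an optional
# specialist payload, and a copy of the shared software "genome".
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Cell:
    genome: Dict                                   # full system description, carried by every cell
    memory: Dict = field(default_factory=dict)     # local, non-shared memory
    clock_ticks: int = 0                           # independent internal clock (no global clock)
    specialist_payload: Optional[str] = None       # e.g. "sensor" or "emitter"

    def tick(self):
        self.clock_ticks += 1                      # each cell advances on its own

    def broadcast(self, message, swarm: List["Cell"]):
        for cell in swarm:                         # global broadcast channel
            cell.memory.setdefault("inbox", []).append(message)

if __name__ == "__main__":
    genome = {"roles": ["worker", "sensor"], "version": 1}
    swarm = [Cell(genome=genome) for _ in range(4)]
    swarm[0].specialist_payload = "sensor"
    swarm[0].broadcast("status?", swarm)
    print(len(swarm[1].memory["inbox"]))           # -> 1
```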
Code of Federal Regulations, 2010 CFR
2010-07-01
... good engineering judgement and standards, such as ANSI B31-3. In food/medical service means that a... manufacture a Food and Drug Administration regulated product where leakage of a barrier fluid into the process..., storage at the chemical manufacturing process unit to which the records pertain, or storage in central...
Zur, Hadas; Tuller, Tamir
2016-01-01
mRNA translation is the fundamental process of decoding the information encoded in mRNA molecules by the ribosome for the synthesis of proteins. The centrality of this process in various biomedical disciplines such as cell biology, evolution and biotechnology has encouraged the development of dozens of mathematical and computational models of translation in recent years. These models aim to capture various biophysical aspects of the process. The objective of this review is to survey these models, focusing on those based on and/or validated against real large-scale genomic data. We consider aspects such as the complexity of the models, the biophysical aspects they address and the predictions they may provide. Furthermore, we survey the central systems biology discoveries reported on their basis. This review demonstrates the fundamental advantages of employing computational biophysical translation models in general, and discusses the relative advantages of the different approaches and the challenges in the field. PMID:27591251
Centralized Multi-Sensor Square Root Cubature Joint Probabilistic Data Association
Liu, Jun; Li, Gang; Qi, Lin; Li, Yaowen; He, You
2017-01-01
This paper focuses on the tracking problem of multiple targets with multiple sensors in a nonlinear cluttered environment. To avoid Jacobian matrix computation and scaling parameter adjustment, improve numerical stability, and acquire more accurate estimates for centralized nonlinear tracking, a novel centralized multi-sensor square root cubature joint probabilistic data association algorithm (CMSCJPDA) is proposed. Firstly, the multi-sensor tracking problem is decomposed into several single-sensor multi-target tracking problems, which are sequentially processed during the estimation. Then, in each sensor, the assignment of its measurements to target tracks is accomplished on the basis of joint probabilistic data association (JPDA), and a weighted probability fusion method with a square root version of the cubature Kalman filter (SRCKF) is utilized to estimate the targets' state. Once the measurements from all sensors have been processed, CMSCJPDA yields the global state estimate. Experimental results show that CMSCJPDA is superior to the state-of-the-art algorithms in terms of tracking accuracy, numerical stability, and computational cost, which provides a new idea for solving multi-sensor tracking problems. PMID:29113085
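The cubature step at the heart of the filter can be illustrated with a minimal, non-square-root cubature Kalman filter; the SRCKF used in CMSCJPDA propagates Cholesky factors for numerical stability and adds the JPDA weighting and multi-sensor fusion, all of which are omitted in this sketch. The constant-velocity/range-only example at the bottom is an assumption for demonstration.

```python
# Minimal sketch of a (non-square-root) cubature Kalman filter step, showing
# the Jacobian-free propagation the abstract refers to.  The JPDA assignment
# and multi-sensor fusion layers of CMSCJPDA are not included here.
import numpy as np

def cubature_points(x, P):
    n = x.size
    S = np.linalg.cholesky(P)                              # P = S S^T
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])   # 2n unit directions
    return x[:, None] + S @ xi                             # shape (n, 2n)

def ckf_step(x, P, z, f, h, Q, R):
    n = x.size
    # --- prediction ---
    X = cubature_points(x, P)
    Xp = np.column_stack([f(X[:, i]) for i in range(2 * n)])
    x_pred = Xp.mean(axis=1)
    P_pred = Xp @ Xp.T / (2 * n) - np.outer(x_pred, x_pred) + Q
    # --- measurement update ---
    X = cubature_points(x_pred, P_pred)
    Z = np.column_stack([h(X[:, i]) for i in range(2 * n)])
    z_pred = Z.mean(axis=1)
    P_zz = Z @ Z.T / (2 * n) - np.outer(z_pred, z_pred) + R
    P_xz = X @ Z.T / (2 * n) - np.outer(x_pred, z_pred)
    K = P_xz @ np.linalg.inv(P_zz)
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ P_zz @ K.T
    return x_new, P_new

if __name__ == "__main__":
    # toy example: constant-velocity state [px, vx], range-only measurement
    dt = 1.0
    f = lambda s: np.array([s[0] + dt * s[1], s[1]])
    h = lambda s: np.array([np.hypot(s[0], 10.0)])         # range to a sensor offset by 10
    x, P = np.array([0.0, 1.0]), np.eye(2)
    Q, R = 0.01 * np.eye(2), np.array([[0.5]])
    x, P = ckf_step(x, P, np.array([10.2]), f, h, Q, R)
    print(x)
```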
Beyond the Computer: Reading as a Process of Intellectual Development.
ERIC Educational Resources Information Center
Thompson, Mark E.
With more than 100,000 computers in public schools across the United States, the impact of computer assisted instruction (CAI) on students' reading behavior needs to be evaluated. In reading laboratories, CAI has been found to provide an efficient and highly motivating means of teaching specific educational objectives. Yet, while computer…
Computer simulation of the NASA water vapor electrolysis reactor
NASA Technical Reports Server (NTRS)
Bloom, A. M.
1974-01-01
The water vapor electrolysis (WVE) reactor is a spacecraft waste reclamation system for extended-mission manned spacecraft. The WVE reactor's raw material is water, its product oxygen. A computer simulation of the WVE operational processes provided the data required for an optimal design of the WVE unit. The simulation process was implemented with the aid of a FORTRAN IV routine.
Flooding and Atmospheric Rivers across the Western United States
NASA Astrophysics Data System (ADS)
Villarini, G.; Barth, N. A.; White, K. D.
2017-12-01
Flood frequency analysis across the western United States is complicated by annual peak flow records that frequently contain flows generated by distinctly different flood-generating mechanisms. Among the different flood agents, atmospheric rivers (ARs) are responsible for large, regional-scale floods. USGS streamgaging stations in the central Columbia River Basin in the Pacific Northwest, the Sierra Nevada, the central and southern California coast, and central Arizona show a mixture of 30-70% AR-generated flood peaks over the complete period of record. Bulletin 17B and its proposed update (Draft Bulletin 17C) continue to recognize difficulties in determining flood frequency estimates from streamflow records that contain flood peaks coming from different flood-generating mechanisms, as is the case in the western United States. They recommend developing separate frequency curves when the hydrometeorologic mechanisms that generated the annual peak flows can be separated into distinct subpopulations. Yet challenges arise when trying to consistently quantify the physical (hydrometeorologic) processes that generated the observed flows, and even more so when trying to account for them in flood frequency estimation. This study provides a general statistical framework for performing a process-driven flood frequency analysis using a weighted mixed-population approach, highlighting the role that ARs play in the flood peak distribution.
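As a hedged sketch of the general idea of a mixed-population frequency curve (the distributions, weights and parameters below are invented for illustration and are not the study's fitted values), the annual-peak CDF can be written as a weighted mixture of an AR-driven and a non-AR population, from which return periods follow directly.

```python
# Illustrative sketch of a two-population (mixed-distribution) flood frequency
# curve: annual peaks come from an AR-driven population with probability w and
# a non-AR population otherwise.  All parameters here are made up.
import numpy as np
from scipy.stats import genextreme

def mixture_cdf(q, w, ar_params, non_ar_params):
    """P(annual peak <= q) under the two-population mixture."""
    F_ar = genextreme.cdf(q, *ar_params)        # (shape c, loc, scale)
    F_non = genextreme.cdf(q, *non_ar_params)
    return w * F_ar + (1.0 - w) * F_non

def return_period(q, *args):
    """Return period in years from the annual non-exceedance probability."""
    return 1.0 / (1.0 - mixture_cdf(q, *args))

if __name__ == "__main__":
    w = 0.5                                     # fraction of AR-generated peaks
    ar = (-0.1, 900.0, 300.0)                   # heavier-tailed AR population
    non_ar = (0.1, 400.0, 120.0)
    for q in (800.0, 1500.0, 2500.0):           # discharge, arbitrary units
        print(f"Q = {q:7.0f}: T = {return_period(q, w, ar, non_ar):7.1f} yr")
```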
Model of a programmable quantum processing unit based on a quantum transistor effect
NASA Astrophysics Data System (ADS)
Ablayev, Farid; Andrianov, Sergey; Fetisov, Danila; Moiseev, Sergey; Terentyev, Alexandr; Urmanchev, Andrey; Vasiliev, Alexander
2018-02-01
In this paper we propose a model of a programmable quantum processing device realizable with existing nano-photonic technologies. It can be viewed as a basis for new high-performance hardware architectures. Protocols for the physical implementation of the device, based on controlled photon transfer and atomic transitions, are presented. These protocols are designed for executing basic single-qubit and multi-qubit gates forming a universal set. We analyze the possible operation of this quantum computer scheme. We then formalize the physical architecture with a mathematical model of a Quantum Processing Unit (QPU), which we use as a basis for the Quantum Programming Framework. This framework makes it possible to perform universal quantum computations in a multitasking environment.
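As a generic illustration of the kind of instruction set such a QPU would expose (not the authors' framework or protocols), a state-vector simulation of a standard universal gate set, Hadamard and T plus CNOT, can be written in a few lines:

```python
# Generic illustration (not the authors' QPU model): state-vector simulation of
# a standard universal gate set {H, T, CNOT}.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard
T = np.diag([1.0, np.exp(1j * np.pi / 4)])            # T gate

def apply_1q(state, gate, target, n):
    """Apply a single-qubit gate to `target` in an n-qubit state vector
    (qubit 0 is the most significant bit of the basis index)."""
    ops = [np.eye(2)] * n
    ops[target] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def apply_cnot(state, control, target, n):
    """Apply CNOT by permuting basis amplitudes where the control bit is 1."""
    new = state.copy()
    for i in range(2 ** n):
        if (i >> (n - 1 - control)) & 1:
            new[i] = state[i ^ (1 << (n - 1 - target))]
    return new

if __name__ == "__main__":
    n = 2
    psi = np.zeros(2 ** n, dtype=complex); psi[0] = 1.0   # |00>
    psi = apply_1q(psi, H, target=0, n=n)                 # superpose qubit 0
    psi = apply_cnot(psi, control=0, target=1, n=n)       # entangle -> Bell state
    psi = apply_1q(psi, T, target=1, n=n)                 # phase on qubit 1
    print(np.round(psi, 3))                               # [0.707 0 0 0.5+0.5j]
```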
NASA Astrophysics Data System (ADS)
Doyle, Paul; Mtenzi, Fred; Smith, Niall; Collins, Adrian; O'Shea, Brendan
2012-09-01
The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and its falling cost are contributing to an explosive generation of raw photometric data. These data must go through a process of cleaning and reduction before they can be used for high-precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required to allow reduction and cleaning rates to process terabyte-sized datasets at near capture rates using an elastic processing architecture. The ability to access computing resources and to allow them to grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation and support processing of data segments in parallel. Academic institutes can collaborate and provide an elastic computing model without the requirement for large centralized high performance computing data centers. This paper demonstrates how an order of magnitude (factor of ten) improvement in overall processing time has been achieved using the "ACN pipeline", a distributed pipeline spanning multiple academic institutes.
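The segment-and-process-in-parallel idea can be sketched generically as follows; `reduce_segment` is a hypothetical stand-in for the calibration and cleaning stages, and this toy is not the ACN pipeline itself, which distributes work across institutes rather than local processes.

```python
# Toy sketch of segmenting a batch of CCD frames and reducing the segments in
# parallel, with lossless compression of each segment.  Not the ACN pipeline.
from concurrent.futures import ProcessPoolExecutor
import gzip
import pickle

def reduce_segment(frames):
    """Pretend reduction: a per-frame summary statistic plus lossless compression."""
    summaries = [sum(frame) / len(frame) for frame in frames]   # e.g. mean counts
    compressed = gzip.compress(pickle.dumps(frames))            # lossless, as in the text
    return summaries, len(compressed)

def run_pipeline(all_frames, segment_size=4, workers=4):
    segments = [all_frames[i:i + segment_size]
                for i in range(0, len(all_frames), segment_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(reduce_segment, segments))      # segments are independent
    summaries = [s for seg_summaries, _ in results for s in seg_summaries]
    compressed_bytes = sum(nbytes for _, nbytes in results)
    return summaries, compressed_bytes

if __name__ == "__main__":
    frames = [[float(i + j) for j in range(1024)] for i in range(32)]  # fake frames
    summaries, nbytes = run_pipeline(frames)
    print(len(summaries), "frames reduced,", nbytes, "bytes after compression")
```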
Programming the Navier-Stokes computer: An abstract machine model and a visual editor
NASA Technical Reports Server (NTRS)
Middleton, David; Crockett, Tom; Tomboulian, Sherry
1988-01-01
The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs.
ERIC Educational Resources Information Center
Ghasem, Nayef
2016-01-01
This paper illustrates a teaching technique used in computer applications in chemical engineering employed for designing various unit operation processes, where the students learn about unit operations by designing them. The aim of the course is not to teach design, but rather to teach the fundamentals and the function of unit operation processes…
21 CFR 1401.10 - Fees to be charged-general.
Code of Federal Regulations, 2011 CFR
2011-04-01
... search. (b) Computerized search for records. ONDCP will charge 116% of the salary of the programmer/operator and the apportionable time of the central processing unit directly attributed to the search. (c...
21 CFR 1401.10 - Fees to be charged-general.
Code of Federal Regulations, 2010 CFR
2010-04-01
... search. (b) Computerized search for records. ONDCP will charge 116% of the salary of the programmer/operator and the apportionable time of the central processing unit directly attributed to the search. (c...
A Distributed Processing Approach to Payroll Time Reporting for a Large School District.
ERIC Educational Resources Information Center
Freeman, Raoul J.
1983-01-01
Describes a system for payroll reporting from geographically disparate locations in which data is entered, edited, and verified locally on minicomputers and then uploaded to a central computer for the standard payroll process. Communications and hardware, time-reporting software, data input techniques, system implementation, and its advantages are…
NASA Technical Reports Server (NTRS)
Dundas, T. R.
1981-01-01
The development and capabilities of the Montana geodata system are discussed. The system is entirely dependent on the state's central data processing facility which serves all agencies and is therefore restricted to batch mode processing. The computer graphics equipment is briefly described along with its application to state lands and township mapping and the production of water quality interval maps.
16. VIEW OF THE ENRICHED URANIUM RECOVERY SYSTEM. ENRICHED URANIUM ...
16. VIEW OF THE ENRICHED URANIUM RECOVERY SYSTEM. ENRICHED URANIUM RECOVERY PROCESSED RELATIVELY PURE MATERIALS AND SOLUTIONS AND SOLID RESIDUES WITH RELATIVELY LOW URANIUM CONTENT. URANIUM RECOVERY INVOLVED BOTH SLOW AND FAST PROCESSES. (4/4/66) - Rocky Flats Plant, General Manufacturing, Support, Records-Central Computing, Southern portion of Plant, Golden, Jefferson County, CO
Monitoring and Depth of Strategy Use in Computer-Based Learning Environments for Science and History
ERIC Educational Resources Information Center
Deekens, Victor M.; Greene, Jeffrey A.; Lobczowski, Nikki G.
2018-01-01
Background: Self-regulated learning (SRL) models position metacognitive monitoring as central to SRL processing and predictive of student learning outcomes (Winne & Hadwin, 2008; Zimmerman, 2000). A body of research evidence also indicates that depth of strategy use, ranging from surface to deep processing, is predictive of learning…
ERIC Educational Resources Information Center
Natale, Samuel M., Ed.; Fenton, Mark B., Ed.
This volume contains 21 papers that explore value conflicts in all professions: "Ethics and the Development of Work: The Central Maintenance Computer Case" (Harry Hummels); "Too Many Cooks Spoil the Stew--Ethical Preparation of Interdisciplinary Professionals" (Vincent F. Maher); "Value Conflict in 'Competence-Based' Training Incentives" (John…
Porting a Hall MHD Code to a Graphic Processing Unit
NASA Technical Reports Server (NTRS)
Dorelli, John C.
2011-01-01
We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner's hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.
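The HLL flux mentioned above is straightforward to sketch; the example below applies it to the 1-D Burgers equation rather than the full Hall-MHD system, and omits the MUSCL-Hancock reconstruction, the Hall term, and the divergence cleaning.

```python
# Sketch of the HLL approximate Riemann flux, applied to 1-D Burgers' equation
# rather than Hall MHD; reconstruction and the Hall/divergence terms are omitted.
import numpy as np

def hll_flux(uL, uR):
    """HLL flux for Burgers' equation, f(u) = u^2 / 2."""
    fL, fR = 0.5 * uL**2, 0.5 * uR**2
    sL, sR = np.minimum(uL, uR), np.maximum(uL, uR)    # simple wave-speed estimates
    denom = np.where(sR > sL, sR - sL, 1.0)            # avoid 0/0 where states are equal
    middle = (sR * fL - sL * fR + sL * sR * (uR - uL)) / denom
    return np.where(sL >= 0, fL, np.where(sR <= 0, fR, middle))

def step(u, dx, dt):
    """One first-order finite-volume update; boundary cells are held fixed."""
    F = hll_flux(u[:-1], u[1:])                        # fluxes at interior interfaces
    u_new = u.copy()
    u_new[1:-1] -= dt / dx * (F[1:] - F[:-1])
    return u_new

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 201)
    u = np.where(x < 0.5, 1.0, 0.0)                    # right-moving shock initial data
    dx = x[1] - x[0]
    dt = 0.4 * dx                                      # CFL-limited time step
    for _ in range(100):
        u = step(u, dx, dt)
    print(np.round(u[::40], 3))
```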