high-speed parallel interface: Topics by Science.gov

Sample records for high-speed parallel interface

Development of gallium arsenide high-speed, low-power serial parallel interface modules: Executive summary

NASA Technical Reports Server (NTRS)

1988-01-01

Final report to NASA LeRC on the development of gallium arsenide (GaAs) high-speed, low-power serial/parallel interface modules. The report discusses the development and test of a family of 16, 32 and 64 bit parallel-to-serial and serial-to-parallel integrated circuits using a self-aligned gate MESFET technology developed at the Honeywell Sensors and Signal Processing Laboratory. Lab testing demonstrated 1.3 GHz clock rates at a power of 300 mW. This work was accomplished under contract number NAS3-24676.
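The parallel-to-serial and serial-to-parallel conversion named in this record can be illustrated in software. The following is only a minimal sketch, not the circuit described in the report: the function names are hypothetical, and the most-significant-bit-first ordering is an assumption made for the example.

```python
def parallel_to_serial(word: int, width: int = 16) -> list[int]:
    """Shift a width-bit word out as a list of bits, MSB first (assumed order)."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def serial_to_parallel(bits: list[int]) -> int:
    """Shift bits in one at a time, MSB first, rebuilding the parallel word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


# Round trip of one hypothetical 16-bit word.
bits = parallel_to_serial(0xA5C3, width=16)
restored = serial_to_parallel(bits)
```

The same two functions cover the 32 and 64 bit cases by changing `width`.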
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, Wallace B.; DuBois, David H.

1996-01-01

A system of sending and receiving gateways interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway.
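The stripe distributor, stripe collector and credit-based flow control described above can be sketched as follows. This is an illustrative model only, not the patented design: the round-robin byte assignment, the one-credit-per-frame accounting and all names (`stripe`, `collect`, `CreditSender`) are assumptions made for the example.

```python
def stripe(data: bytes, n_links: int) -> list[bytes]:
    """Distribute bytes round-robin over n_links parallel lanes (assumed scheme)."""
    return [data[i::n_links] for i in range(n_links)]


def collect(lanes: list[bytes], total_len: int) -> bytes:
    """Re-interleave the lanes into the original byte order."""
    n = len(lanes)
    out = bytearray(total_len)
    for i, lane in enumerate(lanes):
        out[i::n] = lane
    return bytes(out)


class CreditSender:
    """Sender side of a credit scheme: one credit stands for one free frame buffer."""

    def __init__(self, initial_credits: int) -> None:
        self.credits = initial_credits

    def send_frame(self) -> bool:
        """Consume a credit if one is available; return whether the frame was sent."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def grant(self, n: int) -> None:
        """Called when the receiver reports n newly freed buffers."""
        self.credits += n


lanes = stripe(b"ABCDEFG", 3)
frame = collect(lanes, 7)
```

If the number of usable links changes, calling `stripe` with a different `n_links` models the throughput adjustment the abstract mentions.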
The crew activity planning system bus interface unit

NASA Technical Reports Server (NTRS)

Allen, M. A.

1979-01-01

The hardware and software designs used to implement a high speed parallel communications interface to the MITRE 307.2 kilobit/second serial bus communications system are described. The primary topic is the development of the bus interface unit.
Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis

DOE PAGES

Schulz, Martin; Galarowicz, Jim; Maghrak, Don; ...

2008-01-01

Over the last decades a large number of performance tools has been developed to analyze and optimize high performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy to apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two different faces: it provides an interoperable tool set covering the most common analysis steps as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple fully equivalent interfaces including an easy-to-use GUI as well as an interactive command line interface reducing the usage threshold for those tools.
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, W.B.; DuBois, D.H.

1996-12-03

Disclosed is a system of sending and receiving gateways that interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway. 7 figs.
User Interface Developed for Controls/CFD Interdisciplinary Research

NASA Technical Reports Server (NTRS)

1996-01-01

The NASA Lewis Research Center, in conjunction with the University of Akron, is developing analytical methods and software tools to create a cross-discipline "bridge" between controls and computational fluid dynamics (CFD) technologies. Traditionally, the controls analyst has used simulations based on large lumping techniques to generate low-order linear models convenient for designing propulsion system controls. For complex, high-speed vehicles such as the High Speed Civil Transport (HSCT), simulations based on CFD methods are required to capture the relevant flow physics. The use of CFD should also help reduce the development time and costs associated with experimentally tuning the control system. The initial application for this research is the High Speed Civil Transport inlet control problem. A major aspect of this research is the development of a controls/CFD interface for non-CFD experts, to facilitate the interactive operation of CFD simulations and the extraction of reduced-order, time-accurate models from CFD results. A distributed computing approach for implementing the interface is being explored. Software being developed as part of the Integrated CFD and Experiments (ICE) project provides the basis for the operating environment, including run-time displays and information (data base) management. Message-passing software is used to communicate between the ICE system and the CFD simulation, which can reside on distributed, parallel computing systems. Initially, the one-dimensional Large-Perturbation Inlet (LAPIN) code is being used to simulate a High Speed Civil Transport type inlet. LAPIN can model real supersonic inlet features, including bleeds, bypasses, and variable geometry, such as translating or variable-ramp-angle centerbodies. Work is in progress to use parallel versions of the multidimensional NPARC code.
Embedded controller for GEM detector readout system

NASA Astrophysics Data System (ADS)

Zabołotny, Wojciech M.; Byszuk, Adrian; Chernyshova, Maryna; Cieszewski, Radosław; Czarski, Tomasz; Dominik, Wojciech; Jakubowska, Katarzyna L.; Kasprowicz, Grzegorz; Poźniak, Krzysztof; Rzadkiewicz, Jacek; Scholz, Marek

2013-10-01

This paper describes the embedded controller used for the multichannel readout system for the GEM detector. The controller is based on an embedded Mini ITX mainboard running the GNU/Linux operating system. The controller offers two interfaces to communicate with the FPGA-based readout system. FPGA configuration and diagnostics are controlled via a low-speed USB-based interface, while high-speed setup of the readout parameters and reception of the measured data are handled by the PCI Express (PCIe) interface. Hardware access is synchronized by a dedicated server written in C. Multiple clients may connect to this server via a TCP/IP network, and different priorities are assigned to individual clients. Specialized protocols have been implemented both for low-level access at register level and for high-level access with transfer of structured data using the "msgpack" protocol. High-level functionalities have been split between multiple TCP/IP servers for parallel operation. The status of the system may be checked, and basic maintenance may be performed, via a web interface, while expert access is possible via an SSH server. The system was designed with reliability and flexibility in mind.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. In every iteration we pick a fixed number of envelope peaks from the current signal according to the number of compute nodes and distribute them evenly among the compute nodes, which search for the optimal Morlet wavelets in parallel. With the help of parallel computer systems and the Message Passing Interface, the parallel algorithm fully exploits the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition, and it also scales well. In addition, having every compute node search for only one optimal Morlet wavelet in every iteration is the most efficient implementation.
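For readers unfamiliar with the underlying technique, the greedy loop of plain matching pursuit can be sketched as below. This is a generic, serial illustration and not the authors' parallel algorithm: it uses an arbitrary dictionary of unit-norm atoms rather than Morlet wavelets, does no envelope-peak picking or MPI distribution, and all names are hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: atoms are assumed to have unit norm.

    Returns the list of (atom_index, coefficient) picks and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        coeffs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(coeffs[k]))
        c = coeffs[best]
        # Subtract the matched component from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
        picks.append((best, c))
    return picks, residual


# Toy dictionary: the two standard basis vectors in the plane.
picks, residual = matching_pursuit([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], 2)
```

In a parallel variant, the correlation search over atoms is the step that would be divided among compute nodes.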
Integrated test system of infrared and laser data based on USB 3.0

NASA Astrophysics Data System (ADS)

Fu, Hui Quan; Tang, Lin Bo; Zhang, Chao; Zhao, Bao Jun; Li, Mao Wen

2017-07-01

This paper presents the design of a USB 3.0-based integrated test system for a module that processes both infrared image data and laser signal data. The core of the design is FPGA logic control: the design uses dual-chip DDR3 SDRAM as a high-speed laser data cache, receives parallel LVDS image data through a serial-to-parallel conversion chip, and achieves high-speed data communication between the system and the host computer over the USB 3.0 bus. The experimental results show that the developed PC software displays in real time the 14-bit LVDS original image after 14-to-8 bit conversion and the JPEG2000 compressed image after software decompression, and can display the acquired laser signal data in real time. The correctness of the test system design is verified, indicating that the interface link works normally.
A high speed buffer for LV data acquisition

NASA Technical Reports Server (NTRS)

Cavone, Angelo A.; Sterlina, Patrick S.; Clemmons, James I., Jr.; Meyers, James F.

1987-01-01

The laser velocimeter (autocovariance) buffer interface is a data acquisition subsystem designed specifically for the acquisition of data from a laser velocimeter. The subsystem acquires data from up to six laser velocimeter components in parallel, measures the times between successive data points for each of the components, establishes and maintains a coincident condition between any two or three components, and acquires data from other instrumentation systems simultaneously with the laser velocimeter data points. The subsystem is designed to control the entire data acquisition process based on initial setup parameters obtained from a host computer and to be independent of the computer during the acquisition. On completion of the acquisition cycle, the interface transfers the contents of its memory to the host under direction of the host via a single 16-bit parallel DMA channel.
Processing Device for High-Speed Execution of an Xrisc Computer Program

NASA Technical Reports Server (NTRS)

Ng, Tak-Kwong (Inventor); Mills, Carl S. (Inventor)

2016-01-01

A processing device for high-speed execution of a computer program is provided. A memory module may store one or more computer programs. A sequencer may select one of the computer programs and control execution of the selected program. A register module may store intermediate values associated with a current calculation set, a set of output values associated with a previous calculation set, and a set of input values associated with a subsequent calculation set. An external interface may receive the set of input values from a computing device and provide the set of output values to the computing device. A computation interface may provide a set of operands for computation during processing of the current calculation set. The set of input values is loaded into the register and the set of output values is unloaded from the register in parallel with processing of the current calculation set.
Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

NASA Astrophysics Data System (ADS)

Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

2017-10-01

In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
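The two quantities the abstract discusses, speed-up and parallelization efficiency as a function of thread count, follow from their standard definitions. The sketch below uses made-up run times purely for illustration; none of the numbers come from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_threads: int) -> float:
    """Parallel efficiency E = S / n_threads (1.0 would be ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_threads


# Hypothetical timings in seconds: 120 s on one thread, 40 s on four threads.
s = speedup(120.0, 40.0)
e = efficiency(120.0, 40.0, 4)
```

An efficiency below 1.0 reflects serial code sections and thread overhead that do not shrink as threads are added.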
PCLIPS: Parallel CLIPS

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

1994-01-01

A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C3I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
High-rate serial interconnections for embedded and distributed systems with power and resource constraints

NASA Astrophysics Data System (ADS)

Sheynin, Yuriy; Shutenko, Felix; Suvorova, Elena; Yablokov, Evgenej

2008-04-01

High rate interconnections are important subsystems in modern data processing and control systems of many classes. They are especially important in prospective embedded and on-board systems that used to be multicomponent systems with parallel or distributed architecture, [1]. Modular architecture systems of previous generations were based on parallel busses that were widely used and standardised: VME, PCI, CompactPCI, etc. Busses evolution went in improvement of bus protocol efficiency (burst transactions, split transactions, etc.) and increasing operation frequencies. However, due to multi-drop bus nature and multi-wire skew problems the parallel bussing speedup became more and more limited. For embedded and on-board systems additional reason for this trend was in weight, size and power constraints of an interconnection and its components. Parallel interfaces have become technologically more challenging as their respective clock frequencies have increased to keep pace with the bandwidth requirements of their attached storage devices. Since each interface uses a data clock to gate and validate the parallel data (which is normally 8 bits or 16 bits wide), the clock frequency need only be equivalent to the byte rate or word rate being transmitted. In other words, for a given transmission frequency, the wider the data bus, the slower the clock. As the clock frequency increases, more high frequency energy is available in each of the data lines, and a portion of this energy is dissipated in radiation. Each data line not only transmits this energy but also receives some from its neighbours. This form of mutual interference is commonly called "cross-talk," and the signal distortion it produces can become another major contributor to loss of data integrity unless compensated by appropriate cable designs. 
Other transmission problems such as frequency-dependent attenuation and signal reflections, while also applicable to serial interfaces, are more troublesome in parallel interfaces due to the number of additional cable conductors involved. In order to compensate for these drawbacks, higher quality cables, shorter cable runs and fewer devices on the bus have been the norm. Finally, the physical bulk of the parallel cables makes them more difficult to route inside an enclosure, hinders cooling airflow and is incompatible with the trend toward smaller form-factor devices. Parallel busses worked in systems during the past 20 years, but the accumulated problems dictate the need for change and the technology is available to spur the transition. The general trend in high-rate interconnections turned from parallel bussing to scalable interconnections with a network architecture and high-rate point-to-point links. Analysis showed that data links with serial information transfer could achieve higher throughput and efficiency and it was confirmed in various research and practical design. Serial interfaces offer an improvement over older parallel interfaces: better performance, better scalability, and also better reliability as the parallel interfaces are at their limits of speed with reliable data transfers and others. The trend was implemented in major standards' families evolution: e.g. from PCI/PCI-X parallel bussing to PCIExpress interconnection architecture with serial lines, from CompactPCI parallel bus to ATCA (Advanced Telecommunications Architecture) specification with serial links and network topologies of an interconnection, etc. In the article we consider a general set of characteristics and features of serial interconnections, give a brief overview of serial interconnections specifications. In more details we present the SpaceWire interconnection technology. 
Having been developed for space on-board system applications, SpaceWire has important features and characteristics that make it a prospective interconnection for a wide range of embedded systems.
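The abstract's remark that, for a given transmission rate, a wider data bus needs a slower clock is simple arithmetic. The sketch below assumes one transfer per clock cycle (single data rate); the function name and the example throughput are hypothetical.

```python
def required_clock_hz(throughput_bits_per_s: float, bus_width_bits: int) -> float:
    """Clock needed to sustain a throughput, assuming one transfer per clock cycle."""
    return throughput_bits_per_s / bus_width_bits


# Hypothetical target of 1.6 Gbit/s carried over three different link widths.
target = 1_600_000_000
clock_8 = required_clock_hz(target, 8)     # 8-bit parallel bus
clock_16 = required_clock_hz(target, 16)   # 16-bit parallel bus
clock_1 = required_clock_hz(target, 1)     # single serial line
```

Doubling the width halves the clock, while a single serial line must run at the full bit rate, which is why serial links trade wire count and skew problems for a much higher line rate.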
Integration experiences and performance studies of A COTS parallel archive systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Gary

2010-01-01

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc.
We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.
Integration experiments and performance studies of a COTS parallel archive system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Gary

2010-06-16

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc.
We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address requirements of future archival storage systems.
Interface fluctuations during rapid drainage

NASA Astrophysics Data System (ADS)

Ayaz, Monem; Toussaint, Renaud; Schäfer, Gerhard; Jørgen Måløy, Knut; Moura, Marcel

2017-04-01

We experimentally study the interface dynamics of an immiscible fluid as it invades a monolayer of saturated porous medium through rapid drainage. The seemingly stable and continuous motion of the interface at macroscale involves numerous abrupt pore-scale jumps and local reconfigurations of the interface. By computing the velocity fluctuations along the invasion front from sequences of images captured at high frame rate, we are able to study both the local and global behavior. The latter displays an intermittent behavior with power-law distributed avalanches in size and duration. As the system is drained, potential surface energy is stored at the interface up to a given threshold in pressure. The energy released generates elastic waves at the confining plate, which we detect using piezoelectric-type acoustic sensors. By detecting pore-scale events emanating from the depinning of the interface, we look to develop techniques for localizing the displacement front. To assess the quality of these techniques, optical monitoring is done in parallel using a high speed camera.
Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

NASA Technical Reports Server (NTRS)

Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

1999-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Airborne Precision Spacing for Dependent Parallel Operations Interface Study

NASA Technical Reports Server (NTRS)

Volk, Paul M.; Takallu, M. A.; Hoffler, Keith D.; Weiser, Jarold; Turner, Dexter

2012-01-01

This paper describes a usability study of proposed cockpit interfaces to support Airborne Precision Spacing (APS) operations for aircraft performing dependent parallel approaches (DPA). NASA has proposed an airborne system called Pair Dependent Speed (PDS) which uses their Airborne Spacing for Terminal Arrival Routes (ASTAR) algorithm to manage spacing intervals. Interface elements were designed to facilitate the input of APS-DPA spacing parameters to ASTAR, and to convey PDS system information to the crew deemed necessary and/or helpful to conduct the operation, including: target speed, guidance mode, target aircraft depiction, and spacing trend indication. In the study, subject pilots observed recorded simulations using the proposed interface elements in which the ownship managed assigned spacing intervals from two other arriving aircraft. Simulations were recorded using the Aircraft Simulation for Traffic Operations Research (ASTOR) platform, a medium-fidelity simulator based on a modern Boeing commercial glass cockpit. Various combinations of the interface elements were presented to subject pilots, and feedback was collected via structured questionnaires. The results of subject pilot evaluations show that the proposed design elements were acceptable, and that preferable combinations exist within this set of elements. The results also point to potential improvements to be considered for implementation in future experiments.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Chen, Yousu; Wu, Di

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

Parallel algorithm of VLBI software correlator under multiprocessor environment

NASA Astrophysics Data System (ADS)

Zheng, Weimin; Zhang, Dong

2007-11-01

The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a data-intensive and computation-intensive task. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have a flexible structure, are scalable, and can correlate data from 10 stations.
Using Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

NASA Astrophysics Data System (ADS)

O'Connor, A. S.; Justice, B.; Harris, A. T.

2013-12-01

Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speeds for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Components Analysis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x - 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA based GPU processing framework for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView1 35,000 x 35,000 pixel image. With GPU orthorectification, the same orthorectification process takes three minutes. Faster image processing lets imagery be used successfully by first responders and by scientists making rapid discoveries with near-real-time data, and provides an operational component to data centers needing to quickly process and disseminate data.
A CMOS high speed imaging system design based on FPGA

NASA Astrophysics Data System (ADS)

Tang, Hong; Wang, Huawei; Cao, Jianzhong; Qiao, Mingrui

2015-10-01

CMOS sensors offer several advantages over traditional CCD sensors, and CMOS-based imaging systems have become a focus of research and development. To achieve real-time data acquisition and high-speed transmission, we designed a high-speed CMOS imaging system based on an FPGA. The core control chip of this system is the XC6SL75T, and we use a CameraLink interface and an AM41V4 CMOS image sensor to transmit and acquire image data. The AM41V4 is a 4-megapixel, high-speed (500 frames per second) CMOS image sensor with a global shutter and a 4/3" optical format. The sensor uses column-parallel A/D converters to digitize the images. The CameraLink interface uses the DS90CR287, which converts 28 bits of LVCMOS/LVTTL data into four LVDS data streams. Light reflected from objects is captured by the CMOS detector, which converts it to electronic signals and sends them to the FPGA. The FPGA processes the data it receives and transmits them through the CameraLink interface, configured in full mode, to a host computer fitted with acquisition cards. The PC then stores, visualizes, and processes the images. This paper explains the structure and principle of the system and introduces its hardware and software design. The FPGA generates the driving clock of the CMOS sensor. The sensor data are converted to LVDS signals and then transmitted to the data acquisition cards. Following simulation, the paper presents a row-transfer timing sequence of the CMOS sensor. The system achieves real-time image acquisition and external control.
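The 28-bit to four-stream conversion mentioned above amounts to 7:1 serialization on each LVDS pair. The sketch below models that split in software. The contiguous bit-to-lane assignment and the function names are assumptions for illustration only; the real device uses a specific bit map defined in its datasheet, which is not reproduced here.

```python
LANES = 4
BITS_PER_LANE = 7  # 28 parallel bits / 4 LVDS data streams


def serialize_28_to_4(word):
    """Split a 28-bit word into 4 lanes of 7 bits (simplified contiguous mapping)."""
    if not 0 <= word < (1 << 28):
        raise ValueError("word must fit in 28 bits")
    return [(word >> (BITS_PER_LANE * lane)) & 0x7F for lane in range(LANES)]


def deserialize_4_to_28(lanes):
    """Inverse of serialize_28_to_4: rebuild the 28-bit word on the receiver side."""
    word = 0
    for lane, bits in enumerate(lanes):
        word |= (bits & 0x7F) << (BITS_PER_LANE * lane)
    return word


def lane_bit_rate(pixel_clock_hz):
    """Each lane carries 7 bits per parallel clock cycle."""
    return BITS_PER_LANE * pixel_clock_hz


lanes = serialize_28_to_4(0x0ABCDEF)
restored = deserialize_4_to_28(lanes)
```

With this model, the per-lane bit rate is seven times the parallel clock, which is why the serial links must run much faster than the parallel bus they replace.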
Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

PubMed

Li, J; Guo, L-X; Zeng, H; Han, X-B

2009-06-01

A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
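To show the kind of domain decomposition a parallel FDTD code relies on, the sketch below runs a 1-D FDTD update twice: once on a single grid, and once split into subdomains that exchange one boundary value per field per time step. It is a simplified stand-in, not the paper's method: it is 1-D rather than a rough-surface scattering problem, it uses conducting walls instead of a UPML, the "ranks" are plain Python lists in one process rather than MPI processes, and all names and parameter values are assumptions.

```python
import math

COURANT = 0.5  # normalized Courant number (assumed value)


def source(step):
    """Gaussian pulse injected as a soft source."""
    return math.exp(-((step - 30.0) / 10.0) ** 2)


def run_serial(n, steps, src):
    e = [0.0] * (n + 1)  # e[0] and e[n] stay 0 (conducting walls)
    h = [0.0] * n
    for step in range(steps):
        for i in range(n):
            h[i] += COURANT * (e[i + 1] - e[i])
        for i in range(1, n):
            e[i] += COURANT * (h[i] - h[i - 1])
        e[src] += source(step)
    return e[:n]


def run_decomposed(n, steps, src, ranks):
    bounds = [r * n // ranks for r in range(ranks + 1)]
    # Per rank: e carries one ghost cell on the right, h one ghost cell on the left.
    e = [[0.0] * (bounds[r + 1] - bounds[r] + 1) for r in range(ranks)]
    h = [[0.0] * (bounds[r + 1] - bounds[r] + 1) for r in range(ranks)]
    for step in range(steps):
        # Halo exchange 1: receive the left-most E of the right neighbor.
        for r in range(ranks - 1):
            e[r][-1] = e[r + 1][0]
        for r in range(ranks):
            for k in range(len(e[r]) - 1):
                h[r][k + 1] += COURANT * (e[r][k + 1] - e[r][k])
        # Halo exchange 2: receive the right-most H of the left neighbor.
        for r in range(1, ranks):
            h[r][0] = h[r - 1][-1]
        for r in range(ranks):
            for k in range(len(e[r]) - 1):
                if bounds[r] + k == 0:
                    continue  # wall node, never updated
                e[r][k] += COURANT * (h[r][k + 1] - h[r][k])
            if bounds[r] <= src < bounds[r + 1]:
                e[r][src - bounds[r]] += source(step)
    return [v for r in range(ranks) for v in e[r][:-1]]


serial_field = run_serial(200, 300, 100)
parallel_field = run_decomposed(200, 300, 100, ranks=4)
max_difference = max(abs(a - b) for a, b in zip(serial_field, parallel_field))
```

In an MPI implementation the two exchange loops would become point-to-point messages between neighboring processes; the update loops are unchanged, which is what makes this scheme straightforward to parallelize.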
Smart photodetector arrays for error control in page-oriented optical memory

NASA Astrophysics Data System (ADS)

Schaffer, Maureen Elizabeth

1998-12-01

Page-oriented optical memories (POMs) have been proposed to meet high speed, high capacity storage requirements for input/output intensive computer applications. This technology offers the capability for storage and retrieval of optical data in two-dimensional pages resulting in high throughput data rates. Since currently measured raw bit error rates for these systems fall several orders of magnitude short of industry requirements for binary data storage, powerful error control codes must be adopted. These codes must be designed to take advantage of the two-dimensional memory output. In addition, POMs require an optoelectronic interface to transfer the optical data pages to one or more electronic host systems. Conventional charge coupled device (CCD) arrays can receive optical data in parallel, but the relatively slow serial electronic output of these devices creates a system bottleneck thereby eliminating the POM advantage of high transfer rates. Also, CCD arrays are "unintelligent" interfaces in that they offer little data processing capabilities. The optical data page can be received by two-dimensional arrays of "smart" photo-detector elements that replace conventional CCD arrays. These smart photodetector arrays (SPAs) can perform fast parallel data decoding and error control, thereby providing an efficient optoelectronic interface between the memory and the electronic computer. This approach optimizes the computer memory system by combining the massive parallelism and high speed of optics with the diverse functionality, low cost, and local interconnection efficiency of electronics. In this dissertation we examine the design of smart photodetector arrays for use as the optoelectronic interface for page-oriented optical memory. We review options and technologies for SPA fabrication, develop SPA requirements, and determine SPA scalability constraints with respect to pixel complexity, electrical power dissipation, and optical power limits. 
Next, we examine data modulation and error correction coding for the purpose of error control in the POM system. These techniques are adapted, where possible, for 2D data and evaluated as to their suitability for a SPA implementation in terms of BER, code rate, decoder time and pixel complexity. Our analysis shows that differential data modulation combined with relatively simple block codes known as array codes provide a powerful means to achieve the desired data transfer rates while reducing error rates to industry requirements. Finally, we demonstrate the first smart photodetector array designed to perform parallel error correction on an entire page of data and satisfy the sustained data rates of page-oriented optical memories. Our implementation integrates a monolithic PN photodiode array and differential input receiver for optoelectronic signal conversion with a cluster error correction code using 0.35-μm CMOS. This approach provides high sensitivity, low electrical power dissipation, and fast parallel correction of 2 x 2-bit cluster errors in an 8 x 8 bit code block to achieve corrected output data rates scalable to 102 Gbps in the current technology increasing to 1.88 Tbps in 0.1-μm CMOS.
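For readers unfamiliar with array codes, the sketch below shows the simplest member of that family: a two-dimensional parity code in which a 7 x 7 data block gains one parity bit per row and per column, giving an 8 x 8 code block. This textbook code corrects only a single bit error and is offered purely as an illustration; it is not the cluster-correcting code described in the dissertation, and the function names are hypothetical.

```python
def encode(data):
    """Append an even-parity bit to every row, then a parity row over all columns."""
    rows = [row + [sum(row) % 2] for row in data]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]


def correct_single_error(block):
    """Flip the bit where the one failing row check meets the one failing column check."""
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2]
    bad_cols = [c for c, col in enumerate(zip(*block)) if sum(col) % 2]
    fixed = [row[:] for row in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


data = [[(r * c + r) % 2 for c in range(7)] for r in range(7)]
code_block = encode(data)            # 8 x 8 bits

received = [row[:] for row in code_block]
received[3][5] ^= 1                  # inject one bit error
repaired = correct_single_error(received)
```

Every row and column check can be evaluated independently, which suggests why codes of this shape map naturally onto a two-dimensional array of detector pixels.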
HPC in Basin Modeling: Simulating Mechanical Compaction through Vertical Effective Stress using Level Sets

NASA Astrophysics Data System (ADS)

McGovern, S.; Kollet, S. J.; Buerger, C. M.; Schwede, R. L.; Podlaha, O. G.

2017-12-01

In the context of sedimentary basins, we present a model for the simulation of the movement of a geological formation (layers) during the evolution of the basin through sedimentation and compaction processes. Assuming a single phase saturated porous medium for the sedimentary layers, the model focuses on the tracking of the layer interfaces, through the use of the level set method, as sedimentation drives fluid-flow and reduction of pore space by compaction. On the assumption of Terzaghi's effective stress concept, the coupling of the pore fluid pressure to the motion of interfaces in 1-D is presented in McGovern et al. (2017) [1]. The current work extends the spatial domain to 3-D, though we maintain the assumption of vertical effective stress to drive the compaction. The idealized geological evolution is conceptualized as the motion of interfaces between rock layers, whose paths are determined by the magnitude of a speed function in the direction normal to the evolving layer interface. The speeds normal to the interface are dependent on the change in porosity, determined through an effective stress-based compaction law, such as the exponential Athy's law. Provided with the speeds normal to the interface, the level set method uses an advection equation to evolve a potential function, whose zero level set defines the interface. Thus, the moving layer geometry influences the pore pressure distribution which couples back to the interface speeds. The flexible construction of the speed function allows extension, in the future, to other terms to represent different physical processes, analogous to how the compaction rule represents material deformation. The 3-D model is implemented using the generic finite element method framework Deal II, which provides tools, building on p4est and interfacing to PETSc, for the massively parallel distributed solution to the model equations [2]. Experiments are being run on the Juelich Supercomputing Center's Jureca cluster.
[1] McGovern et al. (2017). Novel basin modelling concept for simulating deformation from mechanical compaction using level sets. Computational Geosciences, SI:ECMOR XV, 1-14. [2] Bangerth et al. (2011). Algorithms and data structures for massively parallel generic adaptive finite element codes. ACM Transactions on Mathematical Software (TOMS), 38(2):14.
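The ingredients named in this abstract can be written compactly in generic textbook notation. The symbols below are assumptions chosen for illustration and are not taken from the cited papers: \phi is the level set (potential) function, F the speed normal to the interface, \Gamma(t) the layer interface, \sigma the total vertical stress, p the pore fluid pressure, \sigma' the effective stress, n the porosity, n_0 the surface porosity, and \beta a compaction coefficient.

```latex
\begin{aligned}
&\frac{\partial \phi}{\partial t} + F\,\lvert \nabla \phi \rvert = 0,
 \qquad \Gamma(t) = \{\, \mathbf{x} : \phi(\mathbf{x}, t) = 0 \,\}
 &&\text{(level set advection; interface as zero level set)} \\
&\sigma' = \sigma - p
 &&\text{(Terzaghi's effective stress)} \\
&n(\sigma') = n_0\, e^{-\beta \sigma'}
 &&\text{(exponential, Athy-type compaction law)}
\end{aligned}
```

Read together, these express the coupling described above: the pore pressure p sets the effective stress, the effective stress sets the porosity change, and the porosity change determines the normal speed F that moves the interface.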
Control of a small working robot on a large flexible manipulator for suppressing vibrations

NASA Technical Reports Server (NTRS)

Lee, Soo Han

1991-01-01

The short-term objective of this research is to complete the experimental configuration of the Small Articulated Robot (SAM) and to derive the actuator dynamics of the Robotic Arm, Large and Flexible (RALF). To control vibrations, SAM should have a larger bandwidth than that of the vibrations. The bandwidth of SAM is determined by three factors: structural rigidity, processing speed of the controller, and motor speed. The structural rigidity was increased to a reasonably high value by attaching aluminum angles at weak points and replacing thin side plates with thicker ones. The high processing speed of the controller was achieved by using parallel processors (three 68000 processors, three interface boards, and one main processor (IBM-XT)). The maximum joint speed and acceleration of SAM are about 4 rad/s and 15 rad/s^2, respectively. Hence SAM can move only 0.04 rad at 3 Hz, which is the natural frequency of RALF. This will be checked by experiment.
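The abstract does not say how the amplitude limit at 3 Hz was obtained. One derivation that reproduces it, assuming purely sinusoidal joint motion of amplitude A (an assumption, not a statement from the report), is:

```latex
\begin{aligned}
\theta(t) &= A \sin(\omega t), \qquad \omega = 2\pi f = 2\pi \cdot 3\ \mathrm{Hz} \approx 18.85\ \mathrm{rad/s} \\
\text{acceleration limit:}\quad A\,\omega^{2} &\le 15\ \mathrm{rad/s^{2}}
 \;\Rightarrow\; A \le \frac{15}{(18.85)^{2}} \approx 0.042\ \mathrm{rad} \\
\text{speed limit:}\quad A\,\omega &\le 4\ \mathrm{rad/s}
 \;\Rightarrow\; A \le \frac{4}{18.85} \approx 0.21\ \mathrm{rad}
\end{aligned}
```

Under this assumption the acceleration limit is the binding one, and it gives an amplitude of roughly four hundredths of a radian, consistent with the figure quoted in the abstract.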
Transmission and reflection of strongly nonlinear solitary waves at granular interfaces.

PubMed

Tichler, A M; Gómez, L R; Upadhyaya, N; Campman, X; Nesterenko, V F; Vitelli, V

2013-07-26

The interaction of a solitary wave with an interface formed by two strongly nonlinear noncohesive granular lattices displays rich behavior, characterized by the breakdown of continuum equations of motion in the vicinity of the interface. By treating the solitary wave as a quasiparticle with an effective mass, we construct an intuitive (energy- and linear-momentum-conserving) discrete model to predict the amplitudes of the transmitted solitary waves generated when an incident solitary-wave front, parallel to the interface, moves from a denser to a lighter granular hexagonal lattice. Our findings are corroborated with simulations. We then successfully extend this model to oblique interfaces, where we find that the angle of refraction and reflection of a solitary wave follows, below a critical value, an analogue of Snell's law in which the solitary-wave speed replaces the speed of sound, which is zero in the sonic vacuum.
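In generic notation, the Snell-type relation described here takes the familiar form below. The symbols are assumptions for illustration: \theta_i and \theta_t are the angles of incidence and refraction measured from the interface normal, and V_1 and V_2 are the solitary-wave speeds in the two lattices, taking the place of the sound speeds in the classical law.

```latex
\frac{\sin\theta_i}{V_1} = \frac{\sin\theta_t}{V_2}
```

As in the classical case, the relation can only be satisfied while the right-hand sine stays at or below 1, which is consistent with the abstract's statement that the analogy holds below a critical angle.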
Transmission and Reflection of Strongly Nonlinear Solitary Waves at Granular Interfaces

NASA Astrophysics Data System (ADS)

Tichler, A. M.; Gómez, L. R.; Upadhyaya, N.; Campman, X.; Nesterenko, V. F.; Vitelli, V.

2013-07-01

The interaction of a solitary wave with an interface formed by two strongly nonlinear noncohesive granular lattices displays rich behavior, characterized by the breakdown of continuum equations of motion in the vicinity of the interface. By treating the solitary wave as a quasiparticle with an effective mass, we construct an intuitive (energy- and linear-momentum-conserving) discrete model to predict the amplitudes of the transmitted solitary waves generated when an incident solitary-wave front, parallel to the interface, moves from a denser to a lighter granular hexagonal lattice. Our findings are corroborated with simulations. We then successfully extend this model to oblique interfaces, where we find that the angle of refraction and reflection of a solitary wave follows, below a critical value, an analogue of Snell’s law in which the solitary-wave speed replaces the speed of sound, which is zero in the sonic vacuum.
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
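The idea of many processing elements (PEs) working on single bit slices under one instruction stream can be modeled in a few lines. In the sketch below, each "bit plane" is a Python integer whose bit i belongs to PE i, so one bitwise operation acts on every PE at once, and addition proceeds one bit slice at a time. This is an illustrative software model under those assumptions, with hypothetical function names; it does not reproduce the patented circuitry.

```python
def to_bit_planes(values, width):
    """Plane b holds bit b of every value; bit position `pe` of a plane belongs to PE `pe`."""
    planes = []
    for b in range(width):
        plane = 0
        for pe, value in enumerate(values):
            plane |= ((value >> b) & 1) << pe
        planes.append(plane)
    return planes


def from_bit_planes(planes, count):
    """Reassemble one integer per PE from the stack of bit planes."""
    return [sum(((plane >> pe) & 1) << b for b, plane in enumerate(planes))
            for pe in range(count)]


def bit_serial_add(a_planes, b_planes):
    """Add two arrays one bit slice at a time; every PE executes the same step."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)                # sum bit for all PEs
        carry = (a & b) | (carry & (a ^ b))      # carry bit for all PEs
    out.append(carry)                            # final carry plane
    return out


left = [3, 200, 17, 255]
right = [5, 100, 240, 255]
sum_planes = bit_serial_add(to_bit_planes(left, 8), to_bit_planes(right, 8))
sums = from_bit_planes(sum_planes, len(left))
```

The loop length depends only on the word width, not on the number of PEs, which is the property that lets a 128 x 128 array process 16,384 data streams in the same number of steps as one.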
Parallel image registration with a thin client interface

NASA Astrophysics Data System (ADS)

Saiprasad, Ganesh; Lo, Yi-Jung; Plishker, William; Lei, Peng; Ahmad, Tabassum; Shekhar, Raj

2010-03-01

Despite its high significance, the clinical utilization of image registration remains limited because of its lengthy execution time and a lack of easy access. The focus of this work was twofold. First, we accelerated our coarse-to-fine, volume subdivision-based image registration algorithm by a novel parallel implementation that maintains the accuracy of our uniprocessor implementation. Second, we developed a thin-client computing model with a user-friendly interface to perform rigid and nonrigid image registration. Our novel parallel computing model uses the message passing interface model on a 32-core cluster. The results show that, compared with the uniprocessor implementation, the parallel implementation of our image registration algorithm is approximately 5 times faster for rigid image registration and approximately 9 times faster for nonrigid registration for the images used. To test the viability of such systems for clinical use, we developed a thin client in the form of a plug-in in OsiriX, a well-known open source PACS workstation and DICOM viewer, and used it for two applications. The first application registered the baseline and follow-up MR brain images, whose subtraction was used to track progression of multiple sclerosis. The second application registered pretreatment PET and intratreatment CT of radiofrequency ablation patients to demonstrate a new capability of multimodality imaging guidance. The registration acceleration coupled with the remote implementation using a thin client should ultimately increase accuracy, speed, and access of image registration-based interpretations in a number of diagnostic and interventional applications.
Oasis: A high-level/high-performance open source Navier-Stokes solver

NASA Astrophysics Data System (ADS)

Mortensen, Mikael; Valen-Sendstad, Kristian

2015-03-01

Oasis is a high-level/high-performance finite element Navier-Stokes solver written from scratch in Python using building blocks from the FEniCS project (fenicsproject.org). The solver is unstructured and targets large-scale applications in complex geometries on massively parallel clusters. Oasis utilizes MPI and interfaces, through FEniCS, to the linear algebra backend PETSc. Oasis advocates a high-level, programmable user interface through the creation of highly flexible Python modules for new problems. Through the high-level Python interface the user is placed in complete control of every aspect of the solver. A version of the solver that uses piecewise linear elements for both velocity and pressure is shown to reproduce very well the classical, spectral, turbulent channel simulations of Moser et al. (1999). The computational speed is strongly dominated by the iterative solvers provided by the linear algebra backend, which is arguably the best performance any similar implicit solver using PETSc may hope for. Higher order accuracy is also demonstrated and new solvers may be easily added within the same framework.
A parallel architecture of interpolated timing recovery for high-speed data transfer rate and wide capture-range

NASA Astrophysics Data System (ADS)

Higashino, Satoru; Kobayashi, Shoei; Yamagami, Tamotsu

2007-06-01

Higher data transfer rates have been demanded of data storage devices as storage capacity increases. To increase the transfer rate, high-speed data processing techniques are required in read-channel devices. Generally, a parallel architecture is used for high-speed digital processing. We have developed a new architecture of Interpolated Timing Recovery (ITR) to achieve a high data transfer rate and a wide capture range in read-channel devices for information storage channels. It facilitates parallel implementation on large-scale-integration (LSI) devices.
Thermal Shock Damage and Microstructure Evolution of Thermal Barrier Coatings on Mar-M247 Superalloy in a Combustion Gas Environment

NASA Astrophysics Data System (ADS)

Mei, Hui

2012-06-01

The effect of preoxidation on the thermal shock behavior of air plasma sprayed thermal barrier coatings (TBCs) was thoroughly investigated in a combustion gas environment produced by burning jet fuel with high-speed air. Results show that, with increasing cycles, the as-oxidized TBCs lost more weight and developed a larger spallation area than the as-sprayed ones. Thermally grown oxide (TGO) growth and thermal mismatch stress were shown to play critical roles in the failure of the as-oxidized TBCs. Two types of significant cracks were identified: the type I crack was perpendicular to the TGO interface and the type II crack was parallel to it. The former accelerated TGO growth and thereby promoted the latter, as long as the oxidizing gas continued to diffuse inward and oxidize more of the bond coat (BC). The preoxidation treatment directly increased the TGO thickness, caused the parallel cracks to form earlier in the TGO during thermal shock, and eventually resulted in worse thermal shock resistance.
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
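The parallelization strategy described here (split the photons equally, give each processor its own random stream, then combine the tallies) can be sketched without MPI. In the toy model below, each "worker" counts photons that cross a uniform slab without interacting, and the workers are simply called one after another. Everything in it is an assumption for illustration: the slab model, the attenuation coefficient, the seeds, and the function names. In particular, seeding separate generators differently is a stand-in for SPRNG and does not carry its guarantees about uncorrelated streams.

```python
import math
import random


def partition(n_photons, n_workers):
    """Split the photon histories as evenly as possible among the workers."""
    base, extra = divmod(n_photons, n_workers)
    return [base + (1 if w < extra else 0) for w in range(n_workers)]


def worker(n_photons, seed, mu, thickness):
    """Count photons whose first free path exceeds the slab thickness."""
    rng = random.Random(seed)  # one private stream per worker
    transmitted = 0
    for _ in range(n_photons):
        if rng.expovariate(mu) > thickness:
            transmitted += 1
    return transmitted


def simulate(n_photons, n_workers, mu=0.5, thickness=2.0):
    """Run every worker's share and reduce the tallies to one transmission estimate."""
    shares = partition(n_photons, n_workers)
    tallies = [worker(n, seed=1000 + w, mu=mu, thickness=thickness)
               for w, n in enumerate(shares)]
    return sum(tallies) / n_photons


estimate = simulate(200_000, n_workers=8)
expected = math.exp(-0.5 * 2.0)  # analytic unscattered transmission, exp(-mu * d)
```

Because the histories are independent, the only communication needed is the final sum of the tallies, which is why near-linear speed-up with processor count is plausible for this class of problem.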
High Speed Surface Thermocouples Interface to Wireless Transmitters

DTIC Science & Technology

2017-03-15

Government and/or Private Sector Use Being able to measure high-speed surface temperatures in hostile environments where wireless transmission of the data...09/16/2016 See Item 16 Draft Reg Repro 16. REMARKS Eric Gingrich, COR I Item 0: High Speed Surface Thermocouples Interface to Wireless ...Speed Surface Thermocouples Interface to Wireless Transmitters W56HZV-16-C-0149 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT
Ethernet-based test stand for a CAN network

NASA Astrophysics Data System (ADS)

Ziebinski, Adam; Cupek, Rafal; Drewniak, Marek

2017-11-01

This paper presents a test stand for the CAN-based systems that are used in automotive systems. The authors propose applying an Ethernet-based test system that supports the virtualisation of a CAN network. The proposed solution has many advantages compared to classical test beds that are based on dedicated CAN-PC interfaces: it allows the physical constraints associated with the number of interfaces that can be simultaneously connected to a tested system to be avoided, which enables the test time for parallel tests to be shortened; the high speed of Ethernet transmission allows for more frequent sampling of the messages that are transmitted by a CAN network (as the authors show in the experiment results section) and the cost of the proposed solution is much lower than the traditional lab-based dedicated CAN interfaces for PCs.
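Virtualising a CAN network over Ethernet implies wrapping each CAN frame in some payload format that an Ethernet-side tool can parse. The abstract does not describe the format the authors use, so the sketch below defines a hypothetical 13-byte layout (identifier, data length, up to 8 data bytes) purely to illustrate the idea. The layout and function names are assumptions, and no network code is included.

```python
import struct

# Hypothetical payload layout: 4-byte CAN identifier, 1-byte data length,
# 8 data bytes (zero padded), all in network byte order.
FRAME_FORMAT = ">IB8s"
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)


def pack_can_frame(can_id, data):
    """Serialize one classic CAN frame into a fixed-size payload."""
    if not 0 <= can_id <= 0x1FFFFFFF:
        raise ValueError("CAN identifier must fit in 29 bits")
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(FRAME_FORMAT, can_id, len(data), data.ljust(8, b"\x00"))


def unpack_can_frame(payload):
    """Recover (identifier, data) from a payload built by pack_can_frame."""
    can_id, length, data = struct.unpack(FRAME_FORMAT, payload)
    return can_id, data[:length]


payload = pack_can_frame(0x123, b"\x01\x02\x03")
decoded = unpack_can_frame(payload)
```

A fixed-size payload like this could be carried in any Ethernet-based transport, and many such frames can share one link, which matches the abstract's point about avoiding one physical CAN-PC interface per connection.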
Parallel arms races between garter snakes and newts involving tetrodotoxin as the phenotypic interface of coevolution.

PubMed

Brodie, Edmund D; Feldman, Chris R; Hanifin, Charles T; Motychak, Jeffrey E; Mulcahy, Daniel G; Williams, Becky L; Brodie, Edmund D

2005-02-01

Parallel "arms races" involving the same or similar phenotypic interfaces allow inference about selective forces driving coevolution, as well as the importance of phylogenetic and phenotypic constraints in coevolution. Here, we report the existence of apparent parallel arms races between species pairs of garter snakes and their toxic newt prey that indicate independent evolutionary origins of a key phenotype in the interface. In at least one area of sympatry, the aquatic garter snake, Thamnophis couchii, has evolved elevated resistance to the neurotoxin tetrodotoxin (TTX), present in the newt Taricha torosa. Previous studies have shown that a distantly related garter snake, Thamnophis sirtalis, has coevolved with another newt species that possesses TTX, Taricha granulosa. Patterns of within population variation and phenotypic tradeoffs between TTX resistance and sprint speed suggest that the mechanism of resistance is similar in both species of snake, yet phylogenetic evidence indicates the independent origins of elevated resistance to TTX.
Applications of Emerging Parallel Optical Link Technology to High Energy Physics Experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chramowicz, J.; Kwan, S.; Prosser, A.

2011-09-01

Modern particle detectors depend upon optical fiber links to deliver event data to upstream trigger and data processing systems. Future detector systems can benefit from the development of dense arrangements of high speed optical links emerging from the telecommunications and storage area network market segments. These links support data transfers in each direction at rates up to 120 Gbps in packages that minimize or even eliminate edge connector requirements. Emerging products include a class of devices known as optical engines which permit assembly of the optical transceivers in close proximity to the electrical interfaces of ASICs and FPGAs which handle the data in parallel electrical format. Such assemblies will reduce required printed circuit board area and minimize electromagnetic interference and susceptibility. We will present test results of some of these parallel components and report on the development of pluggable FPGA Mezzanine Cards equipped with optical engines to provide to collaborators on the Versatile Link Common Project for the HI-LHC at CERN.

The Software Correlator of the Chinese VLBI Network

NASA Technical Reports Server (NTRS)

Zheng, Weimin; Quan, Ying; Shu, Fengchun; Chen, Zhong; Chen, Shanshan; Wang, Weihua; Wang, Guangli

2010-01-01

The software correlator of the Chinese VLBI Network (CVN) has played an irreplaceable role in the CVN routine data processing, e.g., in the Chinese lunar exploration project. This correlator will be upgraded to process geodetic and astronomical observation data. In the future, with several new stations joining the network, CVN will carry out crustal movement observations, quick UT1 measurements, astrophysical observations, and deep space exploration activities. For the geodetic or astronomical observations, we need a wide-band 10-station correlator. For spacecraft tracking, a real-time and highly reliable correlator is essential. To meet the scientific and navigation requirements of CVN, two parallel software correlators for multiprocessor environments are under development. A high speed, 10-station prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm on a computer cluster platform is being developed. Another real-time software correlator for spacecraft tracking adopts thread-parallel technology, and it runs on SMP (Symmetric Multiprocessor) servers. Both correlators have a flexible structure and are scalable.
Design and Performance of a 1 ms High-Speed Vision Chip with 3D-Stacked 140 GOPS Column-Parallel PEs.

PubMed

Nose, Atsushi; Yamazaki, Tomohiro; Katayama, Hironobu; Uehara, Shuji; Kobayashi, Masatsugu; Shida, Sayaka; Odahara, Masaki; Takamiya, Kenichi; Matsumoto, Shizunori; Miyashita, Leo; Watanabe, Yoshihiro; Izawa, Takashi; Muramatsu, Yoshinori; Nitta, Yoshikazu; Ishikawa, Masatoshi

2018-04-24

We have developed a high-speed vision chip using 3D stacking technology to address the increasing demand for high-speed vision chips in diverse applications. The chip comprises a 1/3.2-inch, 1.27 Mpixel, 500 fps (0.31 Mpixel, 1000 fps, 2 × 2 binning) vision chip with 3D-stacked column-parallel Analog-to-Digital Converters (ADCs) and 140 Giga Operation per Second (GOPS) programmable Single Instruction Multiple Data (SIMD) column-parallel PEs for new sensing applications. The 3D-stacked structure and column parallel processing architecture achieve high sensitivity, high resolution, and high-accuracy object positioning.
Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

PubMed

Wilson, J Adam; Williams, Justin C

2009-01-01

The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms of data in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
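The first step of the chain described here, a spatial filter expressed as a matrix-matrix multiplication, is easy to show on a toy scale. The sketch below builds a common average reference (CAR) filter, a widely used spatial filter chosen here only as an example since the abstract does not name the filter used, and applies it to a small channels-by-samples block. It also reproduces the speed-up arithmetic from the reported timings. Function names and the sample data are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def common_average_reference(n_channels):
    """Filter matrix: each output channel is the input minus the mean of all channels."""
    return [[(1.0 if row == col else 0.0) - 1.0 / n_channels
             for col in range(n_channels)]
            for row in range(n_channels)]


# Toy block: 3 channels x 3 samples.
signals = [[1.0, 2.0, 3.0],
           [2.0, 4.0, 6.0],
           [3.0, 6.0, 9.0]]
filtered = matmul(common_average_reference(3), signals)

# Speed-up implied by the timings reported in the abstract.
speedup = 933 / 27
```

Every output element of the product is an independent dot product, which is the property that makes this step a natural fit for the many parallel threads of a GPU.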
Design and implementation of interface units for high speed fiber optics local area networks and broadband integrated services digital networks

NASA Technical Reports Server (NTRS)

Tobagi, Fouad A.; Dalgic, Ismail; Pang, Joseph

1990-01-01

The design and implementation of interface units for high speed Fiber Optic Local Area Networks and Broadband Integrated Services Digital Networks are discussed. In recent years, a number of network adapters designed to support high speed communications have emerged. The approach taken here to the design of a high speed network interface unit was to implement packet processing functions in hardware, using VLSI technology. The VLSI hardware implementation of a buffer management unit, which is required in such architectures, is described.
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system

NASA Astrophysics Data System (ADS)

Duran, Ahmet; Tuncel, Mehmet

2016-10-01

It is important to have a scalable parallel numerical parameter optimization algorithm for dynamical systems used in financial applications where time limitation is crucial. We use Message Passing Interface parallel programming and present such a new parallel algorithm for parameter estimation. As an example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series on runs of up to 512 cores (see [10]). Unlike [10], in this work we consider more extensive financial market situations, for example, low volatility, high volatility, and stock market prices at a discount or premium to net asset value with varying magnitude. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least squares error, and the maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
The Xpress Transfer Protocol (XTP): A tutorial (expanded version)

NASA Technical Reports Server (NTRS)

Sanders, Robert M.; Weaver, Alfred C.

1990-01-01

The Xpress Transfer Protocol (XTP) is a reliable, real-time, light weight transfer layer protocol. Current transport layer protocols such as DoD's Transmission Control Protocol (TCP) and ISO's Transport Protocol (TP) were not designed for the next generation of high speed, interconnected reliable networks such as fiber distributed data interface (FDDI) and the gigabit/second wide area networks. Unlike all previous transport layer protocols, XTP is being designed to be implemented in hardware as a VLSI chip set. By streamlining the protocol, combining the transport and network layers and utilizing the increased speed and parallelization possible with a VLSI implementation, XTP will be able to provide the end-to-end data transmission rates demanded in high speed networks without compromising reliability and functionality. This paper describes the operation of the XTP protocol and in particular, its error, flow and rate control; inter-networking addressing mechanisms; and multicast support features, as defined in the XTP Protocol Definition Revision 3.4.
A Primer for Telemetry Interfacing in Accordance with NASA Standards Using Low Cost FPGAs

NASA Astrophysics Data System (ADS)

McCoy, Jake; Schultz, Ted; Tutt, James; Rogers, Thomas; Miles, Drew; McEntaffer, Randall

2016-03-01

Photon counting detector systems on sounding rocket payloads often require interfacing asynchronous outputs with a synchronously clocked telemetry (TM) stream. Though this can be handled with an on-board computer, there are several low cost alternatives including custom hardware, microcontrollers and field-programmable gate arrays (FPGAs). This paper outlines how a TM interface (TMIF) for detectors on a sounding rocket with asynchronous parallel digital output can be implemented using low cost FPGAs and minimal custom hardware. Low power consumption and high speed FPGAs are available as commercial off-the-shelf (COTS) products and can be used to develop the main component of the TMIF. Then, only a small amount of additional hardware is required for signal buffering and level translating. This paper also discusses how this system can be tested with a simulated TM chain in the small laboratory setting using FPGAs and COTS specialized data acquisition products.
MrGrid: A Portable Grid Based Molecular Replacement Pipeline

PubMed Central

Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.

2010-01-01

Background: The crystallographic determination of protein structures can be computationally demanding and for difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings: MrGrid is a portable web-based application written in Java/JSP and Ruby, and taking advantage of Apple Xgrid technology. Designed to interface with a user-defined Xgrid resource, the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed-up factor of 5.69. Conclusions: MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612
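To put the reported speed-up in context, the short Python sketch below computes parallel efficiency, i.e. speed-up divided by the number of workers. It assumes that all 13 computers acted as compute nodes, which the abstract does not state; the function name is hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved (1.0 would be perfect)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted in the abstract: average speed-up 5.69 on 13 computers.
# Treating all 13 machines as workers is an assumption.
efficiency = parallel_efficiency(5.69, 13)
print(f"{efficiency:.2f}")
```

Under that assumption the efficiency is a little under one half, which is plausible for a heterogeneous pool of networked desktop machines where job lengths vary.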
Particle simulation on heterogeneous distributed supercomputers

NASA Technical Reports Server (NTRS)

Becker, Jeffrey C.; Dagum, Leonardo

1993-01-01

We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
Implementation of density-based solver for all speeds in the framework of OpenFOAM

NASA Astrophysics Data System (ADS)

Shen, Chun; Sun, Fengxian; Xia, Xinlin

2014-10-01

In the framework of the open source CFD code OpenFOAM, a density-based solver for flow fields at all speeds is developed. In this solver the preconditioned all speeds AUSM+(P) scheme is adopted and the dual time scheme is implemented to complete the unsteady process. Parallel computation could be implemented to accelerate the solving process. Different interface reconstruction algorithms are implemented, and their accuracy with respect to convection is compared. Three benchmark tests of lid-driven cavity flow, flow crossing over a bump, and flow over a forward-facing step are presented to show the accuracy of the AUSM+(P) solver for low-speed incompressible flow, transonic flow, and supersonic/hypersonic flow. Firstly, for the lid-driven cavity flow, the computational results obtained by different interface reconstruction algorithms are compared. It is indicated that the one dimensional reconstruction scheme adopted in this solver possesses high accuracy and the solver developed in this paper can effectively catch the features of low-speed incompressible flow. Then, via the test cases regarding the flow crossing over a bump and over a forward-facing step, the ability to capture characteristics of the transonic and supersonic/hypersonic flows is confirmed. The forward-facing step proves to be the most challenging for the preconditioned solvers with and without the dual time scheme. Nonetheless, the solvers described in this paper reproduce the main features of this flow, including the evolution of the initial transient.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Voltage assisted asymmetric nanoscale wear on ultra-smooth diamond like carbon thin films at high sliding speeds

PubMed Central

Rajauria, Sukumar; Schreck, Erhard; Marchon, Bruno

2016-01-01

The understanding of tribo- and electro-chemical phenomena on the molecular level at a sliding interface is a field of growing interest. Fundamental chemical and physical insights of sliding surfaces are crucial for understanding wear at an interface, particularly for nano or micro scale devices operating at high sliding speeds. A complete investigation of the electrochemical effects on high sliding speed interfaces requires a precise monitoring of both the associated wear and surface chemical reactions at the interface. Here, we demonstrate that the head-disk interface inside a commercial magnetic storage hard disk drive provides a unique system for such studies. The results obtained show that voltage assisted electrochemical wear leads to asymmetric wear on either side of the sliding interface. PMID:27150446
Voltage assisted asymmetric nanoscale wear on ultra-smooth diamond like carbon thin films at high sliding speeds

NASA Astrophysics Data System (ADS)

Rajauria, Sukumar; Schreck, Erhard; Marchon, Bruno

2016-05-01

The understanding of tribo- and electro-chemical phenomena on the molecular level at a sliding interface is a field of growing interest. Fundamental chemical and physical insights of sliding surfaces are crucial for understanding wear at an interface, particularly for nano or micro scale devices operating at high sliding speeds. A complete investigation of the electrochemical effects on high sliding speed interfaces requires a precise monitoring of both the associated wear and surface chemical reactions at the interface. Here, we demonstrate that the head-disk interface inside a commercial magnetic storage hard disk drive provides a unique system for such studies. The results obtained show that voltage assisted electrochemical wear leads to asymmetric wear on either side of the sliding interface.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies to comply with industry standards. Because of the increased modeling complexity, it is several times slower than real time for state-of-the-art commercial packages to complete a dynamic simulation for a large-scale model. With the growing stochastic behavior introduced by emerging technologies, power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored such as parallelizing the calculation of generator current injection, identifying fast linear solvers for network solution, and parallelizing data outputs when interacting with APIs in the commercial package, TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
Parallel Guessing: A Strategy for High-Speed Computation

DTIC Science & Technology

1984-09-19

for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful...from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or
Method and apparatus for combinatorial logic signal processor in a digitally based high speed x-ray spectrometer

DOEpatents

Warburton, William K.; Zhou, Zhiquing

1999-01-01

A high speed, digitally based, signal processing system which accepts a digitized input signal and detects the presence of step-like pulses in this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system livetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired.
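The two-channel arrangement described above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: it uses the common textbook form of a trapezoidal filter (the difference of two moving averages separated by a gap), and the function name, filter lengths, threshold, and test signal are all assumptions. Pile-up inspection and correction for preamplifier decay are left out.

```python
def trapezoidal_filter(signal, rise, gap):
    """Difference of two length-`rise` moving averages, `gap` samples apart.

    A step of height A produces a trapezoid that ramps up over `rise`
    samples, stays flat at A, and ramps back down to zero.
    """
    if rise < 1 or gap < 0:
        raise ValueError("rise must be >= 1 and gap must be >= 0")

    def window_sum(end):
        # Sum of signal[end-rise+1 .. end]; samples before index 0 count as 0.
        if end < 0:
            return 0.0
        return sum(signal[max(0, end - rise + 1):end + 1])

    return [(window_sum(n) - window_sum(n - rise - gap)) / rise
            for n in range(len(signal))]


# Illustrative input: a single step of height 5.0 arriving at sample 10.
signal = [0.0] * 10 + [5.0] * 40

# Slow channel: long filter for the amplitude estimate (lengths are assumed).
slow = trapezoidal_filter(signal, rise=8, gap=4)
# Fast channel: short filter used only to detect that a pulse arrived.
fast = trapezoidal_filter(signal, rise=2, gap=1)

threshold = 2.5  # assumed detection threshold on the fast channel
trigger = next(i for i, v in enumerate(fast) if v > threshold)
amplitude = max(slow)  # peak of the slow channel for this isolated pulse

print(trigger, amplitude)
```

The short filter responds within a couple of samples, which is what makes it useful for detection and pile-up checks, while the long filter averages more samples and so gives the less noisy amplitude estimate.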
The other fiber, the other fabric, the other way

NASA Astrophysics Data System (ADS)

Stephens, Gary R.

1993-02-01

Coaxial cable and distributed switches provide a way to configure high-speed Fiber Channel fabrics. This type of fabric provides a cost-effective alternative to a fabric of optical fibers and centralized cross-point switches. The fabric topology is a simple tree. Products using parallel busses require a significant change to migrate to a serial bus. Coaxial cables and distributed switches require a smaller technology shift for these device manufacturers. Each distributed switch permits both medium type and speed changes. The fabric can grow and bridge to optical fibers as the needs expand. A distributed fabric permits earlier entry into high-speed serial operations. For very low-cost fabrics, a distributed switch may permit a link configured as a loop. The loop eliminates half of the ports when compared to a switched point-to-point fabric. A fabric of distributed switches can interface to a cross-point switch fabric. The expected sequence of migration is: closed loops, small closed fabrics, and, finally, bridges, to connect optical cross-point switch fabrics. This paper presents the concept of distributed fabrics, including address assignment, frame routing, and general operation.
Single-photon counting multicolor multiphoton fluorescence microscope.

PubMed

Buehler, Christof; Kim, Ki H; Greuter, Urs; Schlumpf, Nick; So, Peter T C

2005-01-01

We present a multicolor multiphoton fluorescence microscope with single-photon counting sensitivity. The system integrates a standard multiphoton fluorescence microscope, an optical grating spectrograph operating in the UV-Vis wavelength region, and a 16-anode photomultiplier tube (PMT). The major technical innovation is in the development of a multichannel photon counting card (mC-PhCC) for direct signal collection from multi-anode PMTs. The electronic design of the mC-PhCC employs a high-throughput, fully-parallel, single-photon counting scheme along with a high-speed electrical or fiber-optical link interface to the data acquisition computer. There is no electronic crosstalk among the detection channels of the mC-PhCC. The collected signal remains linear up to an incident photon rate of 10^8 counts per second. The high-speed data interface offers ample bandwidth for real-time readout: 2 MByte lambda-stacks, each composed of 16 spectral channels of a 256 x 256 pixel image with 12-bit dynamic range, can be transferred at 30 frames per second. The modular design of the mC-PhCC can be readily extended to accommodate PMTs of more anodes. Data acquisition from a 64-anode PMT has been verified. As a demonstration of system performance, spectrally resolved images of fluorescent latex spheres and ex-vivo human skin are reported. The multicolor multiphoton microscope is suitable for highly sensitive, real-time, spectrally-resolved three-dimensional imaging in biomedical applications.
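The readout figures quoted above can be checked with a few lines of Python. The sketch assumes that each 12-bit sample is stored in a 16-bit word, which is what makes the stack come out at 2 MByte; the abstract does not state the storage format, and the function name is hypothetical.

```python
def stack_bytes(channels, width, height, bits_per_sample):
    """Size in bytes of one lambda-stack at the given storage width."""
    total_bits = channels * width * height * bits_per_sample
    return (total_bits + 7) // 8  # round up to whole bytes


MIB = 1024 * 1024

# 16 spectral channels of 256 x 256 pixels, as quoted in the abstract.
packed = stack_bytes(16, 256, 256, 12)   # tightly packed 12-bit samples
padded = stack_bytes(16, 256, 256, 16)   # assumed: one 16-bit word per sample

frames_per_second = 30
rate = padded * frames_per_second        # sustained readout in bytes/s

print(packed / MIB, padded / MIB, rate / MIB)
```

With 16-bit storage the stack is exactly 2 MiB and the sustained rate is 60 MiB/s; tightly packed 12-bit samples would need only 1.5 MiB per stack.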
Design of a dataway processor for a parallel image signal processing system

NASA Astrophysics Data System (ADS)

Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

1995-04-01

Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
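The link figures in this abstract imply a peak transfer rate that can be worked out directly. The Python sketch below assumes one 8-bit transfer per clock cycle and ignores packet headers and any other protocol overhead, neither of which the abstract specifies; the function name is hypothetical.

```python
def link_rate_bytes_per_s(width_bits, clock_hz):
    """Peak one-way rate, assuming one full-width transfer per clock cycle."""
    return width_bits * clock_hz // 8


LINKS = 6  # number of Dataway links quoted in the abstract

per_direction = link_rate_bytes_per_s(8, 50_000_000)  # one direction of one link
per_link = 2 * per_direction                          # full duplex: both directions
aggregate = LINKS * per_link                          # all six links, both directions

print(per_direction, per_link, aggregate)
```

Under these assumptions each direction of a link peaks at 50 MB/s, and the six full-duplex links together peak at 600 MB/s; sustained throughput would be lower once routing and packet overhead are included.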
A comparison of high-speed links, their commercial support and ongoing R&D activities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gonzalez, H.L.; Barsotti, E.; Zimmermann, S.

Technological advances and a demanding market have forced the development of higher bandwidth communication standards for networks, data links and busses. Most of these emerging standards are gathering enough momentum that their widespread availability and lower prices are anticipated. The hardware and software that support the physical media for most of these links is currently available, allowing the user community to implement fairly high-bandwidth data links and networks with commercial components. Also, switches needed to support these networks are available or being developed. The commercial support of high-bandwidth data links, networks and switching fabrics provides a powerful base for the implementation of high-bandwidth data acquisition systems. A large data acquisition system like the one for the Solenoidal Detector Collaboration (SDC) at the SSC can benefit from links and networks that support an integrated systems engineering approach, for initialization, downloading, diagnostics, monitoring, hardware integration and event data readout. The issue that our current work addresses is the possibility of having a channel/network that satisfies the requirements of an integrated data acquisition system. In this paper we present a brief description of high-speed communication links and protocols that we consider of interest for high energy physics: High Performance Parallel Interface (HIPPI), Serial HIPPI, Fibre Channel (FC) and Scalable Coherent Interface (SCI). In addition, the initial work required to implement an SDC-like data acquisition system is described.

Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

PubMed Central

Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

2014-01-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

PubMed

Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

2014-07-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, D.B.

1996-12-31

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, Dario B.

1996-01-01

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
YARR - A PCIe based Readout Concept for Current and Future ATLAS Pixel Modules

NASA Astrophysics Data System (ADS)

Heim, Timon

2017-10-01

The Yet Another Rapid Readout (YARR) system is a DAQ system designed for the readout of current generation ATLAS Pixel FE-I4 and next generation chips. It utilises a commercial-off-the-shelf PCIe FPGA card as a reconfigurable I/O interface, which acts as a simple gateway to pipe all data from the Pixel modules via the high speed PCIe connection into the host system’s memory. Relying on modern CPU architectures, which enable the usage of parallelised processing in threads and commercial high speed interfaces in everyday computers, it is possible to perform all processing on a software level in the host CPU. Although FPGAs are very powerful at parallel signal processing, their firmware is hard to maintain and constrained by their connected hardware. Software, on the other hand, is very portable and upgraded frequently with new features coming at no cost. A DAQ concept which does not rely on the underlying hardware for acceleration also eases the transition from prototyping in the laboratory to the full scale implementation in the experiment. The overall concept and data flow will be outlined, as well as the challenges and possible bottlenecks which can be encountered when moving the processing from hardware to software.
A Peridynamic Approach for Nanoscratch Simulation of the Cement Mortar

NASA Astrophysics Data System (ADS)

Zhao, Jingjing; Zhang, Qing; Lu, Guangda; Chen, Depeng

2018-03-01

The present study develops a peridynamic approach for simulating the nanoscratch procedure on the cement mortar interface. In this approach, the cement and sand are considered as discrete particles with certain mechanical properties on the nanoscale. In addition, the interaction force functions for different components in the interface are represented by combining the van der Waals force and the peridynamic force. The nanoscratch procedures with the indenter moving along a direction either parallel or perpendicular to the interface are simulated in this paper. The simulation results show the damage evolution processes and the final damage distributions of the cement mortar under different scratching speeds and depths of the indenter, indicating that the interface between cement and sand is a weak area.
Providing a parallel and distributed capability for JMASS using SPEEDES

NASA Astrophysics Data System (ADS)

Valinski, Maria; Driscoll, Jonathan; McGraw, Robert M.; Meyer, Bob

2002-07-01

The Joint Modeling And Simulation System (JMASS) is a Tri-Service simulation environment that supports engineering and engagement-level simulations. As JMASS is expanded to support other Tri-Service domains, the current set of modeling services must be expanded for High Performance Computing (HPC) applications by adding support for advanced time-management algorithms, parallel and distributed topologies, and high speed communications. By providing support for these services, JMASS can better address modeling domains requiring parallel computationally intense calculations such as clutter, vulnerability and lethality calculations, and underwater-based scenarios. A risk reduction effort implementing some HPC services for JMASS using the SPEEDES (Synchronous Parallel Environment for Emulation and Discrete Event Simulation) Simulation Framework has recently concluded. As an artifact of the JMASS-SPEEDES integration, not only can HPC functionality be brought to the JMASS program through SPEEDES, but an additional HLA-based capability can be demonstrated that further addresses interoperability issues. The JMASS-SPEEDES integration provided a means of adding HLA capability to preexisting JMASS scenarios through an implementation of the standard JMASS port communication mechanism that allows players to communicate.
CAVIAR: a 45k neuron, 5M synapse, 12G connects/s AER hardware sensory-processing-learning-actuating system for high-speed visual object recognition and tracking.

PubMed

Serrano-Gotarredona, Rafael; Oster, Matthias; Lichtsteiner, Patrick; Linares-Barranco, Alejandro; Paz-Vicente, Rafael; Gomez-Rodriguez, Francisco; Camunas-Mesa, Luis; Berner, Raphael; Rivas-Perez, Manuel; Delbruck, Tobi; Liu, Shih-Chii; Douglas, Rodney; Hafliger, Philipp; Jimenez-Moreno, Gabriel; Civit Ballcels, Anton; Serrano-Gotarredona, Teresa; Acosta-Jimenez, Antonio J; Linares-Barranco, Bernabé

2009-09-01

This paper describes CAVIAR, a massively parallel hardware implementation of a spike-based sensing-processing-learning-actuating system inspired by the physiology of the nervous system. CAVIAR uses the asynchronous address-event representation (AER) communication framework and was developed in the context of a European Union funded project. It has four custom mixed-signal AER chips, five custom digital AER interface components, 45k neurons (spiking cells), up to 5M synapses, performs 12G synaptic operations per second, and achieves millisecond object recognition and tracking latencies.
Dynamic performance of high speed solenoid valve with parallel coils

NASA Astrophysics Data System (ADS)

Kong, Xiaowu; Li, Shizhen

2014-07-01

The methods of improving the dynamic performance of a high speed on/off solenoid valve include increasing the magnetic force of the armature and the slew rate of the coil current, and decreasing the mass and stroke of the moving parts. The increase of magnetic force usually leads to a decrease of the current slew rate, which could increase the delay time of the dynamic response of the solenoid valve. Using a high voltage to drive the coil can solve this contradiction, but a high driving voltage can also lead to more cost and a decrease of safety and reliability. In this paper, a new scheme of parallel coils is investigated, in which the single coil of the solenoid is replaced by parallel coils with the same ampere turns. Based on the mathematical model of the high speed solenoid valve, the theoretical formula for the delay time of the solenoid valve is deduced. Both the theoretical analysis and the dynamic simulation show that the effect of dividing a single coil into N parallel sub-coils is close to that of driving the single coil with N times the original driving voltage as far as the delay time of the solenoid valve is concerned. A specific test bench is designed to measure the dynamic performance of the high speed on/off solenoid valve. The experimental results also prove that both the delay time and switching time of the solenoid valves can be decreased greatly by adopting the parallel coil scheme. This research presents a simple and practical method to improve the dynamic performance of high speed on/off solenoid valves.
Study and design on USB wireless laser communication system

NASA Astrophysics Data System (ADS)

Wang, Aihua; Zheng, Jiansheng; Ai, Yong

2004-04-01

We define the USB wireless laser communication system (WLCS), briefly introduce the USB protocol, and describe the hardware standard. The paper analyses the hardware and software of the USB WLCS. The wireless laser communication part and the USB interface circuit are discussed in detail. We also give the peripheral design for the AN2131Q chip, the control circuit that converts between the parallel port and the serial bus, and the laser transmitting and receiving circuits of the laser communication part, all of which are simple, cheap and workable. The four parts of the software are then analyzed as follows. We completed the ISR in the firmware framework to develop the USB peripheral device. We debugged and completed 'ezload' and the GPD of the drivers. The Windows application performs its functions and calls the corresponding API functions, giving a practical and attractive interface. The system realizes USB wireless laser communication between computers over a distance greater than 50 meters, at a top speed above 8 Mbps. The system is of great practical value for providing high-speed communication in the growing number of districts without a fiber trunk network.
Establishing a Novel Modeling Tool: A Python-Based Interface for a Neuromorphic Hardware System

PubMed Central

Brüderle, Daniel; Müller, Eric; Davison, Andrew; Muller, Eilif; Schemmel, Johannes; Meier, Karlheinz

2008-01-01

Neuromorphic hardware systems provide new possibilities for the neuroscience modeling community. Due to the intrinsic parallelism of the micro-electronic emulation of neural computation, such models are highly scalable without a loss of speed. However, the communities of software simulator users and neuromorphic engineering in neuroscience are rather disjoint. We present a software concept that provides the possibility to establish such hardware devices as valuable modeling tools. It is based on the integration of the hardware interface into a simulator-independent language which allows for unified experiment descriptions that can be run on various simulation platforms without modification, implying experiment portability and a huge simplification of the quantitative comparison of hardware and simulator results. We introduce an accelerated neuromorphic hardware device and describe the implementation of the proposed concept for this system. An example setup and results acquired by utilizing both the hardware system and a software simulator are demonstrated. PMID:19562085
Establishing a novel modeling tool: a python-based interface for a neuromorphic hardware system.

PubMed

Brüderle, Daniel; Müller, Eric; Davison, Andrew; Muller, Eilif; Schemmel, Johannes; Meier, Karlheinz

2009-01-01

Neuromorphic hardware systems provide new possibilities for the neuroscience modeling community. Due to the intrinsic parallelism of the micro-electronic emulation of neural computation, such models are highly scalable without a loss of speed. However, the communities of software simulator users and neuromorphic engineering in neuroscience are rather disjoint. We present a software concept that provides the possibility to establish such hardware devices as valuable modeling tools. It is based on the integration of the hardware interface into a simulator-independent language which allows for unified experiment descriptions that can be run on various simulation platforms without modification, implying experiment portability and a huge simplification of the quantitative comparison of hardware and simulator results. We introduce an accelerated neuromorphic hardware device and describe the implementation of the proposed concept for this system. An example setup and results acquired by utilizing both the hardware system and a software simulator are demonstrated.
Efficient Parallel Engineering Computing on Linux Workstations

NASA Technical Reports Server (NTRS)

Lou, John Z.

2010-01-01

A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.

PubMed

Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

2011-09-07

Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
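The timing figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the speed-up from the two run times and derives a parallel efficiency; the efficiency value is not stated in the abstract and is only an inference from the quoted numbers, and the function names are illustrative.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Ratio of single-machine run time to parallel run time."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(speedup_factor: float, nodes: int) -> float:
    """Fraction of the ideal (linear) speed-up that was actually obtained."""
    return speedup_factor / nodes


local_seconds = 2.58 * 3600   # 2.58 h on the local computer (from the abstract)
cloud_seconds = 3.3 * 60      # 3.3 min on the cloud (from the abstract)
nodes = 100                   # number of cloud nodes (from the abstract)

s = speedup(local_seconds, cloud_seconds)
e = parallel_efficiency(s, nodes)
print(f"speed-up: {s:.1f}x, derived efficiency: {e:.0%}")
```

The recomputed ratio rounds to the 47× quoted in the abstract; on 100 nodes that corresponds to a little under half of the ideal linear speed-up.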
Novel experimental methods for investigating high speed friction of titanium-aluminum-vanadium/tool steel interface and dynamic failure of extrinsically toughened DRA composites

NASA Astrophysics Data System (ADS)

Irfan, Mohammad Abdulaziz

Dynamic deformation, flow, and failure are integral parts of all dynamic processes in materials. Invariably, dynamic failure also involves the relative sliding of one component of the material over the other. Advances in the elucidation of these failure mechanisms under high loading rates have been of great interest to scientists working in this area. The need to develop new dynamic mechanical property tests for materials under well characterized and controllable loading conditions has always been a challenge to experimentalists. The current study focuses on the development of two experimental methods to study some aspects of dynamic material response. The first part focuses on the development of a single stage gas gun facility for investigating high-speed metal to metal interfacial friction with applications to high speed machining. During the course of this investigation a gas gun was designed and built capable of accelerating projectiles to velocities of up to 1 km/s. Using this gas gun, pressure-shear plate impact friction experiments were conducted to simulate conditions similar to high speed machining at the tool-workpiece interface. The impacting plates were fabricated from materials representing the tribo-pair of interest. Accurate measurements of the interfacial tractions, i.e. the normal pressure and the frictional stress at the tribo-pair interface, and the interfacial slip velocity could be made by employing laser interferometry. Normal pressures of the order of 1-2 MPa were generated and slipping velocities of the order of 50 m/s were obtained. In order to illustrate the structure of the constitutive law governing friction, the study included experimental investigation of frictional response to step changes in normal pressure and interfacial shear stress. The results of these experiments indicate that sliding resistance for the Ti6Al4V/CH steel interface is much lower than that measured under quasi-static sliding conditions. 
Also, the temperature at the interface strongly affects the sliding resistance of the interface. The experimental results deduced from the response of the sliding interface to step changes in normal pressure and the applied shear stress reinforce the importance of including frictional memory in the development of rate dependent state variable friction models. The second part of the thesis presents an investigation into the dynamic deformation and failure of extrinsically toughened DRA composites. Experiments were conducted using the split Hopkinson pressure bar to investigate the deformation and flow behavior under dynamic compression loading. A modified Hopkinson bar apparatus was used to explore the dynamic fracture behavior of three different extrinsically toughened DRA composites. The study was paralleled by systematic exploration of the failure modes in each composite. For all the composites evaluated, the dynamic crack propagation characteristics are observed to be strongly dependent on the volume fraction of the ductile phase reinforcement in the composite, the yield stress of the ductile phase reinforcement, the micro-structural arrangement of the ductile phase reinforcements with respect to the notch, and the impact velocity employed in the particular experiment.
Implementation of molecular dynamics and its extensions with the coarse-grained UNRES force field on massively parallel systems; towards millisecond-scale simulations of protein structure, dynamics, and thermodynamics

PubMed Central

Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.

2010-01-01

We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
Flexible Peripheral Component Interconnect Input/Output Card

NASA Technical Reports Server (NTRS)

Bigelow, Kirk K.; Jerry, Albert L.; Baricio, Alisha G.; Cummings, Jon K.

2010-01-01

The Flexible Peripheral Component Interconnect (PCI) Input/Output (I/O) Card is an innovative circuit board that provides functionality to interface between a variety of devices. It supports user-defined interrupts for interface synchronization, tracks system faults and failures, and includes checksum and parity evaluation of interface data. The card supports up to 16 channels of high-speed, half-duplex, low-voltage digital signaling (LVDS) serial data, and can interface combinations of serial and parallel devices. Placement of a processor within the field programmable gate array (FPGA) controls an embedded application with links to host memory over its PCI bus. The FPGA also provides protocol stacking and quick digital signal processor (DSP) functions to improve host performance. Hardware timers, counters, state machines, and other glue logic support interface communications. The Flexible PCI I/O Card provides an interface for a variety of dissimilar computer systems, featuring direct memory access functionality. The card has the following attributes: 8/16/32-bit, 33-MHz PCI r2.2 compliance; configurable for universal 3.3V/5V interface slots; PCI interface based on PLX Technology's PCI9056 ASIC; general-use 512K × 16 SDRAM memory; general-use 1M × 16 Flash memory; FPGA with 3K to 56K logical cells with embedded 27K to 198K bits RAM; I/O interface of 32-channel LVDS differential transceivers configured in eight 4-bit banks, with signaling rates to 200 MHz per channel; common SCSI-3, 68-pin interface connector.
Digital intermediate frequency QAM modulator using parallel processing

DOEpatents

Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]

2008-05-27

The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
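The look-up-table idea in this abstract can be illustrated in software. The following is a minimal sketch, not the patented design: it assumes 16-QAM amplitude levels, eight samples per symbol, and one carrier period per symbol, none of which are given in the abstract, and all names are hypothetical. Every product of an amplitude level with a carrier sample is computed once, so the modulation step itself needs only table reads and one subtraction per sample.

```python
import math

SAMPLES_PER_SYMBOL = 8    # assumption: one carrier period spans one symbol
LEVELS = (-3, -1, 1, 3)   # assumption: 16-QAM amplitude levels per axis


def build_lut(carrier):
    """Precompute level * carrier(phase) for every level and sample phase."""
    return {
        level: [level * carrier(2 * math.pi * k / SAMPLES_PER_SYMBOL)
                for k in range(SAMPLES_PER_SYMBOL)]
        for level in LEVELS
    }


# "ROM" contents, filled once before any data is processed.
COS_LUT = build_lut(math.cos)
SIN_LUT = build_lut(math.sin)


def modulate_symbol(i_level, q_level):
    """Return I*cos - Q*sin for one symbol using table reads only."""
    cos_part = COS_LUT[i_level]
    sin_part = SIN_LUT[q_level]
    return [c - s for c, s in zip(cos_part, sin_part)]


def modulate(symbols):
    """Concatenate the waveforms of a sequence of (I, Q) symbols."""
    waveform = []
    for i_level, q_level in symbols:
        waveform.extend(modulate_symbol(i_level, q_level))
    return waveform
```

The loop in `modulate` is sequential here; in hardware, each symbol's lookup is independent of the others, which is what would allow several symbols to be handled by parallel LUTs at a lower clock rate.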
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination

NASA Astrophysics Data System (ADS)

Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo

2018-01-01

We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at the speed of 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, which is implicit in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.

Integration of Modelling and Graphics to Create an Infrared Signal Processing Test Bed

NASA Astrophysics Data System (ADS)

Sethi, H. R.; Ralph, John E.

1989-03-01

The work reported in this paper was carried out as part of a contract with MoD (PE) UK. It considers the problems associated with realistic modelling of a passive infrared system in an operational environment. Ideally all aspects of the system and environment should be integrated into a complete end-to-end simulation but in the past limited computing power has prevented this. Recent developments in workstation technology and the increasing availability of parallel processing techniques make the end-to-end simulation possible. However, the complexity and speed of such simulations means difficulties for the operator in controlling the software and understanding the results. These difficulties can be greatly reduced by providing an extremely user friendly interface and a very flexible, high power, high resolution colour graphics capability. Most system modelling is based on separate software simulation of the individual components of the system itself and its environment. These component models may have their own characteristic inbuilt assumptions and approximations, may be written in the language favoured by the originator and may have a wide variety of input and output conventions and requirements. The models and their limitations need to be matched to the range of conditions appropriate to the operational scenario. A comprehensive set of data bases needs to be generated by the component models and these data bases must be made readily available to the investigator. Performance measures need to be defined and displayed in some convenient graphics form. Some options are presented for combining available hardware and software to create an environment within which the models can be integrated, and which provide the required man-machine interface, graphics and computing power. The impact of massively parallel processing and artificial intelligence will be discussed. 
Parallel processing will make real time end-to-end simulation possible and will greatly improve the graphical visualisation of the model output data. Artificial intelligence should help to enhance the man-machine interface.
Method and apparatus for combinatorial logic signal processor in a digitally based high speed x-ray spectrometer

DOEpatents

Warburton, W.K.

1999-02-16

A high speed, digitally based, signal processing system is disclosed which accepts a digitized input signal and detects the presence of step-like pulses in this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system lifetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired. 31 figs.
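A trapezoidal filter of the kind named in this abstract is commonly written as the difference of two running averages separated by a gap. The sketch below uses that textbook form, which may differ from the patented implementation. The window lengths, the zero baseline, and the function and variable names are assumptions for illustration.

```python
def trapezoidal_filter(samples, rise, gap):
    """Difference of a leading and a lagging running average.

    The leading window covers the latest `rise` samples; the lagging window
    covers the `rise` samples that ended `gap` samples before the leading
    window starts. A step of height A gives a trapezoid that peaks at A.
    Assumes the signal starts at a zero baseline.
    """
    out = []
    for n in range(len(samples)):
        lead = samples[max(0, n - rise + 1): n + 1]
        lag_end = n - rise - gap + 1      # exclusive end of lagging window
        lag_start = lag_end - rise
        lag = samples[max(0, lag_start): max(0, lag_end)]
        out.append((sum(lead) - sum(lag)) / rise)
    return out


# A single step-like pulse of height 5 on a zero baseline.
signal = [0] * 20 + [5] * 40

slow = trapezoidal_filter(signal, rise=8, gap=4)   # long time constant
fast = trapezoidal_filter(signal, rise=2, gap=1)   # short time constant

fast_peak_index = fast.index(max(fast))
slow_peak_index = slow.index(max(slow))
```

With these settings both channels recover the step height, but the fast channel reaches its peak several samples earlier. That is why a short filter suits pulse detection and pileup inspection, while the long filter averages over more samples for the amplitude estimate.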
An FPGA-based High Speed Parallel Signal Processing System for Adaptive Optics Testbed

NASA Astrophysics Data System (ADS)

Kim, H.; Choi, Y.; Yang, Y.

This paper reports a state-of-the-art FPGA (Field Programmable Gate Array) based high-speed parallel signal processing system (SPS) for an adaptive optics (AO) testbed with a 1 kHz wavefront error (WFE) correction frequency. The AO system consists of a Shack-Hartmann sensor (SHS), a deformable mirror (DM), a tip-tilt sensor (TTS), a tip-tilt mirror (TTM), and an FPGA-based high-performance SPS to correct wavefront aberrations. The SHS has 400 subapertures and the DM has 277 actuators in Fried geometry, which requires an SPS with high-speed parallel computing capability. In this study, the target WFE correction speed is 1 kHz; it therefore requires massively parallel computing capabilities as well as strict hard real-time constraints on sensor measurements, matrix computation latency for the correction algorithms, and output of control signals to the actuators. To meet these requirements, an FPGA-based real-time SPS with parallel computing capabilities is proposed. In particular, the SPS is made up of a National Instruments (NI) real-time computer and five FPGA boards based on the state-of-the-art Xilinx Kintex 7 FPGA. Programming is done in NI's LabVIEW environment, which provides flexibility when applying different algorithms for WFE correction and makes programming and debugging faster than in conventional environments. One of the five FPGAs is assigned to measure the TTS and calculate control signals for the TTM, while the remaining four receive the SHS signal and calculate the slopes for each subaperture and the correction signal for the DM. With the parallel processing capabilities of the SPS, an overall closed-loop WFE correction speed of 1 kHz has been achieved. System requirements, architecture, and implementation issues are described, and experimental results are given.
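A rough estimate of the matrix computation load implied by these figures can be sketched as follows. It assumes two slope measurements (x and y) per subaperture, a dense reconstruction matrix mapping all slopes to all actuators, and an even split of the work across the four SHS boards. None of these assumptions is stated in the abstract, so the result is an order-of-magnitude illustration only.

```python
SUBAPERTURES = 400          # from the abstract
ACTUATORS = 277             # from the abstract
LOOP_RATE_HZ = 1000         # 1 kHz correction rate, from the abstract
SHS_FPGAS = 4               # boards handling the SHS and DM, from the abstract
SLOPES_PER_SUBAPERTURE = 2  # assumption: one x slope and one y slope

slopes = SUBAPERTURES * SLOPES_PER_SUBAPERTURE

# Assumption: a dense matrix-vector product, one multiply-accumulate (MAC)
# per matrix element per frame.
macs_per_frame = ACTUATORS * slopes
macs_per_second = macs_per_frame * LOOP_RATE_HZ

# Assumption: the matrix rows are divided evenly among the SHS boards.
macs_per_second_per_fpga = macs_per_second / SHS_FPGAS

print(f"{slopes} slopes, {macs_per_frame} MACs per frame, "
      f"{macs_per_second / 1e6:.1f} million MACs per second in total")
```

Under these assumptions the whole product must finish well inside the 1 ms frame period, together with sensor readout and actuator output, which is consistent with the abstract's emphasis on parallel hardware.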
Evaluation of the power consumption of a high-speed parallel robot

NASA Astrophysics Data System (ADS)

Han, Gang; Xie, Fugui; Liu, Xin-Jun

2018-06-01

An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogeneous and possess obvious physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Furthermore, a low-power-consumption workspace is selected for the robot.
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

NASA Astrophysics Data System (ADS)

Usha, S.; Subramani, C.

2018-04-01

Induction motors are highly non-linear and have complex time-varying dynamics, which makes their speed control a challenging issue in industry. Owing to recent advances in power electronic devices and intelligent controllers, however, speed control of the induction motor can now take its non-linear characteristics into account. Conventionally, a single inverter is used to run one induction motor. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the motors, and the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions; hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and by a fuzzy logic controller. In this paper, the speed responses of an induction motor rated at 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. The stability of the system under low speed is also analyzed on the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
PCIE interface design for high-speed image storage system based on SSD

NASA Astrophysics Data System (ADS)

Wang, Shiming

2015-02-01

This paper proposes and implements a standard interface for a miniaturized high-speed image storage system that combines a PowerPC with an FPGA and uses the PCIE bus as the high-speed switching channel. SSDs (solid-state drives) with an mSATA interface, attached to the PowerPC, implement RAID3 array storage. A patented high-speed real-time image compression IP core, which is at the leading domestic level in compression rate and image quality, can also be embedded in the FPGA, so that the system can record a higher image data rate or achieve a longer recording time. The mSATA SSD uses a notebook memory-card latch design, which makes it possible to replace the drive in 5 seconds with one hand and thus increases the total length of repeated recordings. MSI (Message Signaled Interrupts) guarantees the stability and reliability of continuous DMA transmission. Furthermore, remote display, control, and upload-to-backup functions are provided over a gigabit network alone. At an optional 25 frame/s or 30 frame/s, upload speeds can exceed 84 MB/s. Compared with existing FLASH-array high-speed memory systems, it offers a higher degree of modularity, better stability, and greater efficiency in development, maintenance, and upgrading. Its data access rate is up to 300 MB/s. The design achieves miniaturization, standardization, and modularization of the high-speed image storage system and is therefore suited to image acquisition, storage, and real-time transmission to a server on mobile equipment.
PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

PubMed

Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

2014-01-01

Principal component analysis (PCA) has traditionally been used as a feature extraction technique in face recognition systems, yielding high accuracy when only a small number of features is required. However, the covariance matrix and eigenvalue decomposition stages incur high computational complexity, especially for a large database. This research therefore presents an alternative approach that uses an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, thereby reducing the complexity of these stages. To improve the computational time, a novel parallel architecture was employed to exploit the parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing, and their combinations; the result is called the Parallel Expectation-Maximization PCA architecture. Compared with traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of more than nine and three times over PCA and Parallel PCA, respectively.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

1994-01-01

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
Parallel pulse processing and data acquisition for high speed, low error flow cytometry

DOEpatents

van den Engh, Gerrit J.; Stokdijk, Willem

1992-01-01

A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and low error rate.
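The buffering and ID-checking scheme described here can be sketched in software. This is an illustrative model only, not the patented circuit. The class and function names are hypothetical, and the behaviour on a mismatch (raising an error) is an assumption, since the abstract says only that the IDs are checked.

```python
from collections import deque


class Channel:
    """One input channel with its own FIFO of (event_id, value) entries."""

    def __init__(self):
        self.fifo = deque()

    def store(self, event_id, value):
        self.fifo.append((event_id, value))


def trigger(channels, event_id, pulse_heights):
    """Model of the trigger: digitize on all channels, tagging each entry."""
    for channel, height in zip(channels, pulse_heights):
        channel.store(event_id, height)


def read_event(channels):
    """Model of the bus controller plus error detection.

    Moves the oldest entry from every FIFO onto the 'bus' and verifies that
    all entries carry the same event ID before accepting the event.
    """
    entries = [channel.fifo.popleft() for channel in channels]
    ids = {event_id for event_id, _ in entries}
    if len(ids) != 1:
        raise ValueError(f"event ID mismatch across channels: {sorted(ids)}")
    return ids.pop(), [value for _, value in entries]


channels = [Channel() for _ in range(3)]
trigger(channels, 0, [10, 20, 30])
trigger(channels, 1, [11, 21, 31])
first_event = read_event(channels)
```

Because each channel buffers independently, a lost or duplicated entry on one channel would misalign every later event. Comparing the per-entry IDs at read-out time is what exposes such a slip.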
Radiation-Hard SpaceWire/Gigabit Ethernet-Compatible Transponder

NASA Technical Reports Server (NTRS)

Katzman, Vladimir

2012-01-01

A radiation-hard transponder was developed utilizing submicron/nanotechnology from IBM. The device consumes low power and has a low fabrication cost. This device utilizes a Plug-and-Play concept, and can be integrated into intra-satellite networks, supporting SpaceWire and Gigabit Ethernet I/O. A space-qualified, 100-pin package also was developed, allowing space-qualified (class K) transponders to be delivered within a six-month time frame. The novel, optical, radiation-tolerant transponder was implemented as a standalone board, containing the transponder ASIC (application specific integrated circuit) and optical module, with an FPGA (field-programmable gate array) friendly parallel interface. It features improved radiation tolerance, a high data rate, low power consumption, and advanced functionality. The transponder utilizes a patented current mode logic library of radiation-hardened-by-architecture cells. The transponder was developed, fabricated, and radhard tested up to 1 MRad. It was fabricated using the 90-nm CMOS (complementary metal oxide semiconductor) 9SF process from IBM, and incorporates full BIT circuitry, allowing a loop back test. The low-speed parallel LVCMOS (low-voltage complementary metal oxide semiconductor) bus is compatible with Actel FPGA. The output LVDS (low-voltage differential signaling) interface operates up to 1.5 Gb/s. Built-in CDR (clock-data recovery) circuitry provides robust synchronization and incorporates two alarm signals, synch loss and signal loss. The ultra-linear peak detector scheme allows on-line control of the amplitude of the input signal. Power consumption is less than 300 mW. The developed transponder with a 1.25 Gb/s serial data rate incorporates a 10-to-1 serializer with an internal clock multiplication unit and a 1-to-10 deserializer with an internal clock and data recovery block, which can operate with 8B10B encoded signals. Three loop-back test modes are provided to facilitate the built-in-test functionality. 
The design is based on a proprietary library of differential current switching logic cells implemented in the standard 90-nm CMOS 9SF technology from IBM. The proprietary low-power LVDS physical interface is fully compatible with the SpaceWire standard, and can be directly connected to the SFP MSA (small form factor pluggable Multiple Source Agreement) optical transponder. The low-speed parallel interfaces are fully compatible with the standard 1.8 V CMOS input/output devices. The utilized proprietary annular CMOS layout structures provide TID tolerance above 1.2 MRad. The complete chip consumes less than 150 mW of power from a single 1.8-V positive supply source.
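The relationship between the serial rate, the 10-bit parallel word, and 8B10B coding can be worked through numerically. The sketch below derives the parallel word clock and the payload rate from the quoted 1.25 Gb/s line rate and adds a toy serializer/deserializer pair. The bit order (least significant bit first) and all names are assumptions for illustration, and no 8B10B code tables are implemented.

```python
LINE_RATE_BPS = 1_250_000_000   # 1.25 Gb/s serial rate, from the abstract
WORD_BITS = 10                  # 10-to-1 serializer, from the abstract
PAYLOAD_BITS_PER_WORD = 8       # 8B10B: 8 data bits carried per 10 line bits

# Clock of the low-speed parallel side: one 10-bit word per cycle.
word_clock_hz = LINE_RATE_BPS // WORD_BITS

# User data rate once the 8B10B coding overhead is removed.
payload_rate_bps = LINE_RATE_BPS * PAYLOAD_BITS_PER_WORD // WORD_BITS


def serialize(word):
    """Split one 10-bit word into a bit list, LSB first (assumed order)."""
    return [(word >> bit) & 1 for bit in range(WORD_BITS)]


def deserialize(bits):
    """Rebuild a 10-bit word from a bit list produced by serialize()."""
    return sum(bit << position for position, bit in enumerate(bits))


example_word = 0b1010110011
round_trip = deserialize(serialize(example_word))
```

Under these figures the parallel bus runs at 125 MHz and the payload rate comes to 1 Gb/s, which matches the Gigabit Ethernet compatibility named in the record title.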
Controller and interface module for the High-Speed Data Acquisition System correlator/accumulator

NASA Technical Reports Server (NTRS)

Brokl, S. S.

1985-01-01

One complex channel of the High-Speed Data Acquisition System (a subsystem used in the Goldstone solar system radar), consisting of two correlator modules and one accumulator module, is operated by the controller and interface module. Interfaces are provided to the VAX UNIBUS for computer control, monitoring, and testing of the controller and correlator/accumulator. The correlator and accumulator modules controlled by this module are the key digital signal processing elements of the Goldstone High-Speed Data Acquisition System. This fully programmable unit provides for a wide variety of correlation and filtering functions operating on a three megaword/second data flow. Data flow is to the VAX by way of the I/O port of an FPS 5210 array processor.
Systems-on-chip approach for real-time simulation of wheel-rail contact laws

NASA Astrophysics Data System (ADS)

Mei, T. X.; Zhou, Y. J.

2013-04-01

This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and enable simulation in real time for the use of hardware-in-loop for experimental studies of the latest vehicle dynamic and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact laws algorithms are arranged in a parallel manner and multi-contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput and low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.
Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Menon, Suresh

1996-01-01

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. 
Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport, and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion, and chemical kinetics; therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES-resolved) effects are incorporated. As a result, this approach is uniquely suited for parallel processing and has been implemented on various systems, such as the Intel Paragon, IBM SP-2, Cray T3D, and SGI Power Challenge (PC), using the system-independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines are reported along with some characteristic results.
Seeing the forest for the trees: Networked workstations as a parallel processing computer

NASA Technical Reports Server (NTRS)

Breen, J. O.; Meleedy, D. M.

1992-01-01

Unlike traditional 'serial' processing computers in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Parallel pulse processing and data acquisition for high speed, low error flow cytometry

DOEpatents

Engh, G.J. van den; Stokdijk, W.

1992-09-22

A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and low error rate. 17 figs.
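The event-building scheme described in this abstract can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented circuit: the class name `EventBuilder` and its methods are hypothetical, each channel's FIFO is a `deque`, and the error check simply verifies that the oldest entries on all channels carry the same trigger ID.

```python
from collections import deque


class EventBuilder:
    """Toy model: one FIFO per input channel, entries tagged with a trigger ID."""

    def __init__(self, n_channels):
        self.fifos = [deque() for _ in range(n_channels)]
        self.next_id = 0

    def trigger(self, pulse_heights):
        """One trigger digitizes every channel and tags each entry with the same ID."""
        if len(pulse_heights) != len(self.fifos):
            raise ValueError("one value per channel is required")
        for fifo, value in zip(self.fifos, pulse_heights):
            fifo.append((self.next_id, value))
        self.next_id += 1

    def read_event(self):
        """Move the oldest entry of each FIFO onto the common 'bus' and check the IDs."""
        if any(not fifo for fifo in self.fifos):
            return None  # no complete event is stored yet
        entries = [fifo.popleft() for fifo in self.fifos]
        ids = {event_id for event_id, _ in entries}
        if len(ids) != 1:
            raise RuntimeError(f"channel desynchronization, IDs seen: {sorted(ids)}")
        return ids.pop(), [value for _, value in entries]
```

A caller would invoke `trigger()` once per detected event and drain complete events with `read_event()`; a mismatch in IDs signals that one channel has lost or duplicated an entry.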
Frequency-encoded photonic qubits for scalable quantum information processing

DOE PAGES

Lukens, Joseph M.; Lougovski, Pavel

2016-12-21

Among the objectives for large-scale quantum computation is the quantum interconnect: a device that uses photons to interface qubits that otherwise could not interact. However, the current approaches require photons indistinguishable in frequency, a major challenge for systems experiencing different local environments or of different physical compositions altogether. Here, we develop an entirely new platform that actually exploits such frequency mismatch for processing quantum information. Labeled “spectral linear optical quantum computation” (spectral LOQC), our protocol offers favorable linear scaling of optical resources and enjoys an unprecedented degree of parallelism, as an arbitrary N-qubit quantum gate may be performed in parallel on multiple N-qubit sets in the same linear optical device. Here, not only does spectral LOQC offer new potential for optical interconnects, but it also brings the ubiquitous technology of high-speed fiber optics to bear on photonic quantum information, making wavelength-configurable and robust optical quantum systems within reach.
Frequency-encoded photonic qubits for scalable quantum information processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lukens, Joseph M.; Lougovski, Pavel

Among the objectives for large-scale quantum computation is the quantum interconnect: a device that uses photons to interface qubits that otherwise could not interact. However, the current approaches require photons indistinguishable in frequency, a major challenge for systems experiencing different local environments or of different physical compositions altogether. Here, we develop an entirely new platform that actually exploits such frequency mismatch for processing quantum information. Labeled “spectral linear optical quantum computation” (spectral LOQC), our protocol offers favorable linear scaling of optical resources and enjoys an unprecedented degree of parallelism, as an arbitrary N-qubit quantum gate may be performed in parallel on multiple N-qubit sets in the same linear optical device. Here, not only does spectral LOQC offer new potential for optical interconnects, but it also brings the ubiquitous technology of high-speed fiber optics to bear on photonic quantum information, making wavelength-configurable and robust optical quantum systems within reach.
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

NASA Astrophysics Data System (ADS)

Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.

2017-07-01

The Continuum-Generalized Method of Moments (C-GMM) addresses a shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Moreover, these calculations are processed sequentially, even though all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. The parallel algorithm is then implemented with a standard shared-memory application programming interface, i.e. Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
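To illustrate what outer-loop parallelization means here, the sketch below distributes the candidates of an outer search loop across threads while each task runs its inner loop over the observations sequentially. It is a Python analogue under loud assumptions: the paper uses OpenMP, the objective `toy_objective` is a placeholder rather than the C-GMM criterion, and the grid of `alphas` is a hypothetical stand-in for candidate regularization parameters. In CPython, threads will not speed up pure-Python arithmetic; the point is only the structure of the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_objective(alpha, observations):
    """Placeholder criterion (NOT the C-GMM objective): inner loop over observations."""
    total = 0.0
    for x in observations:          # inner loop, kept sequential inside each task
        total += (x - alpha) ** 2
    return total / len(observations)


def best_alpha_outer_parallel(alphas, observations, workers=4):
    """Evaluate every candidate alpha as an independent task (outer-loop parallelism)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: toy_objective(a, observations), alphas))
    best = min(range(len(alphas)), key=scores.__getitem__)
    return alphas[best], scores[best]


alpha, score = best_alpha_outer_parallel([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(alpha, score)
```

Parallelizing the outer loop keeps each task coarse-grained, which usually means less synchronization overhead than splitting the inner loop; this is consistent with, but not a reproduction of, the result reported in the abstract.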
Power Conditioning for High-Speed Tracked Vehicles

DOT National Transportation Integrated Search

1971-01-01

The linear induction motor is to provide the propulsion of high-speed tracked vehicles; speed and brake control of the propulsion motor is essential for vehicle operation. The purpose of power conditioning is to provide the power matching interface b...
Power Conditioning for High Speed Tracked Vehicles

DOT National Transportation Integrated Search

1973-01-01

The linear induction motor is to provide the propulsion of high-speed tracked vehicles; speed and brake control of the propulsion motor is essential for vehicle operation. The purpose of power conditioning is to provide the power matching interface b...

Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.

PubMed

Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei

2015-05-01

Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. 
The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
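Double-buffering, one of the techniques named in this abstract, can be sketched generically as follows. This is not the HPE runtime or any CUDA API: the function `double_buffered` and its `transfer` and `compute` callables are hypothetical stand-ins, with a background thread playing the role of the copy engine. Two staging buffers are used so that the next chunk is transferred while the current one is being processed.

```python
import queue
import threading


def double_buffered(chunks, transfer, compute):
    """Overlap the transfer of the next chunk with compute on the current one."""
    buffers = [None, None]                      # exactly two staging buffers
    free, filled = queue.Queue(), queue.Queue()
    for slot in (0, 1):
        free.put(slot)

    def producer():
        try:
            for chunk in chunks:
                slot = free.get()                # wait until a buffer is free
                buffers[slot] = transfer(chunk)  # stand-in for a DMA copy
                filled.put(slot)
        finally:
            filled.put(None)                     # sentinel: no more data

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while (slot := filled.get()) is not None:
        results.append(compute(buffers[slot]))   # the other buffer may be filling now
        free.put(slot)                           # hand the buffer back to the producer
    worker.join()
    return results


print(double_buffered([1, 2, 3, 4], lambda c: c * 10, lambda b: b + 1))
```

With only one buffer, transfer and compute would strictly alternate; the second buffer is what allows the two stages to overlap.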
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications

PubMed Central

Cabezas, Javier; Gelado, Isaac; Stone, John E.; Navarro, Nacho; Kirk, David B.; Hwu, Wen-mei

2014-01-01

Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. 
The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice. PMID:26180487
Graphics Processors in HEP Low-Level Trigger Systems

NASA Astrophysics Data System (ADS)

Ammendola, Roberto; Biagioni, Andrea; Chiozzi, Stefano; Cotta Ramusino, Angelo; Cretaro, Paolo; Di Lorenzo, Stefano; Fantechi, Riccardo; Fiorini, Massimiliano; Frezza, Ottorino; Lamanna, Gianluca; Lo Cicero, Francesca; Lonardo, Alessandro; Martinelli, Michele; Neri, Ilaria; Paolucci, Pier Stanislao; Pastorelli, Elena; Piandani, Roberto; Pontisso, Luca; Rossetti, Davide; Simula, Francesco; Sozzi, Marco; Vicini, Piero

2016-11-01

The use of Graphics Processing Units (GPUs) in so-called general-purpose computing is emerging as an effective approach in several fields of science, although so far applications have typically employed GPUs for offline computations. Taking into account the steady performance increase of GPU architectures in terms of computing power and I/O capacity, the real-time applications of these devices can thrive in high-energy physics data acquisition and trigger systems. We will examine the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on tests performed on the trigger system of the CERN NA62 experiment. To successfully integrate GPUs in such an online environment, latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Furthermore, it is assessed how specific trigger algorithms can be parallelized and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen Large Hadron Collider (LHC) luminosity upgrade where highly selective algorithms will be essential to maintain sustainable trigger rates with very high pileup.
Flexibility and Performance of Parallel File Systems

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1996-01-01

As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
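A strided read interface of the kind mentioned in this abstract lets an application fetch many fixed-size records at regular offsets in one request instead of issuing one call per record. The sketch below shows the idea on an ordinary seekable file object; the function `strided_read` and its parameters are hypothetical and are not the API of the file systems described in the paper.

```python
import io


def strided_read(f, offset, record_size, stride, count):
    """Read `count` records of `record_size` bytes, spaced `stride` bytes apart."""
    records = []
    for i in range(count):
        f.seek(offset + i * stride)
        data = f.read(record_size)
        if len(data) < record_size:   # stop at end of file
            break
        records.append(data)
    return records


# Example: bytes 0..19, read 2-byte records starting at offset 1, every 5 bytes.
demo = io.BytesIO(bytes(range(20)))
print(strided_read(demo, offset=1, record_size=2, stride=5, count=4))
```

In a real parallel file system the benefit comes from handing the whole pattern to the I/O nodes at once, which is something a loop of seeks on the client side cannot show.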
The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards

NASA Astrophysics Data System (ADS)

Hanwell, Marcus D.; Martin, Kenneth M.; Chaudhary, Aashish; Avila, Lisa S.

2015-09-01

The Visualization Toolkit (VTK) is an open source, permissively licensed, cross-platform toolkit for scientific data processing, visualization, and data analysis. It is over two decades old, originally developed for a very different graphics card architecture. Modern graphics cards feature fully programmable, highly parallelized architectures with large core counts. VTK's rendering code was rewritten to take advantage of modern graphics cards, maintaining most of the toolkit's programming interfaces. This offers the opportunity to compare the performance of old and new rendering code on the same systems/cards. Significant improvements in rendering speeds and memory footprints mean that scientific data can be visualized in greater detail than ever before. The widespread use of VTK means that these improvements will reap significant benefits.
Parallel Execution of Functional Mock-up Units in Buildings Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

2016-06-30

A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
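The structure of an explicit Euler solver that advances several units in parallel can be sketched as below. This is a loose Python analogue under stated assumptions, not the report's C++/OpenMP wrapper and not the FMI API: `ToyUnit` is a hypothetical stand-in for an FMU with a simple decay model, and the units are independent, so no values are exchanged between them.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyUnit:
    """Hypothetical stand-in for an FMU: a state vector with dx/dt = -rate * x."""

    def __init__(self, state, rate):
        self.state = list(state)
        self.rate = rate

    def derivatives(self):
        return [-self.rate * x for x in self.state]

    def euler_step(self, h):
        # Explicit Euler update: x <- x + h * dx/dt
        self.state = [x + h * dx for x, dx in zip(self.state, self.derivatives())]


def simulate(units, h, steps, workers=4):
    """Advance all units by the same step in parallel, one barrier per step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # list() waits for every unit to finish before the next step starts
            list(pool.map(lambda u: u.euler_step(h), units))
    return [u.state for u in units]


print(simulate([ToyUnit([1.0], 1.0), ToyUnit([2.0, 4.0], 0.5)], h=0.1, steps=2))
```

Because every step ends in a barrier, the slowest unit sets the pace, which is one way to see why the abstract singles out load balancing as critical.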
Multigigabit optical transceivers for high-data rate military applications

NASA Astrophysics Data System (ADS)

Catanzaro, Brian E.; Kuznia, Charlie

2012-01-01

Avionics has experienced an ever-increasing demand for processing power and communication bandwidth. Currently deployed avionics systems require gigabit communication using opto-electronic transceivers connected with parallel optical fiber. Ultra Communications has developed a series of transceiver solutions combining ASIC technology with flip-chip bonding and advanced opto-mechanical molded optics. Ultra Communications' custom high-speed ASIC chips are developed using an SoS (silicon on sapphire) process. These circuits are flip-chip bonded with sources (VCSEL arrays) and detectors (PIN diodes) to create an Opto-Electronic Integrated Circuit (OEIC). These have been combined with micro-optics assemblies to create transceivers with interfaces to standard fiber array (MT) cabling technology. We present an overview of the demands for transceivers in military applications and how new-generation transceivers leverage both previous-generation military optical transceivers and commercial high-performance computing optical transceivers.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
The science of computing - Parallel computation

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
Wide-field high-speed space-division multiplexing optical coherence tomography using an integrated photonic device

PubMed Central

Huang, Yongyang; Badar, Mudabbir; Nitkowski, Arthur; Weinroth, Aaron; Tansu, Nelson; Zhou, Chao

2017-01-01

Space-division multiplexing optical coherence tomography (SDM-OCT) is a recently developed parallel OCT imaging method that achieves multi-fold speed improvement. However, the assembly of fiber optics components used in the first prototype system was labor-intensive and susceptible to errors. Here, we demonstrate a high-speed SDM-OCT system using an integrated photonic chip that can be reliably manufactured with high precision and low per-unit cost. A three-layer cascade of 1 × 2 splitters was integrated in the photonic chip to split the incident light into 8 parallel imaging channels with ~3.7 mm optical delay in air between each channel. High-speed imaging (~1 s/volume) of porcine eyes ex vivo and wide-field imaging (~18.0 × 14.3 mm²) of human fingers in vivo were demonstrated with the chip-based SDM-OCT system. PMID:28856055
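The channel count quoted in this abstract follows directly from the splitter cascade: three layers of 1 × 2 splitters give 2³ = 8 outputs. The short calculation below also lists the cumulative optical delays, assuming a uniform 3.7 mm step between adjacent channels, and the ideal splitting loss, assuming lossless, equal-ratio splitters. Both assumptions are idealizations added here for illustration and are not figures reported by the authors.

```python
import math


def cascade(layers, delay_step_mm):
    """Channel count, per-channel delay, and ideal split loss for a 1x2 cascade."""
    channels = 2 ** layers
    delays_mm = [round(i * delay_step_mm, 3) for i in range(channels)]
    split_loss_db = 10 * math.log10(channels)   # ideal, excess loss ignored
    return channels, delays_mm, split_loss_db


channels, delays_mm, loss_db = cascade(layers=3, delay_step_mm=3.7)
print(channels)    # number of parallel imaging channels
print(delays_mm)   # delay of each channel relative to the first, in mm
print(loss_db)     # ideal power penalty per channel, in dB
```

The delay offsets are what allow the signals from all channels to be separated in depth after a single detection, so the total depth range must accommodate the largest offset.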
Nanoscale wear and kinetic friction between atomically smooth surfaces sliding at high speeds

NASA Astrophysics Data System (ADS)

Rajauria, Sukumar; Canchi, Sripathi V.; Schreck, Erhard; Marchon, Bruno

2015-02-01

The kinetic friction and wear at high sliding speeds are investigated using the head-disk interface of hard disk drives, wherein the head and the disk are less than 10 nm apart and move at sliding speeds of 5-10 m/s relative to each other. While the spacing between the sliding surfaces is of the same order of magnitude as various AFM-based fundamental studies on friction, the sliding speed is nearly six orders of magnitude larger, allowing a unique set-up for a systematic study of nanoscale wear at high sliding speeds. In a hard disk drive, the physical contact between the head and the disk leads to friction, wear, and degradation of the head overcoat material (typically diamond-like carbon). In this work, strain-gauge-based friction measurements are performed; the friction coefficient as well as the adhering shear strength at the head-disk interface is extracted; and an experimental set-up for studying friction between high-speed sliding surfaces is exemplified.
A simple modern correctness condition for a space-based high-performance multiprocessor

NASA Technical Reports Server (NTRS)

Probst, David K.; Li, Hon F.

1992-01-01

A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
ARINC 818 adds capabilities for high-speed sensors and systems

NASA Astrophysics Data System (ADS)

Keller, Tim; Grunwald, Paul

2014-06-01

ARINC 818, titled Avionics Digital Video Bus (ADVB), is the standard for cockpit video that has gained wide acceptance in both commercial and military cockpits, including the Boeing 787, the A350XWB, the A400M, the KC-46A and many others. Initially conceived for cockpit displays, ARINC 818 is now propagating into high-speed sensors, such as infrared and optical cameras, due to its high bandwidth and high reliability. The ARINC 818 specification was initially released in 2006 and has recently undergone a major update that will enhance its applicability as a high-speed sensor interface. The ARINC 818-2 specification was published in December 2013. The revisions to the specification include: video switching, stereo and 3-D provisions, color sequential implementations, regions of interest, data-only transmissions, multi-channel implementations, bi-directional communication, higher link rates to 32 Gbps, synchronization signals, options for high-speed coax interfaces and optical interface details. The additions to the specification are especially appealing for high-bandwidth, multi-sensor systems that have issues with throughput bottlenecks and SWaP concerns. ARINC 818 is implemented on either copper or fiber optic high-speed physical layers, and allows for time multiplexing multiple sensors onto a single link. This paper discusses each of the new capabilities in the ARINC 818-2 specification and the benefits for ISR and countermeasures implementations; several examples are provided.
USB 3.0 readout and time-walk correction method for Timepix3 detector

NASA Astrophysics Data System (ADS)

Turecek, D.; Jakubek, J.; Soukup, P.

2016-12-01

The hybrid particle-counting pixel detectors of the Medipix family are well known. In this contribution we present AdvaDAQ, a new USB 3.0 based interface for the Timepix3 detector. The AdvaDAQ interface is designed with maximal emphasis on flexibility. It is the successor of the FitPIX interface developed at IEAP CTU in Prague. Its modular architecture supports all Medipix/Timepix chips and all their different readout modes: Medipix2, Timepix (serial and parallel), Medipix3 and Timepix3. The high bandwidth of USB 3.0 permits readout of 1700 full frames per second with Timepix or 8-channel data acquisition from Timepix3 at a frequency of 320 MHz. The control and data acquisition are integrated in the multiplatform PiXet software (MS Windows, Mac OS, Linux). In the second part of the publication, a new method for correction of the time-walk effect in Timepix3 is described. Moreover, fully spectroscopic X-ray imaging with the Timepix3 detector operated in the ToT (Time-over-Threshold) mode is presented. It is shown that the AdvaDAQ's readout speed is sufficient to perform spectroscopic measurements at the full intensity of radiographic setups equipped with nano- or micro-focus X-ray tubes.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure

NASA Astrophysics Data System (ADS)

Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

2011-09-01

Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
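The speed-up quoted in the abstract above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original record; the function names are hypothetical, and the efficiency figure is only a rough ratio because the local computer and the cloud nodes need not be equally fast.

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-machine run time to parallel run time."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, nodes):
    """Speed-up divided by node count (1.0 would be ideal linear scaling)."""
    return speedup(serial_seconds, parallel_seconds) / nodes


# Figures taken from the abstract: 2.58 h locally, 3.3 min on 100 cloud nodes.
serial = 2.58 * 3600      # seconds on the local computer
parallel = 3.3 * 60       # seconds on the cloud cluster
s = speedup(serial, parallel)
e = efficiency(serial, parallel, 100)

print(f"speed-up ~ {s:.1f}x, speed-up per node ~ {e:.2f}")
```

Rounding the computed ratio gives the 47× figure reported in the abstract.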
Parallel implementation of all-digital timing recovery for high-speed and real-time optical coherent receivers.

PubMed

Zhou, Xian; Chen, Xue

2011-05-09

Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments, and are therefore a promising candidate for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. To cope with this difficulty, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of receivers, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can be used to complete synchronization, equalization and polarization de-multiplexing simultaneously. Finally, we demonstrate that PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal in the hundreds-of-MHz range. © 2011 Optical Society of America
A high speed PE-ALD ZnO Schottky diode rectifier with low interface-state density

NASA Astrophysics Data System (ADS)

Jin, Jidong; Zhang, Jiawei; Shaw, Andrew; Kudina, Valeriya N.; Mitrovic, Ivona Z.; Wrench, Jacqueline S.; Chalker, Paul R.; Balocco, Claudio; Song, Aimin; Hall, Steve

2018-02-01

Zinc oxide (ZnO) has recently attracted attention for its potential application to high speed electronics. In this work, a high speed Schottky diode rectifier was fabricated based on a ZnO thin film deposited by plasma-enhanced atomic layer deposition and a PtOx Schottky contact deposited by reactive radio-frequency sputtering. The rectifier shows an ideality factor of 1.31, an effective barrier height of 0.79 eV, a rectification ratio of 1.17 × 10^7, and a cut-off frequency as high as 550 MHz. Low frequency noise measurements reveal that the rectifier has a low interface-state density of 5.13 × 10^12 cm^-2 eV^-1, and the noise is dominated by the mechanism of a random walk of electrons at the PtOx/ZnO interface. The work shows that the rectifier can be used for both noise sensitive and high frequency electronics applications.
High-speed real-time animated displays on the ADAGE (trademark) RDS 3000 raster graphics system

NASA Technical Reports Server (NTRS)

Kahlbaum, William M., Jr.; Ownbey, Katrina L.

1989-01-01

Techniques which may be used to increase the animation update rate of real-time computer raster graphic displays are discussed. They were developed on the ADAGE RDS 3000 graphic system in support of the Advanced Concepts Simulator at the NASA Langley Research Center. These techniques involve the use of a special purpose parallel processor, for high-speed character generation. The description of the parallel processor includes the Barrel Shifter which is part of the hardware and is the key to the high-speed character rendition. The final result of this total effort was a fourfold increase in the update rate of an existing primary flight display from 4 to 16 frames per second.
Thread concept for automatic task parallelization in image analysis

NASA Astrophysics Data System (ADS)

Lueckenhaus, Maximilian; Eckstein, Wolfgang

1998-09-01

Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
The Portals 4.0 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2012-11-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

PEGASUS 5: An Automated Pre-Processor for Overset-Grid CFD

NASA Technical Reports Server (NTRS)

Suhs, Norman E.; Rogers, Stuart E.; Dietz, William E.; Kwak, Dochan (Technical Monitor)

2002-01-01

An all new, automated version of the PEGASUS software has been developed and tested. PEGASUS provides the hole-cutting and connectivity information between overlapping grids, and is used as the final part of the grid generation process for overset-grid computational fluid dynamics approaches. The new PEGASUS code (Version 5) has many new features: automated hole cutting; a projection scheme for fixing gaps in overset surfaces; more efficient interpolation search methods using an alternating digital tree; hole-size optimization based on adding additional layers of fringe points; and an automatic restart capability. The new code has also been parallelized using the Message Passing Interface standard. The parallelization performance provides efficient speed-up of the execution time by an order of magnitude, and up to a factor of 30 for very large problems. The results of three example cases are presented: a three-element high-lift airfoil, a generic business jet configuration, and a complete Boeing 777-200 aircraft in a high-lift landing configuration. Comparisons of the computed flow fields for the airfoil and 777 test cases between the old and new versions of the PEGASUS codes show excellent agreement with each other and with experimental results.
A GaAs vector processor based on parallel RISC microprocessors

NASA Astrophysics Data System (ADS)

Misko, Tim A.; Rasset, Terry L.

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Study of Electromagnetic Repulsion Switch to High Speed Reclosing and Recover Time Characteristics of Superconductor

NASA Astrophysics Data System (ADS)

Koyama, Tomonori; Kaiho, Katsuyuki; Yamaguchi, Iwao; Yanabu, Satoru

Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL). The superconductor and a vacuum interrupter serving as the commutation switch were connected in parallel using a bypass coil. When a fault current flows in this equipment, the superconductor quenches and the current is transferred to the parallel coil because of the voltage drop across the superconductor. The large current in the parallel coil actuates the magnetic repulsion mechanism of the vacuum interrupter, and the current in the superconductor is interrupted. With this equipment, the time during which current flows in the superconductor can easily be minimized, and the fault current is also easily limited by the large reactance of the parallel coil. Because of these merits, we adopted the electromagnetic repulsion switch. Electrical power systems, however, require high-speed reclosing after a fault current is interrupted, so the SFCL must recover to the superconducting state before reclosing. Because the superconductor generates heat when it quenches, recovery to the superconducting state takes time, which makes recovery time the key issue. In this paper, we studied the recovery time of the superconductor and proposed an electromagnetic repulsion switch with a reclosing system.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

NASA Technical Reports Server (NTRS)

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

1994-01-01

Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations at these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes for fluids and the finite-elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional supercomputers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computers by using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
Accelerated Adaptive MGS Phase Retrieval

NASA Technical Reports Server (NTRS)

Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

2011-01-01

The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
A Domain Decomposition Parallelization of the Fast Marching Method

NASA Technical Reports Server (NTRS)

Herrmann, M.

2003-01-01

In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on separately load balancing the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.
Definition of the Spatial Resolution of X-Ray Microanalysis in Thin Foils

NASA Technical Reports Server (NTRS)

Williams, D. B.; Michael, J. R.; Goldstein, J. I.; Romig, A. D., Jr.

1992-01-01

The spatial resolution of X-ray microanalysis in thin foils is defined in terms of the incident electron beam diameter and the average beam broadening. The beam diameter is defined as the full width at tenth maximum of a Gaussian intensity distribution. The spatial resolution is calculated by a convolution of the beam diameter and the average beam broadening. This definition of the spatial resolution can be related simply to experimental measurements of composition profiles across interphase interfaces. Monte Carlo calculations using a high-speed parallel supercomputer show good agreement with this definition of the spatial resolution and calculations based on this definition. The agreement is good over a range of specimen thicknesses and atomic number, but is poor when excessive beam tailing distorts the assumed Gaussian electron intensity distributions. Beam tailing occurs in low-Z materials because of fast secondary electrons and in high-Z materials because of plural scattering.
Scalable Multiprocessor for High-Speed Computing in Space

NASA Technical Reports Server (NTRS)

Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard

2004-01-01

A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard real-time applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at hundreds of pulses per second, each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
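The synchronization scheme described in the last sentence of the abstract above can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the report's implementation: the class and method names are hypothetical, times are in microseconds, and link propagation delay and clock drift are ignored.

```python
class LocalClock:
    """Tracks the offset between a broadcast master clock and a local clock.

    Hypothetical illustration only: assumes the master time signal arrives
    instantaneously and that the local clock does not drift between updates.
    """

    def __init__(self):
        self.offset_us = 0.0

    def on_master_broadcast(self, master_time_us, local_time_us):
        # Offset = master time minus the local time at which it was received.
        self.offset_us = master_time_us - local_time_us

    def to_master_time(self, local_time_us):
        # Convert a local timestamp into the master time base.
        return local_time_us + self.offset_us


clock = LocalClock()
clock.on_master_broadcast(master_time_us=1_000_000.0, local_time_us=999_750.5)
print(clock.offset_us)
print(clock.to_master_time(1_000_000.5))
```

Each processor would hold its own such object, so that timestamps taken against different local clocks can be compared in the common master time base.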
2005 10th Annual Expeditionary Conference

DTIC Science & Technology

2005-10-27

HIGH SPEED CONNECTORS/ SHIP INTERFACES PAUL BISHOP COALESCENT – BISHOP GROUP CO-DIRECTOR LOGISTICS Study Team 5 Bill...Force (CLF) 5. Connectors: High Speed Ship (HSS), High Speed Vessel (HSV), Assault Connectors (LCAC, EFV) 6. Sister Service and Coalition Force Ships ...of wartime cargo moves by ship Theater port infrastructure is critical to off-loading ships UNCLASSIFIED (U) Create a port Augment an established
Exploiting Anti-T-shaped Graphene Architecture to Form Low Tortuosity, Sieve-like Interfaces for High-Performance Anodes for Li-Based Cells

PubMed Central

2017-01-01

Graphitic carbon anodes have long been used in Li ion batteries due to their combination of attractive properties, such as low cost, high gravimetric energy density, and good rate capability. However, one significant challenge is controlling, and optimizing, the nature and formation of the solid electrolyte interphase (SEI). Here it is demonstrated that carbon coating via chemical vapor deposition (CVD) facilitates high electrochemical performance of carbon anodes. We examine and characterize the substrate/vertical graphene interface (multilayer graphene nanowalls coated onto carbon paper via plasma enhanced CVD), revealing that these low-tortuosity and high-selection graphene nanowalls act as fast Li ion transport channels. Moreover, we determine that the hitherto neglected parallel layer acts as a protective surface at the interface, enhancing the anode performance. In summary, these findings not only clarify the synergistic role of the parallel functional interface when combined with vertical graphene nanowalls but also have facilitated the development of design principles for future high rate, high performance batteries. PMID:29392179
Evaluation of an Airborne Spacing Concept, On-Board Spacing Tool, and Pilot Interface

NASA Technical Reports Server (NTRS)

Swieringa, Kurt; Murdoch, Jennifer L.; Baxley, Brian; Hubbs, Clay

2011-01-01

The number of commercial aircraft operations is predicted to increase in the next ten years, creating a need for improved operational efficiency. Two areas believed to offer significant increases in efficiency are optimized profile descents and dependent parallel runway operations. It is envisioned that during both of these types of operations, flight crews will precisely space their aircraft behind preceding aircraft at air traffic control assigned intervals to increase runway throughput and maximize the use of existing infrastructure. This paper describes a human-in-the-loop experiment designed to study the performance of an onboard spacing algorithm and pilots' ratings of the usability and acceptability of an airborne spacing concept that supports dependent parallel arrivals. Pilot participants flew arrivals into the Dallas-Fort Worth terminal environment using one of three different simulators located at the National Aeronautics and Space Administration's (NASA) Langley Research Center. Scenarios were flown using Interval Management with Spacing (IM-S) and Required Time of Arrival (RTA) control methods during conditions of no error, error in the forecast wind, and offset (disturbance) to the arrival flow. Results indicate that pilots delivered their aircraft to the runway threshold within +/- 3.5 seconds of their assigned arrival time and reported that both the IM-S and RTA procedures were associated with low workload levels. In general, pilots found the IM-S concept, procedures, speeds, and interface acceptable, with 92% of pilots rating the procedures as complete and logical, 218 out of 240 responses agreeing that the IM-S speeds were acceptable, and 63% of pilots reporting that the displays were easy to understand and displayed in appropriate locations. The 22 (out of 240) responses indicating that the commanded speeds were not acceptable and appropriate occurred during scenarios containing wind error and offset error. Concerns cited included the occurrence of multiple speed changes within a short time period, speed changes required within twenty miles of the runway, and an increase in airspeed followed shortly by a decrease in airspeed. Within this paper, appropriate design recommendations are provided, and the need for continued, iterative human-centered design is discussed.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

PubMed

Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

2017-09-22

A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.
Equilibrium structure of the plasma sheet boundary layer-lobe interface

NASA Technical Reports Server (NTRS)

Romero, H.; Ganguli, G.; Palmadesso, P.; Dusenbery, P. B.

1990-01-01

Observations are presented which show that plasma parameters vary on a scale length smaller than the ion gyroradius at the interface between the plasma sheet boundary layer and the lobe. The Vlasov equation is used to investigate the properties of such a boundary layer. The existence, at the interface, of a density gradient whose scale length is smaller than the ion gyroradius implies that an electrostatic potential is established in order to maintain quasi-neutrality. Strongly sheared (scale lengths smaller than the ion gyroradius) perpendicular and parallel (to the ambient magnetic field) electron flows develop whose peak velocities are on the order of the electron thermal speed and which carry a net current. The free energy of the sheared flows can give rise to a broadband spectrum of electrostatic instabilities starting near the electron plasma frequency and extending below the lower hybrid frequency.
The portals 4.0.1 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2013-04-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Cloud parallel processing of tandem mass spectrometry based proteomics data.

PubMed

Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

2012-10-05

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
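The data-decomposition strategy described in the abstract above (split the input, run an unmodified search engine on each piece, then recompose the results) can be sketched as follows. This is an illustrative outline only: it operates on a plain list of scan identifiers rather than mzXML or pepXML files, `search_chunk` is a hypothetical stand-in for a real search engine call, and a local thread pool stands in for the cloud workers.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(spectra, n_chunks):
    """Split spectra into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(spectra)))
    size, extra = divmod(len(spectra), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(spectra[start:end])
        start = end
    return chunks


def search_chunk(chunk):
    """Placeholder for the unmodified search engine run on one chunk."""
    return [(scan_id, f"match-for-{scan_id}") for scan_id in chunk]


def recompose(partial_results):
    """Concatenate per-chunk results back into one result list."""
    merged = []
    for result in partial_results:
        merged.extend(result)
    return merged


def parallel_search(spectra, n_workers=4):
    chunks = decompose(spectra, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in input order, so scan order is kept.
        partial_results = list(pool.map(search_chunk, chunks))
    return recompose(partial_results)


print(parallel_search(list(range(10)))[:3])
```

Because the search function is treated as a black box, the same wrapper shape applies whichever engine is plugged in, which is the property the abstract emphasizes.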
Performance of the Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.
Analysis and topology optimization design of high-speed driving spindle

NASA Astrophysics Data System (ADS)

Wang, Zhilin; Yang, Hai

2018-04-01

A three-dimensional model of a high-speed driving spindle is established using SOLIDWORKS. The model is imported through the ABAQUS interface, and a finite element analysis model of the spindle is established using spring elements to simulate the bearing boundary conditions. Static analysis of the spindle yields stress, strain and displacement contour plots (nephograms), and on the basis of these results topology optimization is carried out, completing the lightweight design of the high-speed driving spindle. The design scheme provides guidance for the design of axial parts of similar structures.
Experiment Description and Results for Arrival Operations Using Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR)

NASA Technical Reports Server (NTRS)

Baxley, Brian T.; Murdoch, Jennifer L.; Swieringa, Kurt A.; Barmore, Bryan E.; Capron, William R.; Hubbs, Clay E.; Shay, Richard F.; Abbott, Terence S.

2013-01-01

The predicted increase in the number of commercial aircraft operations creates a need for improved operational efficiency. Two areas believed to offer increases in aircraft efficiency are optimized profile descents and dependent parallel runway operations. Using Flight deck Interval Management (FIM) software and procedures during these operations, flight crews can achieve by the runway threshold an interval assigned by air traffic control (ATC) behind the preceding aircraft that maximizes runway throughput while minimizing additional fuel consumption and pilot workload. This document describes an experiment where 24 pilots flew arrivals into the Dallas-Fort Worth terminal environment using one of three simulators at NASA's Langley Research Center. Results indicate that pilots delivered their aircraft to the runway threshold within +/- 3.5 seconds of their assigned time interval, and reported low workload levels. In general, pilots found the FIM concept, procedures, speeds, and interface acceptable. Analysis of the time error and FIM speed changes as a function of arrival stream position suggests the spacing algorithm generates stable behavior while in the presence of continuous (wind) or impulse (offset) error. Concerns reported included multiple speed changes within a short time period, and an airspeed increase followed shortly by an airspeed decrease.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rajauria, Sukumar, E-mail: sukumar.rajauria@hgst.com; Canchi, Sripathi V., E-mail: sripathi.canchi@hgst.com; Schreck, Erhard

The kinetic friction and wear at high sliding speeds is investigated using the head-disk interface of hard disk drives, wherein the head and the disk are less than 10 nm apart and move at sliding speeds of 5–10 m/s relative to each other. While the spacing between the sliding surfaces is of the same order of magnitude as various AFM based fundamental studies on friction, the sliding speed is nearly six orders of magnitude larger, allowing a unique set-up for a systematic study of nanoscale wear at high sliding speeds. In a hard disk drive, the physical contact between the head and the disk leads to friction, wear, and degradation of the head overcoat material (typically diamond like carbon). In this work, strain gauge based friction measurements are performed; the friction coefficient as well as the adhering shear strength at the head-disk interface is extracted; and an experimental set-up for studying friction between high speed sliding surfaces is exemplified.

The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
High-precision shape representation using a neuromorphic vision sensor with synchronous address-event communication interface

NASA Astrophysics Data System (ADS)

Belbachir, A. N.; Hofstätter, M.; Litzenberger, M.; Schön, P.

2009-10-01

A synchronous communication interface for neuromorphic temporal contrast vision sensors is described and evaluated in this paper. This interface has been designed for ultra high-speed synchronous arbitration of a temporal contrast image sensor's pixel data. Enabling high-precision timestamping, this system demonstrates its uniqueness for handling peak data rates and preserving the main advantage of neuromorphic electronic systems, that is, high and accurate temporal resolution. Based on a synchronous arbitration concept, the timestamping has a resolution of 100 ns. Both synchronous and (state-of-the-art) asynchronous arbiters have been implemented in a neuromorphic dual-line vision sensor chip in a standard 0.35 µm CMOS process. The performance analysis of both arbiters and the advantages of the synchronous arbitration over asynchronous arbitration in capturing high-speed objects are discussed in detail.
System architecture of a gallium arsenide one-gigahertz digital IC tester

NASA Technical Reports Server (NTRS)

Fouts, Douglas J.; Johnson, John M.; Butner, Steven E.; Long, Stephen I.

1987-01-01

The design for a 1-GHz digital integrated circuit tester for the evaluation of custom GaAs chips and subsystems is discussed. Technology-related problems affecting the design of a GaAs computer are discussed, with emphasis on the problems introduced by long printed-circuit-board interconnect. High-speed interface modules provide a link between the low-speed microprocessor and the chip under test. Memory-multiplexer and memory-shift register architectures for the storage of test vectors are described in addition to an architecture for local data storage consisting of a long chain of GaAs shift registers. The tester is constructed around a VME system card cage and backplane, and very little high-speed interconnect exists between boards. The tester has a three part self-test consisting of a CPU board confidence test, a main memory confidence test, and a high-speed interface module functional test.
Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter

NASA Astrophysics Data System (ADS)

Watanabe, Eriko; Kodate, Kashiko

2005-11-01

We designed and fabricated a fully automatic fast face recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet unattained ultra high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher accuracy correlation filter and high-speed ferroelectric liquid crystal-spatial light modulator (FLC-SLM). In running trial experiments using this system (dubbed FARCO), we succeeded in acquiring remarkably low error rates of 1.3% for false match rate (FMR) and 2.6% for false non-match rate (FNMR). Given the results of our experiments, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, underlining the issues of calculation technique, quantization bit rate, pixel size and shift from optical axis. The correlation filter has proved its excellent performance and higher precision than classical correlation and joint transform correlator (JTC). Moreover, arrangement of multi-object reference images leads to 10-channel correlation signals, as sharply marked as those of a single channel. This experiment result demonstrates great potential for achieving the process speed of 10000 face/s.
A programmable computational image sensor for high-speed vision

NASA Astrophysics Data System (ADS)

Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian

2013-08-01

In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PE array is responsible for transferring, storing and processing raw image data in a SIMD fashion with its own programming language. The RPs form a one-dimensional array of simplified RISC cores that can carry out complex arithmetic and logic operations. The PE array and RP array can complete a large amount of computation in few instruction cycles and therefore satisfy the low- and middle-level high-speed image processing requirements. The RISC core controls the whole system operation and executes some high-level image processing algorithms. We utilize a simplified AHB bus as the system bus to connect our major components. A programming language and corresponding tool chain for this computational image sensor are also developed.
High-speed technique based on a parallel projection correlation procedure for digital image correlation

NASA Astrophysics Data System (ADS)

Zaripov, D. I.; Renfu, Li

2018-05-01

The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.
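To make the idea of correlating projections instead of full windows concrete, here is a minimal sketch of ZNCC applied to a whole interrogation window and to its row and column sums. The function names are illustrative, and returning one correlation value per axis is an assumption made here; the abstract does not say how the authors combine the projections.

```python
import math


def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length 1D sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat signal: correlation undefined, 0 returned by convention
    return sum(x * y for x, y in zip(da, db)) / denom


def flatten(window):
    """Turn a 2D window (list of rows) into a 1D list for direct ZNCC."""
    return [v for row in window for v in row]


def projections(window):
    """Sum a 2D window along each axis, giving one row profile and one column profile."""
    rows = [sum(row) for row in window]
    cols = [sum(col) for col in zip(*window)]
    return rows, cols


def projection_zncc(win_a, win_b):
    """ZNCC of the row profiles and of the column profiles (assumed combination)."""
    rows_a, cols_a = projections(win_a)
    rows_b, cols_b = projections(win_b)
    return zncc(rows_a, rows_b), zncc(cols_a, cols_b)
```

For an N x N window, direct ZNCC touches N² values per candidate displacement, whereas the two profiles together hold only 2N values, which is where a saving in arithmetic can come from.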
Review of Engine/Airframe/Drive Train Dynamic Interface Development Problems

DTIC Science & Technology

1978-06-01

dynamic interface problems associated with the CH-54, S-61, CH-53, SH-3, S-58, SH-34, S-64, BLACK HAWK, and the ABC. The ultimate benefit will be the...drive systems of the CH-3C, CH-53A, and CH-54A helicopters. Prior to the shaft failure incident, the input drive shaft systems had accumulated in...capability of the ABC aircraft. This aircraft has a large range in forward speed, zero to 280 knots. At high aircraft speeds, the rotor speed must be reduced
Radio Synthesis Imaging - A High Performance Computing and Communications Project

NASA Astrophysics Data System (ADS)

Crutcher, Richard M.

The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. 
The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Cedar Project---Original goals and progress to date

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cybenko, G.; Kuck, D.; Padua, D.

1990-11-28

This work encompasses a broad attack on high speed parallel processing. Hardware, software, applications development, and performance evaluation and visualization as well as research topics are proposed. Our goal is to develop practical parallel processing for the 1990's.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeff S.

1992-01-01

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Graphical processors for HEP trigger systems

NASA Astrophysics Data System (ADS)

Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

2017-02-01

General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We will discuss the use of online parallel computing on GPUs for synchronous low level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. Latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
Flexible All-Digital Receiver for Bandwidth Efficient Modulations

NASA Technical Reports Server (NTRS)

Gray, Andrew; Srinivasan, Meera; Simon, Marvin; Yan, Tsun-Yee

2000-01-01

An all-digital high data rate parallel receiver architecture developed jointly by Goddard Space Flight Center and the Jet Propulsion Laboratory is presented. This receiver utilizes only a small number of high speed components along with a majority of lower speed components operating in a parallel frequency domain structure implementable in CMOS, and can currently process up to 600 Mbps with standard QPSK modulation. Performance results for this receiver for bandwidth efficient QPSK modulation schemes such as square-root raised cosine pulse shaped QPSK and Feher's patented QPSK are presented, demonstrating the flexibility of the receiver architecture.
Implementation of High Speed Distributed Data Acquisition System

NASA Astrophysics Data System (ADS)

Raju, Anju P.; Sekhar, Ambika

2012-09-01

This paper introduces a high speed distributed data acquisition system based on a field programmable gate array (FPGA). The aim is to develop a "distributed" data acquisition interface. The development of instruments such as personal computers and engineering workstations based on "standard" platforms is the motivation behind this effort. Using standard platforms as the controlling unit allows independence in hardware from a particular vendor and hardware platform. The distributed approach also has advantages from a functional point of view: acquisition resources become available to multiple instruments; the acquisition front-end can be physically remote from the rest of the instrument. The high speed data acquisition system transmits data rapidly to a remote computer system through an Ethernet interface. The data is acquired through 16 analog input channels. The input data commands are multiplexed and digitized, and the data is then stored in a 1K buffer for each input channel. The main control unit in this design is the 16 bit processor implemented in the FPGA. This 16 bit processor is used to set up and initialize the data source and the Ethernet controller, as well as to control the flow of data from the memory element to the NIC. Using this processor we can initialize and control the different configuration registers in the Ethernet controller in an easy manner. The data packets are then sent to the remote PC through the Ethernet interface. The main advantages of using the FPGA as a standard platform are its flexibility, low power consumption, short design duration, fast time to market, programmability and high density. The main advantages of using the Ethernet controller AX88796 over others are its non-PCI interface, the presence of embedded SRAM where the transmit and receive buffers are located, and its high-performance SRAM-like interface. The paper introduces the implementation of the distributed data acquisition using FPGA by VHDL.
The main advantages of this system are high accuracy, high speed, and real time monitoring.
Imaging initial formation processes of nanobubbles at the graphite-water interface through high-speed atomic force microscopy

NASA Astrophysics Data System (ADS)

Liao, Hsien-Shun; Yang, Chih-Wen; Ko, Hsien-Chen; Hwu, En-Te; Hwang, Ing-Shouh

2018-03-01

The initial formation process of nanobubbles at solid-water interfaces remains unclear because of the limitations of current imaging techniques. To directly observe the formation process, an astigmatic high-speed atomic force microscope (AFM) was modified to enable imaging in the liquid environment. By using a customized cantilever holder, the resonance of small cantilevers was effectively enhanced in water. The proposed high-speed imaging technique yielded highly dynamic quasi-two-dimensional (2D) gas structures (thickness: 20-30 nm) initially at the graphite-water interface. The 2D structures were laterally mobile mainly within certain areas, but occasionally a gas structure might extensively migrate and settle in a new area. The 2D structures were often confined by substrate step edges in one lateral dimension. Eventually, all quasi-2D gas structures were transformed into cap-shaped nanobubbles of higher heights and reduced lateral dimensions. These nanobubbles were immobile and remained stable under continuous AFM imaging. This study demonstrated that nanobubbles could be stably imaged at a scan rate of 100 lines per second (640 μm/s).
Imaging photomultiplier array with integrated amplifiers and high-speed USB interface.

PubMed

Blacksell, M; Wach, J; Anderson, D; Howard, J; Collis, S M; Blackwell, B D; Andruczyk, D; James, B W

2008-10-01

Multianode photomultiplier tube (PMT) arrays are finding application as convenient high-speed light sensitive devices for plasma imaging. This paper describes the development of a USB-based "plug-n-play" 16-channel PMT camera with 16-bit simultaneous acquisition of 16 signal channels at rates up to 2 MS/s per channel. The preamplifiers and digital hardware are packaged in a compact housing which incorporates magnetic shielding, on-board generation of the high-voltage PMT bias, an optical filter mount and slits, and an F-mount lens adaptor. Triggering, timing, and acquisition are handled by four field-programmable gate arrays (FPGAs) under instruction from a master FPGA controlled by a computer with a LABVIEW interface. We present technical design details and specifications and illustrate performance with high-speed images obtained on the H-1 heliac at the ANU.
Self-calibrated correlation imaging with k-space variant correlation functions.

PubMed

Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J

2018-03-01

Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimate of correlation functions. The presented work aims to demonstrate this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Optimization of wheel-rail interface friction using top-of-rail friction modifiers: State of the art

NASA Astrophysics Data System (ADS)

Khan, M. Roshan; Dasaka, Satyanarayana Murty

2018-05-01

High Speed Railways and Dedicated Freight Corridors are the need of the day for fast and efficient transportation of the ever growing population and freight across long distances. With the increase in speeds and axle loads carried by these trains, wear of rails and train wheel sections is a common issue, caused by the increase in friction at the wheel-rail interfaces. Where the wheel-rail interface friction is less than optimum, as in the case of high speed trains with very low axle loads, wheel-slips are imminent and loss of traction occurs when the trains accelerate rapidly or brake suddenly. This wide variety of traction problems around the wheel-rail interface friction needs to be mitigated carefully, so that the contact interface friction neither rises so high as to cause material wear and a need for added locomotive power, nor falls so low as to cause wheel-slips and loss of traction at high speeds. Top-of-rail friction modifiers are engineered surface coatings applied on top of rails to maintain an optimum frictional contact between the train wheels and the rails. Extensive research in the area of wheel-rail tribology has revealed that the optimum frictional coefficient at wheel-rail interfaces is around 0.35. Application of top-of-rail (TOR) friction modifiers on rail surfaces adds an extra layer of material coating on top of the rails, with a surface frictional coefficient in the desired range. This study reviews the common types of rail friction modifiers, the methods for their application, issues related to the application of friction modifiers, and a guideline on selection of the right class of coating material based on site specific requirements of the railway networks.
Double lead spiral platen parallel jaw end effector

NASA Technical Reports Server (NTRS)

Beals, David C.

1989-01-01

The double lead spiral platen parallel jaw end effector is an extremely powerful, compact, and highly controllable end effector that represents a significant improvement in gripping force and efficiency over the LaRC Puma (LP) end effector. The spiral end effector is very simple in its design and has relatively few parts. The jaw openings are highly predictable and linear, making it an ideal candidate for remote control. The finger speed is within acceptable working limits and can be modified to meet the user needs; for instance, greater finger speed could be obtained by increasing the pitch of the spiral. The force relaxation is comparable to the other tested units. Optimization of the end effector design would involve a compromise of force and speed for a given application.
Speed and accuracy improvements in FLAASH atmospheric correction of hyperspectral imagery

NASA Astrophysics Data System (ADS)

Perkins, Timothy; Adler-Golden, Steven; Matthew, Michael W.; Berk, Alexander; Bernstein, Lawrence S.; Lee, Jamine; Fox, Marsha

2012-11-01

Remotely sensed spectral imagery of the earth's surface can be used to fullest advantage when the influence of the atmosphere has been removed and the measurements are reduced to units of reflectance. Here, we provide a comprehensive summary of the latest version of the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes atmospheric correction algorithm. We also report some new code improvements for speed and accuracy. These include the re-working of the original algorithm in C-language code parallelized with message passing interface and containing a new radiative transfer look-up table option, which replaces executions of the MODTRAN model. With computation times now as low as ~10 s per image per computer processor, automated, real-time, on-board atmospheric correction of hyper- and multi-spectral imagery is within reach.

Large-eddy simulation/Reynolds-averaged Navier-Stokes hybrid schemes for high speed flows

NASA Astrophysics Data System (ADS)

Xiao, Xudong

Three LES/RANS hybrid schemes have been proposed for the prediction of high speed separated flows. Each method couples the k-zeta (Enstrophy) RANS model with an LES subgrid scale one-equation model by using a blending function that is coordinate system independent. Two of these functions are based on turbulence dissipation length scale and grid size, while the third one has no explicit dependence on the grid. To implement the LES/RANS hybrid schemes, a new rescaling-reintroducing method is used to generate time-dependent turbulent inflow conditions. The hybrid schemes have been tested on a Mach 2.88 flow over a 25 degree compression-expansion ramp and a Mach 2.79 flow over a 20 degree compression ramp. A special computation procedure has been designed to prevent the separation zone from expanding upstream to the recycle-plane. The code is parallelized using Message Passing Interface (MPI) and is optimized for running on an IBM-SP3 parallel machine. The scheme was validated first for a flat plate. It was shown that the blending function has to be monotonic to prevent the RANS region from appearing in the LES region. In the 25 deg ramp case, the hybrid schemes provided better agreement with experiment in the recovery region. Grid refinement studies demonstrated the importance of using a grid independent blending function and showed further improvement in agreement with experiment in the recovery region. In the 20 deg ramp case, with a relatively finer grid, the hybrid scheme characterized by the grid independent blending function predicted the flow field well in both the separation region and the recovery region. Therefore, with an "appropriately" fine grid, the current hybrid schemes are promising for the simulation of shock wave/boundary layer interaction problems.
Speed of fast and slow rupture fronts along frictional interfaces

NASA Astrophysics Data System (ADS)

Trømborg, Jørgen Kjoshagen; Sveinsson, Henrik Andersen; Thøgersen, Kjetil; Scheibert, Julien; Malthe-Sørenssen, Anders

2015-07-01

The transition from stick to slip at a dry frictional interface occurs through the breaking of microjunctions between the two contacting surfaces. Typically, interactions between junctions through the bulk lead to rupture fronts propagating from weak and/or highly stressed regions, whose junctions break first. Experiments find rupture fronts ranging from quasistatic fronts, via fronts much slower than elastic wave speeds, to fronts faster than the shear wave speed. The mechanisms behind and selection between these fronts are still imperfectly understood. Here we perform simulations in an elastic two-dimensional spring-block model where the frictional interaction between each interfacial block and the substrate arises from a set of junctions modeled explicitly. We find that material slip speed and rupture front speed are proportional across the full range of front speeds we observe. We revisit a mechanism for slow slip in the model and demonstrate that fast slip and fast fronts have a different, inertial origin. We highlight the long transients in front speed even along homogeneous interfaces, and we study how both the local shear to normal stress ratio and the local strength are involved in the selection of front type and front speed. Last, we introduce an experimentally accessible integrated measure of block slip history, the Gini coefficient, and demonstrate that in the model it is a good predictor of the history-dependent local static friction coefficient of the interface. These results will contribute both to building a physically based classification of the various types of fronts and to identifying the important mechanisms involved in the selection of their propagation speed.
Creating, Storing, and Dumping Low and High Resolution Graphics on the Apple IIe Microcomputer System.

ERIC Educational Resources Information Center

Fletcher, Richard K., Jr.

This description of procedures for dumping high and low resolution graphics using the Apple IIe microcomputer system focuses on two special hardware configurations that are commonly used in schools--the Apple Dot Matrix Printer with the Apple Parallel Interface Card, and the Imagewriter Printer with the Apple Super Serial Interface Card. Special…
A 1024×768-12μm Digital ROIC for uncooled microbolometer FPAs

NASA Astrophysics Data System (ADS)

Eminoglu, Selim

2017-02-01

This paper reports the development of a new digital microbolometer Readout Integrated Circuit (D-ROIC), called MT10212BD. It has a format of 1024 × 768 (XGA) and a pixel pitch of 12 μm. MT10212BD is Mikro Tasarim's second 12 μm pitch microbolometer ROIC, which is developed specifically for surface micro machined microbolometer detector arrays with small pixel pitch using high-TCR pixel materials, such as VOx and a-Si. MT10212BD has an all-digital system on-chip architecture, which generates programmable timing and biasing, and performs 14-bit analog to digital conversion (ADC). The signal processing chain in the ROIC is composed of pixel bias circuitry, an integrator based programmable gain amplifier, and column parallel ADC circuitry. MT10212BD has a serial programming interface that can be used to configure the programmable ROIC features and to load the Non-Uniformity-Correction (NUC) data to the ROIC. MT10212BD has a total of 8 high-speed serial digital video outputs, which can be programmed to operate in the 2, 4, and 8-output modes and can support frame rates above 60 fps. The high-speed serial digital outputs support data rates as high as 400 Mbit/s when operated at a 50 MHz system clock frequency. There is on-chip phase-locked-loop (PLL) based timing circuitry to generate the high speed clocks used in the ROIC. The ROIC is designed to support pixel resistance values ranging from 30 kΩ to 90 kΩ, with a nominal value of 60 kΩ. The ROIC has a globally programmable gain in the column readout, which can be adjusted based on the detector resistance value.
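As a rough plausibility check on the figures in this abstract, the raw video payload implied by the array format can be compared with the serial-output capacity. The sketch below rests on assumptions that are not in the abstract: each pixel is sent as one 14-bit sample with no framing or encoding overhead, the frame rate is exactly 60 fps, and the 400 Mbit/s figure applies to each output rather than to all outputs together. The function names are illustrative only.

```python
def raw_video_rate(cols, rows, bits_per_pixel, fps):
    """Raw pixel payload in bits per second (no framing overhead assumed)."""
    return cols * rows * bits_per_pixel * fps

def min_outputs(rate_bps, per_output_bps, modes=(2, 4, 8)):
    """Smallest output mode whose aggregate capacity covers the payload."""
    for n in modes:
        if n * per_output_bps >= rate_bps:
            return n
    return None  # payload exceeds even the widest mode

rate = raw_video_rate(1024, 768, 14, 60)   # XGA, 14-bit ADC, 60 fps
needed = min_outputs(rate, 400e6)          # assumed 400 Mbit/s per output
print(rate / 1e6, "Mbit/s payload;", needed, "outputs suffice")
```

Under these assumptions the payload is about 661 Mbit/s, so the wider output modes leave headroom for the higher frame rates the abstract mentions.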
Dental preparation with sonic vs high-speed finishing: analysis of microleakage in bonded veneer restorations.

PubMed

Faus-Matoses, Ignacio; Solá-Ruiz, Fernanda

2014-02-01

To compare marginal microleakage in porcelain veneer restorations following dental finishing using two types of instruments to test the hypothesis that microleakage will be less when teeth are prepared with sonic oscillating burs than when prepared with high-speed rotating burs. Fifty-six extracted human maxillary central incisors were selected and divided randomly into two groups. Group 1 samples underwent dental finishing using high-speed rotating diamond burs, while group 2 used sonic oscillating diamond burs. Buccal chamfer preparation was carried out for both groups. Forty eight of the samples (24 per group) were restored using IPS Empress ceramic veneers. 2% methylene blue was used to evaluate microleakage at the tooth/composite veneer interface. Teeth were sectioned lengthwise into three parts and microleakage was measured at two points - cervical and incisal - on each section. Before bonding, four teeth per group underwent SEM examination. Evaluation of microleakage at the cervical dentin margin showed a value of 10.5% in group 1 and 6.6% in group 2, which was statistically significantly different (p < 0.05). Incisal microleakage was 1.3% for group 1 and 1.2% for group 2, which was not significantly different. SEM revealed different patterns of surface texture in both areas according to the instrument used. Group 1 exhibited parallel horizontal abrasion grooves with a milled effect and thick smear layers; group 2 showed abrasive erosion, discontinuous perpendicular depressions, and thin smear layers. Tooth preparations finished with sonic burs produced significantly less microleakage in the cervical dentin area of bonded veneer restorations. No differences were found in the incisal enamel area.
A high-throughput solid-phase extraction microchip combined with inductively coupled plasma-mass spectrometry for rapid determination of trace heavy metals in natural water.

PubMed

Shih, Tsung-Ting; Hsieh, Cheng-Chuan; Luo, Yu-Ting; Su, Yi-An; Chen, Ping-Hung; Chuang, Yu-Chen; Sun, Yuh-Chang

2016-04-15

Herein, a hyphenated system combining a high-throughput solid-phase extraction (htSPE) microchip with inductively coupled plasma-mass spectrometry (ICP-MS) for rapid determination of trace heavy metals was developed. Rather than performing multiple analyses in parallel for the enhancement of analytical throughput, we improved the processing speed for individual samples by increasing the operation flow rate during SPE procedures. To this end, an innovative device combining a micromixer and a multi-channeled extraction unit was designed. Furthermore, a programmable valve manifold was used to interface the developed microchip and ICP-MS instrumentation in order to fully automate the system, leading to a dramatic reduction in operation time and human error. Under the optimized operation conditions for the established system, detection limits of 1.64-42.54 ng L⁻¹ for the analyte ions were achieved. Validation procedures demonstrated that the developed method could be satisfactorily applied to the determination of trace heavy metals in natural water. Each analysis could be readily accomplished within just 186 s using the established system. This represents, to the best of our knowledge, an unprecedented speed for the analysis of trace heavy metal ions. Copyright © 2016 Elsevier B.V. All rights reserved.
The high speed interconnect system architecture and operation

NASA Astrophysics Data System (ADS)

Anderson, Steven C.

The design and operation of a fiber-optic high-speed interconnect system (HSIS) being developed to meet the requirements of future avionics and flight-control hardware with distributed-system architectures are discussed. The HSIS is intended for 100-Mb/s operation of a local-area network with up to 256 stations. It comprises a bus transmission system (passive star couplers and linear media linked by active elements) and network interface units (NIUs). Each NIU is designed to perform the physical, data link, network, and transport functions defined by the ISO OSI Basic Reference Model (1982 and 1983) and incorporates a fiber-optic transceiver, a high-speed protocol based on the SAE AE-9B linear token-passing data bus (1986), and a specialized application interface unit. The operating modes and capabilities of HSIS are described in detail and illustrated with diagrams.
Experimental Studies of the Interaction Between a Parallel Shear Flow and a Directionally-Solidifying Front

NASA Technical Reports Server (NTRS)

Zhang, Meng; Maxworthy, Tony

1999-01-01

It has long been recognized that flow in the melt can have a profound influence on the dynamics of a solidifying interface and hence the quality of the solid material. In particular, flow affects the heat and mass transfer, and causes spatial and temporal variations in the flow and melt composition. This results in a crystal with nonuniform physical properties. Flow can be generated by buoyancy, expansion or contraction upon phase change, and thermo-soluto capillary effects. In general, these flows cannot be avoided and can have an adverse effect on the stability of the crystal structures. This motivates crystal growth experiments in a microgravity environment, where buoyancy-driven convection is significantly suppressed. However, transient accelerations (g-jitter) caused by the acceleration of the spacecraft can affect the melt, while convection generated by effects other than buoyancy remains important. Rather than bemoan the presence of convection as a source of interfacial instability, Hurle in the 1960s suggested that flow in the melt, either forced or natural convection, might be used to stabilize the interface. Delves considered the imposition of both a parabolic velocity profile and a Blasius boundary layer flow over the interface. He concluded that fast stirring could stabilize the interface to perturbations whose wave vector is in the direction of the fluid velocity. Forth and Wheeler considered the effect of the asymptotic suction boundary layer profile. They showed that the effect of the shear flow was to generate travelling waves parallel to the flow with a speed proportional to the Reynolds number. There have been few quantitative, experimental works reporting on the coupling effect of fluid flow and morphological instabilities. Huang studied plane Couette flow over cells and dendrites. It was found that this flow could greatly enhance the planar stability and even induce the cell-planar transition. A rotating impeller was buried inside the sample cell, driven by an outside rotating magnet, in order to generate the flow. However, it appears that this was not a well-controlled flow and may also have been unsteady. In the present experimental study, we want to study how a forced parallel shear flow in a Hele-Shaw cell interacts with the directionally solidifying crystal interface. The comparison of experimental data shows that the parallel shear flow in a Hele-Shaw cell has a strong stabilizing effect on the planar interface by damping the existing initial perturbations. The flow also shows a stabilizing effect on the cellular interface by slightly reducing the exponential growth rate of cells. The left-right symmetry of cells is broken by the flow, with cells tilting toward the incoming flow direction. The tilting angle increases with the velocity ratio. The experimental results are explained through the parallel flow effect on lateral solute transport. The phenomenon of cells tilting against the flow is consistent with the numerical result of Dantzig and Chao.
High-speed prediction of crystal structures for organic molecules

NASA Astrophysics Data System (ADS)

Obata, Shigeaki; Goto, Hitoshi

2015-02-01

We developed a master-worker type parallel algorithm for allocating crystal structure optimization tasks to distributed compute nodes, in order to improve the performance of simulations for crystal structure prediction. Performance experiments were carried out on the TUT-ADSIM supercomputer system (HITACHI HA8000-tc/HT210). The experimental results show that our parallel algorithm achieved speed-ups of 214 and 179 times using 256 processor cores on crystal structure optimizations in predictions of crystal structures for 3-aza-bicyclo(3.3.1)nonane-2,4-dione and 2-diazo-3,5-cyclohexadiene-1-one, respectively. We expect that this parallel algorithm can reduce the computational cost of any crystal structure prediction.
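The master-worker pattern named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses threads inside one Python process instead of distributed compute nodes, and `optimize_structure`, `master_worker`, and the toy "energy" are hypothetical stand-ins. The point is only the allocation scheme, in which idle workers pull the next pending task from a shared queue so that uneven task durations balance out.

```python
import queue
import threading

def optimize_structure(candidate):
    """Hypothetical stand-in for one crystal structure optimization."""
    return candidate, sum(x * x for x in candidate)  # toy "energy"

def master_worker(tasks, n_workers, work=optimize_structure):
    """Master fills a shared queue; each idle worker pulls the next task."""
    todo = queue.Queue()
    for t in tasks:
        todo.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return  # no tasks left, so this worker retires
            out = work(task)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / cores

candidates = [(i, i + 1, i + 2) for i in range(20)]
ranked = sorted(master_worker(candidates, n_workers=4), key=lambda r: r[1])
print(ranked[0])                       # candidate with the lowest toy energy
print(parallel_efficiency(214, 256))   # efficiency implied by the reported speed-ups
print(parallel_efficiency(179, 256))
```

The reported speed-ups of 214 and 179 on 256 cores correspond to parallel efficiencies of roughly 84% and 70%.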
Concurrent Collections (CnC): A new approach to parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Knobe, Kathleen

2010-05-07

A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer or program) who demands maximal flexibility for optimizing the mapping to a wide range of target platforms (parallel / serial, shared / distributed, homogeneous / heterogeneous, etc.) but needs no knowledge of the domain. Concurrent Collections (CnC) is based on this separation of concerns. The talk will present CnC and its benefits. About the speaker. Kathleen Knobe has focused throughout her career on parallelism especially compiler technology, runtime system design and language design. She worked at Compass (aka Massachusetts Computer Associates) from 1980 to 1991 designing compilers for a wide range of parallel platforms for Thinking Machines, MasPar, Alliant, Numerix, and several government projects. In 1991 she decided to finish her education. After graduating from MIT in 1997, she joined Digital Equipment’s Cambridge Research Lab (CRL). She stayed through the DEC/Compaq/HP mergers and when CRL was acquired and absorbed by Intel. She currently works in the Software and Services Group / Technology Pathfinding and Innovation.
Concurrent Collections (CnC): A new approach to parallel programming

ScienceCinema

Knobe, Kathleen

2018-04-16

A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer or program) who demands maximal flexibility for optimizing the mapping to a wide range of target platforms (parallel / serial, shared / distributed, homogeneous / heterogeneous, etc.) but needs no knowledge of the domain. Concurrent Collections (CnC) is based on this separation of concerns. The talk will present CnC and its benefits. About the speaker. Kathleen Knobe has focused throughout her career on parallelism especially compiler technology, runtime system design and language design. She worked at Compass (aka Massachusetts Computer Associates) from 1980 to 1991 designing compilers for a wide range of parallel platforms for Thinking Machines, MasPar, Alliant, Numerix, and several government projects. In 1991 she decided to finish her education. After graduating from MIT in 1997, she joined Digital Equipment’s Cambridge Research Lab (CRL). She stayed through the DEC/Compaq/HP mergers and when CRL was acquired and absorbed by Intel. She currently works in the Software and Services Group / Technology Pathfinding and Innovation.
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology

NASA Astrophysics Data System (ADS)

Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.

2009-04-01

Currently, the basic problem of tsunami modeling is the low speed of calculation, which is unacceptable for operational warning services. Existing algorithms for the numerical modeling of the hydrodynamics of tsunami waves were developed without taking advantage of modern computer hardware. Parallel algorithms offer an opportunity to accelerate the calculations considerably. We discuss here a new approach to parallelizing tsunami modeling code using OpenMP technology (for shared-memory multiprocessing systems). Nowadays, multiprocessing systems are easily accessible to everyone, and the cost of using them is much lower than the cost of clusters. This also allows programmers to apply multithreaded algorithms on researchers' desktop computers. Another important advantage of this approach is the shared-memory mechanism: there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing threads, which gives almost linear scalability of the program. In the new version of NAMI DANCE, OpenMP and a multithreaded algorithm provide an 80% gain in speed over the single-thread version on a dual-processor unit. On a four-core PC processor, a 320% gain was attained. Thus, it was possible to reduce computation time considerably on scientific workstations (desktops) without completely changing the program and user interfaces. Further modernization of the algorithms for preparing initial data and processing results using OpenMP looks reasonable. The final version of NAMI DANCE, with its increased computational speed, can be used not only for research purposes but also in real-time tsunami warning systems.
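The abstract reports percentage gains without defining them, so the implied speed-up and parallel efficiency depend on how "gain" is read. The sketch below works through both readings; the two interpretations and the function names are assumptions made here for illustration, not definitions taken from the abstract.

```python
def speedup_from_gain(gain_percent, gain_is_increase=True):
    """Convert a percentage gain to a speed-up factor T1/Tp.

    gain_is_increase=True  : gain is read as (T1/Tp - 1) * 100
    gain_is_increase=False : gain is read as (T1/Tp) * 100
    """
    factor = gain_percent / 100.0
    return 1.0 + factor if gain_is_increase else factor

def efficiency(speedup, threads):
    """Speed-up divided by thread count (1.0 means ideal linear scaling)."""
    return speedup / threads

for gain, threads in [(80, 2), (320, 4)]:
    for reading in (True, False):
        s = speedup_from_gain(gain, reading)
        print(gain, threads, reading, round(s, 2), round(efficiency(s, threads), 2))
```

Read as an increase over the single-thread speed, 80% on two processors is a 1.8x speed-up at 90% efficiency, which fits the "almost linear" description. The 320% figure gives 4.2x on four cores under the same reading (slightly above linear) or 3.2x at 80% efficiency if it is read as a ratio, so the abstract alone does not settle which was meant.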
Parallelization of the TRIGRS model for rainfall-induced landslides using the message passing interface

USGS Publications Warehouse

Alvioli, M.; Baum, R.L.

2016-01-01

We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
Development of Low-Cost Microcontroller-Based Interface for Data Acquisition and Control of Microbioreactor Operation.

PubMed

Husain, Abdul Rashid; Hadad, Yaser; Zainal Alam, Muhd Nazrul Hisham

2016-10-01

This article presents the development of a low-cost microcontroller-based interface for a microbioreactor operation. An Arduino MEGA 2560 board with 54 digital input/outputs, including 15 pulse-width-modulation outputs, has been chosen to perform the acquisition and control of the microbioreactor. The microbioreactor (volume = 800 µL) was made of poly(dimethylsiloxane) and poly(methylmethacrylate) polymers. The reactor was built to be equipped with sensors and actuators for the control of reactor temperature and the mixing speed. The article discusses the circuit of the microcontroller-based platform, describes the signal conditioning steps, and evaluates the capacity of the proposed low-cost microcontroller-based interface in terms of control accuracy and system responses. It is demonstrated that the proposed microcontroller-based platform is able to run parallel microbioreactor operations with satisfactory performance. Control accuracy with a deviation of less than 5% from the set-point values and response times in the range of a few seconds have been recorded. © 2015 Society for Laboratory Automation and Screening.
Overview of ICE Project: Integration of Computational Fluid Dynamics and Experiments

NASA Technical Reports Server (NTRS)

Stegeman, James D.; Blech, Richard A.; Babrauckas, Theresa L.; Jones, William H.

2001-01-01

Researchers at the NASA Glenn Research Center have developed a prototype integrated environment for interactively exploring, analyzing, and validating information from computational fluid dynamics (CFD) computations and experiments. The Integrated CFD and Experiments (ICE) project is a first attempt at providing a researcher with a common user interface for control, manipulation, analysis, and data storage for both experiments and simulation. ICE can be used as a live, on-line system that displays and archives data as they are gathered; as a postprocessing system for dataset manipulation and analysis; and as a control interface or "steering mechanism" for simulation codes while visualizing the results. Although the full capabilities of ICE have not been completely demonstrated, this report documents the current system. Various applications of ICE are discussed: a low-speed compressor, a supersonic inlet, real-time data visualization, and a parallel-processing simulation code interface. A detailed data model for the compressor application is included in the appendix.
Biwavelength transceiver module for parallel simultaneous bidirectional optical interconnections

NASA Astrophysics Data System (ADS)

Nguyen, Nga T. H.; Ukaegbu, Ikechi A.; Sangirov, Jamshid; Cho, Mu-Hee; Lee, Tae-Woo; Park, Hyo-Hoon

2013-12-01

The design of a biwavelength transceiver (TRx) module for parallel simultaneous bidirectional optical interconnects is described. The TRx module has been implemented using two different wavelengths, 850 and 1060 nm, to send and receive signals simultaneously through a common optical interface while optimizing cost and performance. Filtering mirrors are formed in the optical fibers which are embedded on a V-grooved silicon substrate for reflecting and filtering optical signals from/to vertical-cavity surface-emitting laser (VCSEL)/photodiode (PD). The VCSEL and PD are flip-chip bonded on individual silicon optical benches, which are attached on the silicon substrate for optical signal coupling from the VCSEL to fiber and from fiber to the PD. A high-speed and low-loss ceramic printed circuit board, which has a compact size of 0.033 cc, has been designed to carry transmitter and receiver chips for easy packaging of the TRx module. Applied for quad small form-factor pluggable applications at 40-Gbps operation, the four-channel biwavelength TRx module showed clear eye diagrams with a bit error rate (BER) of 10⁻¹² at input powers of -5 and -5.8 dBm for 1060 and 850 nm operation modes, respectively.
Stability analysis applied to the early stages of viscous drop breakup by a high-speed gas stream

NASA Astrophysics Data System (ADS)

Padrino, Juan C.; Longmire, Ellen K.

2013-11-01

The instability of a liquid drop suddenly exposed to a high-speed gas stream behind a shock wave is studied by considering the gas-liquid motion at the drop interface. The discontinuous velocity profile given by the uniform, parallel flow of an inviscid, compressible gas over a viscous liquid is considered, and drop acceleration is included. Our analysis considers compressibility effects not only in the base flow, but also in the equations of motion for the perturbations. Recently published high-resolution images of the process of drop breakup by a passing shock have provided experimental evidence supporting the idea that a critical gas dynamic pressure can be found above which drop piercing by the growth of acceleration-driven instabilities gives way to drop breakup by liquid entrainment resulting from the gas shearing action. For a set of experimental runs from the literature, results show that, for shock Mach numbers >= 2, a band of rapidly growing waves forms in the region well upstream of the drop's equator at the location where the base flow passes from subsonic to supersonic, in agreement with experimental images. Also, the maximum growth rate can be used to predict the transition of the breakup mode from Rayleigh-Taylor piercing to shear-induced entrainment. The authors acknowledge support of the NSF (DMS-0908561).
Implementation of total focusing method for phased array ultrasonic imaging on FPGA

NASA Astrophysics Data System (ADS)

Guo, JianQiang; Li, Xi; Gao, Xiaorong; Wang, Zeyong; Zhao, Quanke

2015-02-01

This paper describes a multi-FPGA imaging system dedicated to real-time imaging using the Total Focusing Method (TFM) and Full Matrix Capture (FMC). The system was described entirely in Verilog HDL and implemented on an Altera Stratix IV GX FPGA development board. The algorithm proceeds as follows: establish an image coordinate system and divide it into a grid; for each focus point, calculate the complete acoustic distance between the transmitting array element and the receiving array element and convert it into an index value; use the index to read the sound pressure values from ROM and superimpose them to obtain the pixel value of that focus point; and repeat for all focus points to obtain the final image. The imaging results show that this algorithm yields a high SNR for defect imaging. Because an FPGA with parallel processing capability can provide high-speed performance, the system offers a fully functional imaging interface with good performance.
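The processing steps listed in this abstract match the usual delay-and-sum form of the Total Focusing Method, which can be sketched in software as follows. This is an illustrative pure-Python version, not the authors' Verilog design: the array geometry, sound speed, sampling rate, the single ideal point scatterer, and all function names are assumptions chosen for the example, and Python lists stand in for the ROM that holds the captured samples.

```python
import math

def tfm_image(fmc, elem_x, grid_x, grid_z, c, fs):
    """Delay-and-sum TFM over Full Matrix Capture data.

    fmc[t][r] is the sampled A-scan for transmitter t and receiver r;
    elem_x holds element x-positions (elements lie on the line z = 0).
    """
    n = len(elem_x)
    image = []
    for z in grid_z:
        row = []
        for x in grid_x:
            # One-way distance from every element to this focus point.
            dist = [math.hypot(x - ex, z) for ex in elem_x]
            acc = 0.0
            for t in range(n):
                for r in range(n):
                    # Transmit-plus-receive path converted to a sample index.
                    idx = int(round((dist[t] + dist[r]) / c * fs))
                    if idx < len(fmc[t][r]):
                        acc += fmc[t][r][idx]
            row.append(abs(acc))
        image.append(row)
    return image

def synthetic_fmc(elem_x, scatterer, c, fs, n_samples):
    """Ideal FMC data for one point scatterer (unit impulse echoes)."""
    sx, sz = scatterer
    dist = [math.hypot(sx - ex, sz) for ex in elem_x]
    n = len(elem_x)
    fmc = [[[0.0] * n_samples for _ in range(n)] for _ in range(n)]
    for t in range(n):
        for r in range(n):
            fmc[t][r][int(round((dist[t] + dist[r]) / c * fs))] = 1.0
    return fmc

elem_x = [i * 1e-3 for i in range(8)]         # 8 elements, 1 mm pitch (assumed)
grid_x = [i * 1e-3 for i in range(8)]         # image grid, 1 mm steps
grid_z = [5e-3 + i * 1e-3 for i in range(6)]
c, fs = 5900.0, 50e6                          # assumed sound speed (m/s) and sampling rate (Hz)
target = (grid_x[3], grid_z[3])               # scatterer placed on a grid node
fmc = synthetic_fmc(elem_x, target, c, fs, 2048)
img = tfm_image(fmc, elem_x, grid_x, grid_z, c, fs)
peak = max((v, ix, iz) for iz, row in enumerate(img) for ix, v in enumerate(row))
print(peak)  # brightest pixel: (value, x index, z index)
```

Every pixel is computed independently of the others, which is the property that lets a hardware implementation evaluate many focus points in parallel.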
Three-dimensional laser microvision.

PubMed

Shimotahira, H; Iizuka, K; Chu, S C; Wah, C; Costen, F; Yoshikuni, Y

2001-04-10

A three-dimensional (3-D) optical imaging system offering high resolution in all three dimensions, requiring minimum manipulation and capable of real-time operation, is presented. The system derives its capabilities from use of the superstructure grating laser source in the implementation of a laser step frequency radar for depth information acquisition. A synthetic aperture radar technique was also used to further enhance its lateral resolution as well as extend the depth of focus. High-speed operation was made possible by a dual computer system consisting of a host and a remote microcomputer supported by a dual-channel Small Computer System Interface parallel data transfer system. The system is capable of operating near real time. The 3-D display of a tunneling diode, a microwave integrated circuit, and a see-through image taken by the system operating near real time are included. The depth resolution is 40 μm; lateral resolution with a synthetic aperture approach is a fraction of a micrometer and that without it is approximately 10 μm.
The wave-based substructuring approach for the efficient description of interface dynamics in substructuring

NASA Astrophysics Data System (ADS)

Donders, S.; Pluymers, B.; Ragnarsson, P.; Hadjit, R.; Desmet, W.

2010-04-01

In the vehicle design process, design decisions are more and more based on virtual prototypes. Due to competitive and regulatory pressure, vehicle manufacturers are forced to improve product quality, to reduce time-to-market and to launch an increasing number of design variants on the global market. To speed up the design iteration process, substructuring and component mode synthesis (CMS) methods are commonly used, involving the analysis of substructure models and the synthesis of the substructure analysis results. Substructuring and CMS enable efficient decentralized collaboration across departments and make it possible to benefit from the availability of parallel computing environments. However, traditional CMS methods become prohibitively inefficient when substructures are coupled along large interfaces, i.e. with a large number of degrees of freedom (DOFs) at the interface between substructures. The reason is that the analysis of substructures involves the calculation of a number of enrichment vectors, one for each interface DOF. Since large interfaces are common in vehicles (e.g. the continuous line connections to connect the body with the windshield, roof or floor), this interface bottleneck poses a clear limitation in the vehicle noise, vibration and harshness (NVH) design process. Therefore there is a need to describe the interface dynamics more efficiently. This paper presents a wave-based substructuring (WBS) approach, which allows reducing the interface representation between substructures in an assembly by expressing the interface DOFs in terms of a limited set of basis functions ("waves"). As the number of basis functions can be much lower than the number of interface DOFs, this greatly facilitates the substructure analysis procedure and results in faster design predictions. The waves are calculated once from a full nominal assembly analysis, but these nominal waves can be re-used for the assembly of modified components. The WBS approach thus enables efficient structural modification predictions of the global modes, so that efficient vibro-acoustic design modification, optimization and robust design become possible. The results show that wave-based substructuring offers a clear benefit for vehicle design modifications, by improving both the speed of component reduction processes and the efficiency and accuracy of design iteration predictions, as compared to conventional substructuring approaches.

Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoginath, Srikanth B.; Perumalla, Kalyan S.

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks (heat diffusion, forest fire, and disease propagation models), delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

DOE PAGES

Yoginath, Srikanth B.; Perumalla, Kalyan S.

2018-01-31

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks (heat diffusion, forest fire, and disease propagation models), delivering a speedup of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
Fast neural net simulation with a DSP processor array.

PubMed

Muller, U A; Gunzinger, A; Guggenbuhl, W

1995-01-01

This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.
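The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers stated above; the variable names are illustrative, and the power efficiency is a lower bound because the abstract gives the consumption only as "less than 800 W".

```python
# Figures quoted in the abstract above.
connection_updates_per_s = 330e6   # backpropagation speed
sustained_flops = 1.4e9            # sustained performance
peak_flops = 3.8e9                 # peak performance
power_w = 800.0                    # stated upper bound on power draw

# Floating-point operations spent per connection update.
flops_per_update = sustained_flops / connection_updates_per_s

# Fraction of peak performance reached by the backpropagation run.
peak_fraction = sustained_flops / peak_flops

# Sustained flops per watt; a lower bound, since power is "less than 800 W".
flops_per_watt = sustained_flops / power_w

print(f"{flops_per_update:.2f} flops per connection update")
print(f"{peak_fraction:.1%} of peak")
print(f"at least {flops_per_watt / 1e6:.2f} Mflops per watt")
```

The run therefore spends a little over four floating-point operations per connection update and reaches somewhat more than a third of peak performance.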
Identifying Read/Write Speeds for Field-Induced Interfacial Resistive Switching.

NASA Astrophysics Data System (ADS)

Tsui, Stephen; Das, Nilanjan; Wang, Yaqi; Xue, Yuyi; Chu, C. W.

2007-03-01

Efforts continue to explore new phenomena that may allow for next generation nonvolatile memory technology. Much attention has been drawn to the field-induced resistive switch occurring at the interface between a metal electrode and perovskite oxide. The switch between high (off) and low (on) resistance states is controlled by the polarity of applied voltage pulsing. Characterization of Ag-Pr0.7Ca0.3MnO3 interfaces via impedance spectroscopy shows that the resistances above 10^6 Hz are the same at the on and off states, which limits the reading speed to far slower than the applied switching pulses, or device write speed at the order of 10^7 Hz. We deduce that the switching interface is percolative in nature and that small local rearrangement of defect structures may play a major role.
Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
The development speed paradox: can increasing development speed reduce R&D productivity?

PubMed

Lendrem, Dennis W; Lendrem, B Clare

2014-03-01

In the 1990s the pharmaceutical industry sought to increase R&D productivity by shifting development tasks into parallel to reduce development cycle times and increase development speed. This paper presents a simple model demonstrating that, when attrition rates are high as in pharmaceutical development, such development speed initiatives can increase the expected time for the first successful molecule to complete development. Increasing the development speed of successful molecules could actually reduce R&D productivity - the development speed paradox. Copyright © 2013 Elsevier Ltd. All rights reserved.
A simulation-based study of HighSpeed TCP and its deployment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Souza, Evandro de

2003-05-01

The current congestion control mechanism used in TCP has difficulty reaching full utilization on high speed links, particularly on wide-area connections. For example, the packet drop rate needed to fill a Gigabit pipe using the present TCP protocol is below the currently achievable fiber optic error rates. HighSpeed TCP was recently proposed as a modification of TCP's congestion control mechanism to allow it to achieve reasonable performance in high speed wide-area links. In this research, simulation results showing the performance of HighSpeed TCP and the impact of its use on the present implementation of TCP are presented. Network conditions including different degrees of congestion, different levels of loss rate, different degrees of bursty traffic and two distinct router queue management policies were simulated. The performance and fairness of HighSpeed TCP were compared to the existing TCP and solutions for bulk-data transfer using parallel streams.
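The drop-rate claim in this abstract can be illustrated with the commonly used steady-state approximation for standard TCP, in which the average congestion window is about 1.2/sqrt(p) packets for a packet drop rate p. The sketch below is not taken from the report: the 1 Gbit/s link rate follows the abstract, while the 100 ms round-trip time and the 1500-byte packet size are assumptions chosen for illustration.

```python
def required_window(link_bps, rtt_s, packet_bytes):
    """Congestion window (in packets) needed to keep the link full."""
    return link_bps * rtt_s / (packet_bytes * 8)


def max_drop_rate(window_packets):
    """Largest steady-state drop rate that still sustains the window,
    using the standard TCP approximation w ~= 1.2 / sqrt(p)."""
    return (1.2 / window_packets) ** 2


# Assumed scenario: 1 Gbit/s link, 100 ms RTT, 1500-byte packets.
w = required_window(1e9, 0.1, 1500)
p = max_drop_rate(w)
print(f"window: {w:.0f} packets, tolerable drop rate: {p:.2e}")
```

Under these assumptions the sender needs a window of roughly 8,300 packets and can tolerate only about two drops per hundred million packets, which is the kind of requirement the abstract describes as hard to meet on real links.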
DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Parallel-Connected Photovoltaic Inverters: Zero Frequency Sequence Harmonic Analysis and Solution

NASA Astrophysics Data System (ADS)

Carmeli, Maria Stefania; Mauri, Marco; Frosio, Luisa; Bezzolato, Alberto; Marchegiani, Gabriele

2013-05-01

High-power photovoltaic (PV) plants usually consist of several connected PV subfields, each with its own interface transformer. Different solutions have been studied to improve the efficiency of the whole generation system. In particular, transformerless configurations are the most attractive from the efficiency and cost points of view. This paper focuses on transformerless PV configurations characterised by the parallel connection of interface inverters. The problem of zero sequence current due to both the parallel connection and the presence of undesirable parasitic earth capacitances is considered, and a solution, which consists of synchronising the pulse-width modulation triangular carriers, is proposed and theoretically analysed. The theoretical analysis has been validated through simulation and experimental results.
SEAL FOR HIGH SPEED CENTRIFUGE

DOEpatents

Skarstrom, C.W.

1957-12-17

A seal is described for a high speed centrifuge wherein the centrifugal force of rotation acts on the gasket to form a tight seal. The cylindrical rotating bowl of the centrifuge contains a closure member resting on a shoulder in the bowl wall having a lower surface containing bands of gasket material, parallel and adjacent to the cylinder wall. As the centrifuge speed increases, centrifugal force acts on the bands of gasket material, forcing them into a sealing contact against the cylinder wall. This arrangement forms a simple and effective seal for high speed centrifuges, replacing more costly methods such as welding a closure in place.
High speed infrared imaging system and method

DOEpatents

Zehnder, Alan T.; Rosakis, Ares J.; Ravichandran, G.

2001-01-01

A system and method for radiation detection with an increased frame rate. A semi-parallel processing configuration is used to process a row or column of pixels in a focal-plane array in parallel to achieve a processing rate up to and greater than 1 million frames per second.
Probing and Manipulating the Interfacial Defects of InGaAs Dual-Layer Metal Oxides at the Atomic Scale.

PubMed

Wu, Xing; Luo, Chen; Hao, Peng; Sun, Tao; Wang, Runsheng; Wang, Chaolun; Hu, Zhigao; Li, Yawei; Zhang, Jian; Bersuker, Gennadi; Sun, Litao; Pey, Kinleong

2018-01-01

The interface between III-V and metal-oxide-semiconductor materials plays a central role in the operation of high-speed electronic devices, such as transistors and light-emitting diodes. The high-speed property gives the light-emitting diodes a high response speed and low dark current, and they are widely used in communications, infrared remote sensing, optical detection, and other fields. The rational design of high-performance devices requires a detailed understanding of the electronic structure at this interface; however, this understanding remains a challenge, given the complex nature of surface interactions and the dynamic relationship between the morphology evolution and electronic structures. Herein, in situ transmission electron microscopy is used to probe and manipulate the structural and electrical properties of ZrO2 films on Al2O3 and InGaAs substrate at the atomic scale. Interfacial defects resulting from the spillover of the oxygen-atom conduction-band wavefunctions are resolved. This study unearths the fundamental defect-driven interfacial electric structure of III-V semiconductor materials and paves the way to future high-speed and high-reliability devices. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The evolution of slip pulses within bimaterial interfaces with rupture velocity

NASA Astrophysics Data System (ADS)

Shlomai, H.; Fineberg, J.

2017-12-01

The most general frictional motion in nature involves bimaterial interfaces, when contacting bodies possess different elastic properties. Frictional motion occurs when the contacts composing the interface separating these bodies detach via propagating rupture fronts. Coupling between slip and normal stress variations is unique to bimaterial interfaces. Here we use high speed simultaneous measurements of slip velocities, real contact area and stresses to explicitly reveal this bimaterial coupling and its role in determining different classes of rupture modes and their structures. Our experiments study the rupture of a spatially extended interface formed by brittle plastics whose shear wave speeds differ by 30%. Any slip within a bimaterial interface will break the stress symmetry across the interface. One important result of this is that local values of normal stress variations at the interface couple to interface slip ("bimaterial coupling"). The sign of the coupling depends on the front propagation direction. When we consider ruptures propagating in the direction of motion of the more compliant material, the "positive" direction, slip reduces the normal stress. We focus on this direction. We show that, in this direction, interface ruptures develop from crack-like behavior at low rupture velocities, whose structure corresponds to theoretical predictions. As the ruptures accelerate towards their asymptotic speed, the structures of the strain and stress fields near the rupture tip deviate significantly from this crack-like form, and systematically sharpen to a pulse-like rupture mode called slip-pulses. We conclude with a description of slip-pulse properties.
Conceptual design and kinematic analysis of a novel parallel robot for high-speed pick-and-place operations

NASA Astrophysics Data System (ADS)

Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun

2018-06-01

This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four degrees-of-freedom (DOFs) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions to identify the correct solution are also provided. On the basis of the proposed concept, the workspace of the robot is identified using a set of presupposed parameters by taking input and output transmission index as the performance evaluation criteria.
A Study Of High Speed Friction Behavior Under Elastic Loading Conditions

NASA Astrophysics Data System (ADS)

Crawford, P. J.; Hammerberg, J. E.

2005-03-01

The role of interfacial dynamics under high strain-rate conditions is an important constitutive relationship in modern modeling and simulation studies of dynamic events (<100 μs in length). The frictional behavior occurring at the interface between two metal surfaces under high elastic loading and sliding speed conditions is studied using the Rotating Barrel Gas Gun (RBGG) facility. The RBGG utilizes a low-pressure gas gun to propel a rotating annular projectile towards an annular target rod. Upon striking the target, the projectile imparts both an axial and a torsional impulse into the target. Resulting elastic waves are measured using strain gauges attached to the target rod. The kinetic coefficient of friction is obtained through an analysis of the resulting strain wave data. Experiments performed using Cu/Cu, Cu/Stainless steel and Cu/Al interfaces provide some insight into the kinetic coefficient of friction behavior at varying sliding speeds and impact loads.
Mapping trace element distribution in fossil teeth and bone with LA-ICP-MS

NASA Astrophysics Data System (ADS)

Hinz, E. A.; Kohn, M. J.

2009-12-01

Trace element profiles were measured in fossil bones and teeth from the late Pleistocene (c. 25 ka) Merrell locality, Montana, USA, by using laser-ablation ICP-MS. Laser-ablation ICP-MS can collect element counts along predefined tracks on a sample’s surface using a constant ablation speed allowing for rapid spatial sampling of element distribution. Key elements analyzed included common divalent cations (e.g. Sr, Zn, Ba), a suite of REE (La, Ce, Nd, Sm, Eu, Yb), and U, in addition to Ca for composition normalization and standardization. In teeth, characteristic diffusion penetration distances for all trace elements are at least a factor of 4 greater in traverses parallel to the dentine-enamel interface (parallel to the growth axis of the tooth) than perpendicular to the interface. Multiple parallel traverses in sections parallel and perpendicular to the tooth growth axis were transformed into trace element maps, and illustrate greater uptake of all trace elements along the central axis of dentine compared to areas closer to enamel, or within the enamel itself. Traverses in bone extending from the external surface, through the thickness of cortical bone and several mm into trabecular bone show major differences in trace element uptake compared to teeth: U and Sr are homogeneous, whereas all REE show a kinked profile with high concentrations on outer surfaces that decrease by several orders of magnitude within a few mm inward. The Eu anomaly increases uniformly from the outer edge of bone inward, whereas the Ce anomaly decreases slightly. These observations point to major structural anisotropies in trace element transport and uptake during fossilization, yet transport and uptake of U and REE are not resolvably different. In contrast, transport and uptake of U in bone must proceed orders of magnitude faster than REE as U is homogeneous whereas REE exhibit strong gradients. 
The kinked REE profiles in bone unequivocally indicate differential transport rates, consistent with a double-medium diffusion model in which microdomains with slow diffusivities are bounded by fast-diffusing pathways.
Parallel fuzzy connected image segmentation on GPU

PubMed Central

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA’s Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
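The two computational tasks named in this abstract can be sketched on the CPU in a few lines. This is a minimal sequential illustration, not the authors' CUDA implementation: the `affinity` function is a hypothetical intensity-based affinity, and the connectedness map uses the usual max-min definition (the strength of a path is its weakest affinity, and a pixel's connectedness is the strength of its best path from the seed), computed here with a Dijkstra-style priority queue on a 4-neighbourhood.

```python
import heapq


def affinity(a, b):
    """Hypothetical fuzzy affinity in (0, 1]: high when intensities are close."""
    return 1.0 / (1.0 + abs(a - b))


def fuzzy_connectedness(image, seed):
    """Max-min path strength from the seed to every pixel (4-neighbourhood)."""
    rows, cols = len(image), len(image[0])
    conn = [[0.0] * cols for _ in range(rows)]
    conn[seed[0]][seed[1]] = 1.0
    heap = [(-1.0, seed)]  # max-heap via negated strengths
    while heap:
        neg_strength, (r, c) = heapq.heappop(heap)
        strength = -neg_strength
        if strength < conn[r][c]:
            continue  # stale entry; a stronger path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # A path is only as strong as its weakest link.
                cand = min(strength, affinity(image[r][c], image[nr][nc]))
                if cand > conn[nr][nc]:
                    conn[nr][nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return conn


# Toy image: a uniform region on the left, a bright column on the right.
image = [[10, 10, 50],
         [10, 11, 50],
         [10, 10, 50]]
conn = fuzzy_connectedness(image, (0, 0))
for row in conn:
    print([round(v, 3) for v in row])
```

Thresholding the resulting map separates the region around the seed from the bright column, whose connectedness stays low because every path to it crosses a weak affinity.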
Parallel fuzzy connected image segmentation on GPU.

PubMed

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

2011-07-01

Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.
Analysis of high-speed growth of silicon sheet in inclined-meniscus configuration

NASA Technical Reports Server (NTRS)

Thomas, P. D.; Brown, R. A.

1985-01-01

The study of high speed growth of silicon sheet in inclined-meniscus configurations is discussed. It was concluded that the maximum growth rates in vertical and inclined growth are set by thermal-capillary limits. The melt/crystal interface was also determined to be flat, and vertical growth is qualitatively modelled by one-dimensional heat transfer.
Field trials of 100G and beyond: an operator's point of view

NASA Astrophysics Data System (ADS)

Vorbeck, S.; Schneiders, M.; Weiershausen, W.; Mayer, H.; Schippel, A.; Wagner, P.; Ehrhardt, A.; Braun, R.; Breuer, D.; Drafz, U.; Fritzsche, D.

2011-01-01

In this article we present a summary of the latest 100 Gbps field trials in the network of Deutsche Telekom AG with industry partners. We cover a brown field approach as alien wavelength on existing systems, a green field high speed overlay network approach and a high speed interface router-router coupling.

ASDF: An Adaptable Seismic Data Format with Full Provenance

NASA Astrophysics Data System (ADS)

Smith, J. A.; Krischer, L.; Tromp, J.; Lefebvre, M. P.

2015-12-01

In order for seismologists to maximize their knowledge of how the Earth works, they must extract the maximum amount of useful information from all recorded seismic data available for their research. This requires assimilating large sets of waveform data, keeping track of vast amounts of metadata, using validated standards for quality control, and automating the workflow in a careful and efficient manner. In addition, there is a growing gap between CPU/GPU speeds and disk access speeds that leads to an I/O bottleneck in seismic workflows. This is made even worse by existing seismic data formats that were not designed for performance and are limited to a few fixed headers for storing metadata. The Adaptable Seismic Data Format (ASDF) is a new data format for seismology that solves the problems with existing seismic data formats and integrates full provenance into the definition. ASDF is a self-describing format that features parallel I/O using the parallel HDF5 library. This makes it a great choice for use on HPC clusters. The format integrates the standards QuakeML for seismic sources and StationXML for receivers. ASDF is suitable for storing earthquake data sets, where all waveforms for a single earthquake are stored in one file, ambient noise cross-correlations, and adjoint sources. The format comes with a user-friendly Python reader and writer that gives seismologists access to a full set of Python tools for seismology. There is also a faster C/Fortran library for integrating ASDF into performance-focused numerical wave solvers, such as SPECFEM3D_GLOBE. Finally, a GUI tool for visually exploring the format provides a flexible interface for both research and educational applications. ASDF is a new seismic data format that offers seismologists high-performance parallel processing, organized and validated contents, and full provenance tracking for automated seismological workflows.
Dumping Low and High Resolution Graphics on the Apple IIe Microcomputer System.

ERIC Educational Resources Information Center

Fletcher, Richard K., Jr.; Ruckman, Frank, Jr.

This paper discusses and outlines procedures for obtaining a hard copy of the graphic output of a microcomputer or "dumping a graphic" using the Apple Dot Matrix Printer with the Apple Parallel Interface Card, and the Imagewriter Printer with the Apple Super Serial Interface Card. Hardware configurations and instructions for high…
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Active holographic interconnects for interfacing volume storage

NASA Astrophysics Data System (ADS)

Domash, Lawrence H.; Schwartz, Jay R.; Nelson, Arthur R.; Levin, Philip S.

1992-04-01

In order to achieve the promise of terabit/cm^3 data storage capacity for volume holographic optical memory, two technological challenges must be met. Satisfactory storage materials must be developed and the input/output architectures able to match their capacity with corresponding data access rates must also be designed. To date the materials problem has received more attention than devices and architectures for access and addressing. Two philosophies of parallel data access to 3-D storage have been discussed. The bit-oriented approach, represented by recent work on two-photon memories, attempts to store bits at local sites within a volume without affecting neighboring bits. High speed acousto-optic or electro-optic scanners together with dynamically focused lenses not presently available would be required. The second philosophy is that volume optical storage is essentially holographic in nature, and that each data write or read is to be distributed throughout the material volume on the basis of angle multiplexing or other schemes consistent with the principles of holography. The requirements for free space optical interconnects for digital computers and fiber optic network switching interfaces are also closely related to this class of devices. Interconnects, beamlet generators, angle multiplexers, scanners, fiber optic switches, and dynamic lenses are all devices which may be implemented by holographic or microdiffractive devices of various kinds, which we shall refer to collectively as holographic interconnect devices. At present, holographic interconnect devices are either fixed holograms or spatial light modulators. Optically or computer generated holograms (submicron resolution, 2-D or 3-D, encoding 10^13 bits, nearly 100% diffraction efficiency) can implement sophisticated mathematical design principles, but of course once fabricated they cannot be changed.
Spatial light modulators offer high speed programmability but have limited resolution (512 X 512 pixels, encoding about 10^6 bits of data) and limited diffraction efficiency. For any application, one must choose between high diffractive performance and programmability.
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

NASA Astrophysics Data System (ADS)

Macioł, Piotr; Michalik, Kazimierz

2016-10-01

Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle to its wide application is its high computational demand. Among other options, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelizing multiscale models based on the Agile Multiscale Methodology framework is discussed. A sequential, FEM-based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues investigated in this work are: (i) the speed-up of multiscale models, with special focus on fine-scale computations, and (ii) the decrease in the quality of computations caused by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of "delay error", arising from the parallel execution of fine-scale sub-models controlled by the sequential macroscopic sub-model, is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and an MPI library are also discussed.
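Amdahl's law, on which the speed-up evaluation in this abstract is based, bounds the speed-up of a program in which only part of the work can run in parallel. The sketch below is a generic illustration rather than the authors' evaluation; the parallel fraction of 0.9 is an assumed value standing in for the share of time spent in the fine-scale sub-models.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speed-up predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed: 90% of the run time is parallelizable fine-scale computation.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With these assumptions the speed-up can never exceed 1 / (1 - 0.9) = 10, however many workers are added, because the sequential macroscopic model remains the bottleneck.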
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

PubMed

Furuta, Takuya; Sato, Tatsuhiko

2015-01-01

Time-consuming Monte Carlo dose calculation has become feasible owing to the development of computer technology. However, the recent gains are due to the emergence of multi-core high-performance computers. Parallel computing has therefore become a key to achieving good software performance. The Monte Carlo simulation code PHITS contains two parallel computing functions: distributed-memory parallelization using the message passing interface (MPI) protocol, and shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions along with their advantages and disadvantages. Some test applications are also provided to show their performance on a typical multi-core high-performance workstation.
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
High-Speed Edge-Detecting Line Scan Smart Camera

NASA Technical Reports Server (NTRS)

Prokop, Norman F.

2012-01-01

A high-speed edge-detecting line scan smart camera was developed. The camera is designed to operate as a component in a NASA Glenn Research Center developed inlet shock detection system. The inlet shock is detected by projecting a laser sheet through the airflow. The shock within the airflow is the densest part and refracts the laser sheet the most in its vicinity, leaving a dark spot or shadowgraph. These spots show up as a dip or negative peak within the pixel intensity profile of an image of the projected laser sheet. The smart camera acquires and processes, in real time, the linear image containing the shock shadowgraph and outputs the shock location. Previously, a high-speed camera and personal computer would perform the image capture and processing to determine the shock location. This innovation consists of a linear image sensor, analog signal processing circuit, and a digital circuit that provides a numerical digital output of the shock or negative edge location. The smart camera is capable of capturing and processing linear images at over 1,000 frames per second. The edges are identified as numeric pixel values within the linear array of pixels, and the edge location information can be sent out from the circuit in a variety of ways, such as by using a microcontroller and onboard or external digital interface to include serial data such as RS-232/485, USB, Ethernet, or CAN BUS; parallel digital data; or an analog signal. The smart camera system can be integrated into a small package with a relatively small number of parts, reducing size and increasing reliability over the previous imaging system.
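The abstract describes the shock as a dip, or negative peak, in the pixel intensity profile of one line-scan frame. A minimal software sketch of that idea follows. It is not the camera's analog and digital circuit; the function name, the mean-based baseline and the `min_depth` threshold are assumptions made for illustration.

```python
def find_negative_edge(profile, min_depth=20):
    """Return the pixel index of the deepest dip, or None if too shallow.

    profile   -- sequence of pixel intensities from one line-scan frame
    min_depth -- hypothetical threshold: how far below the mean intensity
                 the minimum must fall to count as a shock shadow
    """
    if not profile:
        return None
    baseline = sum(profile) / len(profile)
    index = min(range(len(profile)), key=lambda i: profile[i])
    if baseline - profile[index] < min_depth:
        return None
    return index


# Synthetic frame: bright laser sheet with a dark spot centered at pixel 6.
frame = [200, 201, 199, 198, 150, 90, 40, 95, 160, 200, 202, 199]
location = find_negative_edge(frame)
```

Returning a single pixel index mirrors the "numeric pixel value" output described above; a frame without a sufficiently deep dip yields `None`.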
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

NASA Technical Reports Server (NTRS)

Yamakov, Vesselin I.

2016-01-01

This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
Analysis and identification of subsynchronous vibration for a high pressure parallel flow centrifugal compressor

NASA Technical Reports Server (NTRS)

Kirk, R. G.; Nicholas, J. C.; Donald, G. H.; Murphy, R. C.

1980-01-01

The summary of a complete analytical design evaluation of an existing parallel flow compressor is presented and a field vibration problem that manifested itself as a subsynchronous vibration that tracked at approximately 2/3 of compressor speed is reviewed. The comparison of predicted and observed peak response speeds, frequency spectrum content, and the performance of the bearing-seal systems are presented as the events of the field problem are reviewed. Conclusions and recommendations are made as to the degree of accuracy of the analytical techniques used to evaluate the compressor design.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bonachea, Dan; Hargrove, P.

GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".
Parallelism in Manipulator Dynamics. Revision.

DTIC Science & Technology

1983-12-01

computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the... computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose... DYNAMICS by Richard Harold Lathrop. ABSTRACT: This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair
Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

PubMed

Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

2014-12-23

Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.
Real-Time Dynamic Observation of Micro-Friction on the Contact Interface of Friction Lining

PubMed Central

Zhang, Dekun; Chen, Kai; Guo, Yongbo

2018-01-01

This paper aims to investigate the microscopic friction mechanism based on in situ microscopic observation in order to record the deformation and contact situation of friction lining during the frictional process. The results show that the friction coefficient increased with the shear deformation and energy loss of the surface, respectively. Furthermore, the friction mechanism mainly included adhesive friction in the high-pressure and high-speed conditions, whereas hysteresis friction was in the low-pressure and low-speed conditions. The mixed-friction mechanism was in the period when the working conditions varied from high pressure and speed to low pressure and speed. PMID:29498677
The OpenCalphad thermodynamic software interface.

PubMed

Sundman, Bo; Kattner, Ursula R; Sigli, Christophe; Stratmann, Matthias; Le Tellier, Romain; Palumbo, Mauro; Fries, Suzana G

2016-12-01

Thermodynamic data are needed for all kinds of simulations of materials processes. Thermodynamics determines the set of stable phases and also provides chemical potentials, compositions and driving forces for nucleation of new phases and phase transformations. Software to simulate materials properties needs accurate and consistent thermodynamic data to predict metastable states that occur during phase transformations. Due to long calculation times thermodynamic data are frequently pre-calculated into "lookup tables" to speed up calculations. This creates additional uncertainties as data must be interpolated or extrapolated and conditions may differ from those assumed for creating the lookup table. Speed and accuracy require that thermodynamic software is fully parallelized and the Open-Calphad (OC) software is the first thermodynamic software supporting this feature. This paper gives a brief introduction to computational thermodynamics, introduces the basic features of the OC software and presents four different application examples to demonstrate its versatility.
The OpenCalphad thermodynamic software interface

PubMed Central

Sundman, Bo; Kattner, Ursula R; Sigli, Christophe; Stratmann, Matthias; Le Tellier, Romain; Palumbo, Mauro; Fries, Suzana G

2017-01-01

Thermodynamic data are needed for all kinds of simulations of materials processes. Thermodynamics determines the set of stable phases and also provides chemical potentials, compositions and driving forces for nucleation of new phases and phase transformations. Software to simulate materials properties needs accurate and consistent thermodynamic data to predict metastable states that occur during phase transformations. Due to long calculation times thermodynamic data are frequently pre-calculated into “lookup tables” to speed up calculations. This creates additional uncertainties as data must be interpolated or extrapolated and conditions may differ from those assumed for creating the lookup table. Speed and accuracy require that thermodynamic software is fully parallelized and the Open-Calphad (OC) software is the first thermodynamic software supporting this feature. This paper gives a brief introduction to computational thermodynamics, introduces the basic features of the OC software and presents four different application examples to demonstrate its versatility. PMID:28260838
Efficient parallel implementations of QM/MM-REMD (quantum mechanical/molecular mechanics-replica-exchange MD) and umbrella sampling: isomerization of H2O2 in aqueous solution.

PubMed

Fedorov, Dmitri G; Sugita, Yuji; Choi, Cheol Ho

2013-07-03

An efficient parallel implementation of QM/MM-based replica-exchange molecular dynamics (REMD) as well as umbrella sampling techniques was proposed by adopting the generalized distributed data interface (GDDI). A parallelization speed-up of 40.5 on 48 cores was achieved, making our QM/MM-MD engine a robust tool for studying complex chemical dynamics in solution. Both techniques were used comparatively to study the torsional isomerization of hydrogen peroxide in aqueous solution. All results by QM/MM-REMD and QM/MM umbrella sampling techniques yielded nearly identical potentials of mean force (PMFs) regardless of the particular QM theories for solute, showing that the overall dynamics are mainly determined by solvation. Although the entropic penalty of solvent rearrangements exists in cisoid conformers, it was found that both strong intermolecular hydrogen bonding and dipole-dipole interactions preferentially stabilize them in solution, reducing the torsional free-energy barrier at 0° by about 3 kcal/mol as compared to that in gas phase.
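The reported speed-up of 40.5 on 48 cores can be turned into two common scaling figures. The short sketch below computes the parallel efficiency and the Karp-Flatt serial fraction. The Karp-Flatt metric is a standard diagnostic applied here for illustration; it is not a quantity reported in the abstract, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(40.5, 48)    # 0.84375, i.e. about 84%
serial_fraction = karp_flatt(40.5, 48)        # roughly 0.0039
```

An efficiency of about 84% corresponds to an effective serial fraction of roughly 0.4%, which is consistent with the abstract's description of the implementation as efficient.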
Transputer parallel processing at NASA Lewis Research Center

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1989-01-01

The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
Optimization of Microelectronic Devices for Sensor Applications

NASA Technical Reports Server (NTRS)

Cwik, Tom; Klimeck, Gerhard

2000-01-01

The NASA/JPL goal to reduce payload in future space missions while increasing mission capability demands miniaturization of active and passive sensors, analytical instruments and communication systems among others. Currently, typical system requirements include the detection of particular spectral lines, associated data processing, and communication of the acquired data to other systems. Advances in lithography and deposition methods result in more advanced devices for space application, while the sub-micron resolution currently available opens a vast design space. Though an experimental exploration of this widening design space (searching for optimized performance by repeated fabrication efforts) is unfeasible, it does motivate the development of reliable software design tools. These tools necessitate models based on fundamental physics and mathematics of the device to accurately model effects such as diffraction and scattering in opto-electronic devices, or bandstructure and scattering in heterostructure devices. The software tools must have convenient turn-around times and interfaces that allow effective usage. The first issue is addressed by the application of high-performance computers and the second by the development of graphical user interfaces driven by properly developed data structures. These tools can then be integrated into an optimization environment, and with the available memory capacity and computational speed of high performance parallel platforms, simulation of optimized components can proceed. In this paper, specific applications of the electromagnetic modeling of infrared filtering, as well as heterostructure device design will be presented using genetic algorithm global optimization methods.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1999-01-01

Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.

MPI, HPF or OpenMP: A Study with the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)

1999-01-01

Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Lau, Sonie

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Thakoor, Anilkumar P. (Inventor); Duong, Tuan A. (Inventor); Daud, Taher (Inventor)

1992-01-01

High-speed, analog, fully-parallel, and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A computation intensive feature classification application was demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as an application specific coprocessor for solving real world problems at extremely high data rates.
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)

1995-01-01

High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific-coprocessors for solving real-world problems at extremely high data rates.
RAMA: A file system for massively parallel computers

NASA Technical Reports Server (NTRS)

Miller, Ethan L.; Katz, Randy H.

1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
CWICOM: A Highly Integrated & Innovative CCSDS Image Compression ASIC

NASA Astrophysics Data System (ADS)

Poupat, Jean-Luc; Vitulli, Raffaele

2013-08-01

The space market is increasingly demanding in terms of image compression performance. The instrument resolution, agility and swath of Earth observation satellites are continuously increasing, which multiplies by 10 the volume of pictures acquired in one orbit. In parallel, satellite size and mass are decreasing, requiring innovative electronic technologies that reduce size, mass and power consumption. Astrium, leader on the market of combined solutions for compression and memory for space applications, has developed a new image compression ASIC, which is presented in this paper. CWICOM is a high-performance and innovative image compression ASIC developed by Astrium in the frame of the ESA contract n°22011/08/NLL/LvH. The objective of this ESA contract is to develop a radiation-hardened ASIC that implements the CCSDS 122.0-B-1 Standard for Image Data Compression, that has a SpaceWire interface for configuring and controlling the device, and that is compatible with the Sentinel-2 interface and with similar Earth observation missions. CWICOM stands for CCSDS Wavelet Image COMpression ASIC. It is a large-dynamic, large-image and very high speed image compression ASIC potentially relevant for compression of any 2D image with bi-dimensional data correlation, such as Earth observation, scientific data compression… The paper presents some of the main aspects of the CWICOM development, such as the algorithm and specification, the innovative memory organization, the validation approach and the status of the project.
3-Dimensional Marine CSEM Modeling by Employing TDFEM with Parallel Solvers

NASA Astrophysics Data System (ADS)

Wu, X.; Yang, T.

2013-12-01

In this paper, a parallel implementation is developed for forward modeling of three-dimensional (3D) controlled-source electromagnetic (CSEM) problems using the time-domain finite element method (TDFEM). Recently, increasing attention has been paid to research on detection mechanisms for hydrocarbon (HC) reservoirs in the seabed. Since China has vast ocean resources, the search for hydrocarbon reservoirs has become significant to the national economy. However, traditional seismic exploration methods face a crucial obstacle in detecting hydrocarbon reservoirs in a seabed with complex structure, owing to relatively high acquisition costs and high exploration risk. In addition, the development of EM simulations typically requires both a deep knowledge of computational electromagnetics (CEM) and a proper use of sophisticated techniques and tools from computer science. Moreover, large-scale EM simulations often require large memory because of the large amount of data, or long solution times to address problems concerning matrix solvers, function transforms, optimization, etc. The objective of this paper is to present a parallelized implementation of the time-domain finite element method for the analysis of 3D marine controlled-source electromagnetic problems. First, we established a 3D basic background model according to the seismic data; then an electromagnetic simulation of marine CSEM was carried out using the time-domain finite element method, which runs on an MPI (Message Passing Interface) platform with exact orientation to allow fast detection of hydrocarbon targets in the ocean environment. To speed up the calculation, SuperLU_DIST, the MPI version of SuperLU, is employed in this approach. To represent the three-dimensional seabed terrain realistically, the region is discretized into an unstructured mesh rather than a uniform one in order to reduce the number of unknowns.
Moreover, high-order Whitney vector basis functions are used for spatial discretization within the finite element approach to approximate the electric field. A horizontal electric dipole was used as a source, with an array of receivers located at the seabed. To capture the presence of the hydrocarbon layer, the forward responses at water depths from 100 m to 3000 m are calculated. The normalized Magnitude Versus Offset (N-MVO) and Phase Versus Offset (PVO) curves can reflect the resistive characteristics of hydrocarbon layers. In future work, a Graphics Processing Unit (GPU) acceleration algorithm will be implemented to greatly increase the calculation efficiency.
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
A data acquisition and control system for high-speed gamma-ray tomography

NASA Astrophysics Data System (ADS)

Hjertaker, B. T.; Maad, R.; Schuster, E.; Almås, O. A.; Johansen, G. A.

2008-09-01

A data acquisition and control system (DACS) for high-speed gamma-ray tomography based on the USB (Universal Serial Bus) and Ethernet communication protocols has been designed and implemented. The high-speed gamma-ray tomograph comprises five 500 mCi ²⁴¹Am gamma-ray sources, each at a principal energy of 59.5 keV, which correspond to five detector modules, each consisting of 17 CdZnTe detectors. The DACS design is based on Microchip's PIC18F4550 and PIC18F4620 microcontrollers, which facilitate a USB 2.0 interface protocol and an Ethernet (IEEE 802.3) interface protocol, respectively. By implementing the USB- and Ethernet-based DACS, a sufficiently high data acquisition rate is obtained and no dedicated hardware installation is required for the data acquisition computer, assuming that it is already equipped with a standard USB and/or Ethernet port. The API (Application Programming Interface) for the DACS is founded on National Instruments' LabVIEW® graphical development tool, which provides a simple and robust foundation for further application software developments for the tomograph. The data acquisition interval, i.e. the integration time, of the high-speed gamma-ray tomograph is user selectable and is a function of the statistical measurement accuracy required for the specific application. The bandwidth of the DACS is 85 kBytes s⁻¹ for the USB communication protocol and 28 kBytes s⁻¹ for the Ethernet protocol. When using the iterative least square technique reconstruction algorithm with a 1 ms integration time, the USB-based DACS provides an online image update rate of 38 Hz, i.e. 38 frames per second, compared with 31 Hz for the Ethernet-based DACS. The off-line image update rate (storage to disk) for the USB-based DACS is 278 Hz using a 1 ms integration time. Initial characterization of the high-speed gamma-ray tomograph using the DACS on polypropylene phantoms is presented in the paper.
SDS: A Framework for Scientific Data Services

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong, Bin; Byna, Surendra; Wu, Kesheng

2013-10-31

Large-scale scientific applications typically write their data to parallel file systems with organizations designed to achieve fast write speeds. Analysis tasks frequently read the data in a pattern that is different from the write pattern, and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between write and read stages of data access from parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and transparently directs read calls to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports HDF5 programming interface for reading data. The client library intercepts HDF5 calls and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selective criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.
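The transparent redirection described in this abstract can be pictured with a small sketch. The code below is not the SDS implementation and makes no HDF5 calls: the class names, the file paths and the in-memory `storage` dictionary are all hypothetical stand-ins, used only to show a client that consults a replica catalog and falls back to the original file when no reorganized copy exists.

```python
class ReplicaCatalog:
    """Stand-in for the server side: tracks reorganized copies of datasets."""

    def __init__(self):
        self._replicas = {}

    def register(self, original_path, dataset, replica_path):
        self._replicas[(original_path, dataset)] = replica_path

    def lookup(self, original_path, dataset):
        return self._replicas.get((original_path, dataset))


class RedirectingReader:
    """Stand-in for the client library: intercepts reads and redirects them."""

    def __init__(self, catalog, storage):
        self.catalog = catalog
        self.storage = storage      # dict: path -> {dataset name: values}

    def read(self, path, dataset):
        replica = self.catalog.lookup(path, dataset)
        target = replica if replica is not None else path
        return target, self.storage[target][dataset]


# Hypothetical files: the original layout and a read-optimized replica.
storage = {
    "/data/run1.h5": {"energy": [3, 1, 2]},
    "/sds/run1_sorted.h5": {"energy": [1, 2, 3]},
}
catalog = ReplicaCatalog()
catalog.register("/data/run1.h5", "energy", "/sds/run1_sorted.h5")
reader = RedirectingReader(catalog, storage)

served_from, values = reader.read("/data/run1.h5", "energy")
```

The caller still names the original file, which is the sense in which the redirection is transparent; only the catalog knows that a reorganized replica served the request.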
Research on parallel combinatory spread spectrum communication system with double information matching

NASA Astrophysics Data System (ADS)

Xue, Wei; Wang, Qi; Wang, Tianyu

2018-04-01

This paper presents an improved parallel combinatory spread spectrum (PC/SS) communication system that uses the method of double information matching (DIM). Compared with the conventional PC/SS system, the new model inherits the advantages of high transmission speed, large information capacity and high security. However, the traditional system suffers from a high bit error rate (BER) because of its data-sequence mapping algorithm. The new model achieves a lower BER and higher efficiency by optimizing the mapping algorithm.
Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

PubMed Central

Zhu, Xiang; Zhang, Dianwen

2013-01-01

We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
A compressible multiphase framework for simulating supersonic atomization

NASA Astrophysics Data System (ADS)

Regele, Jonathan D.; Garrick, Daniel P.; Hosseinzadeh-Nik, Zahra; Aslani, Mohamad; Owkes, Mark

2016-11-01

The study of atomization in supersonic combustors is critical in designing efficient and high performance scramjets. Numerical methods incorporating surface tension effects have largely focused on the incompressible regime as most atomization applications occur at low Mach numbers. Simulating surface tension effects in high speed compressible flow requires robust numerical methods that can handle discontinuities caused by both material interfaces and shocks. A shock capturing/diffused interface method is developed to simulate high-speed compressible gas-liquid flows with surface tension effects using the five-equation model. This includes developments that account for the interfacial pressure jump that occurs in the presence of surface tension. A simple and efficient method for computing local interface curvature is developed and an acoustic non-dimensional scaling for the surface tension force is proposed. The method successfully captures a variety of droplet breakup modes over a range of Weber numbers and demonstrates the impact of surface tension in countering droplet deformation in both subsonic and supersonic cross flows.
Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

NASA Astrophysics Data System (ADS)

Wittek, Peter; Calderaro, Luca

2015-12-01

We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.
An MPI-IO interface to HPSS

NASA Technical Reports Server (NTRS)

Jones, Terry; Mark, Richard; Martin, Jeanne; May, John; Pierce, Elsie; Stanberry, Linda

1996-01-01

This paper describes an implementation of the proposed MPI-IO (Message Passing Interface - Input/Output) standard for parallel I/O. Our system uses third-party transfer to move data over an external network between the processors where it is used and the I/O devices where it resides. Data travels directly from source to destination, without the need for shuffling it among processors or funneling it through a central node. Our distributed server model lets multiple compute nodes share the burden of coordinating data transfers. The system is built on the High Performance Storage System (HPSS), and a prototype version runs on a Meiko CS-2 parallel computer.
Highly efficient spatial data filtering in parallel using the opensource library CPPPO

NASA Astrophysics Data System (ADS)

Municchi, Federico; Goniva, Christoph; Radl, Stefan

2016-10-01

CPPPO is a compilation of parallel data processing routines developed with the aim to create a library for "scale bridging" (i.e. connecting different scales by means of closure models) in a multi-scale approach. CPPPO features a number of parallel filtering algorithms designed for use with structured and unstructured Eulerian meshes, as well as Lagrangian data sets. In addition, data can be processed on the fly, allowing the collection of relevant statistics without saving individual snapshots of the simulation state. Our library is provided with an interface to the widely-used CFD solver OpenFOAM®, and can be easily connected to any other software package via interface modules. Also, we introduce a novel, extremely efficient approach to parallel data filtering, and show that our algorithms scale super-linearly on multi-core clusters. Furthermore, we provide a guideline for choosing the optimal Eulerian cell selection algorithm depending on the number of CPU cores used. Finally, we demonstrate the accuracy and the parallel scalability of CPPPO in a showcase focusing on heat and mass transfer from a dense bed of particles.
JPARSS: A Java Parallel Network Package for Grid Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Jie; Akers, Walter; Chen, Ying

2002-03-01

The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because the TCP window size must be tuned to improve bandwidth and reduce latency on a high speed wide area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (Socket)) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning TCP window size. This package enables single sign-on, certificate delegation and secure or plain-text data transfer using several security components based on X.509 certificate and SSL. Several experiments will be presented to show that using Java parallel streams is more effective than tuning TCP window size. In addition a simple architecture using Web services
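The partition-and-reassemble idea in the JPARSS abstract can be sketched as follows. This is a Python illustration rather than JPARSS's Java API, which the abstract does not show: the function names `partition`, `send`, and `transfer` are hypothetical, and `send` is a stand-in that simply returns its input where a real implementation would write to its own TCP stream.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, streams):
    """Split data into `streams` contiguous chunks whose sizes differ by at most one byte."""
    base, extra = divmod(len(data), streams)
    chunks, start = [], 0
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        chunks.append((i, data[start:start + size]))  # tag each chunk with its index
        start += size
    return chunks

def send(chunk):
    """Stand-in for writing one partition to its own stream (assumption, no real I/O)."""
    index, payload = chunk
    return index, payload

def transfer(data, streams=4):
    """Send all partitions concurrently, then rebuild the original byte string."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        received = list(pool.map(send, partition(data, streams)))
    # Sort by partition index; this matters when partitions can arrive out of order.
    received.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in received)
```

Calling `transfer(b"abcdefghij", streams=3)` splits the payload into chunks of 4, 3, and 3 bytes and returns the original bytes. Tagging each chunk with its index is one simple way to make reassembly independent of arrival order.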
Impedance Discontinuity Reduction Between High-Speed Differential Connectors and PCB Interfaces

NASA Technical Reports Server (NTRS)

Navidi, Sal; Agdinaoay, Rodell; Walter, Keith

2013-01-01

High-speed serial communication (e.g., Gigabit Ethernet) requires differential transmission and controlled impedances. Impedance control is essential throughout cabling, connector, and circuit board construction. An impedance discontinuity arises at the interface between a high-speed quadrax or twinax connector and the attached printed circuit board (PCB). This discontinuity usually is lower impedance since the relative dielectric constant of the board is higher (e.g., polyimide ≈ 4) than that of the connector (Teflon ≈ 2.25). The discontinuity can be observed in transmit or receive eye diagrams, and can reduce the effective link margin of serial data networks. High-speed serial data network transmission can be improved at the connector-to-board interfaces as well as by improving differential via hole impedances. The impedance discontinuity was improved by 10 percent by drilling a 20-mil (≈ 0.5-mm) hole between the pins of a differential connector, spaced 55 mils (≈ 1.4 mm) apart, where the connector is attached to the PCB. The effective dielectric constant of the board can be lowered by drilling holes into the board material between the differential lines at quadrax or twinax connector attachment points. Because the differential impedance is inversely proportional to the square root of the relative dielectric constant, this increases the differential impedance and thus reduces the above-described impedance discontinuity. The differential via hole impedance can also be increased in the same manner. This technique can be extended to multiple smaller drilled holes as well as tapered holes (i.e., big in the middle followed by smaller ones diagonally).
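The inverse-square-root relation stated in the abstract can be turned into a quick calculation. The sketch below assumes only that proportionality, with all other geometry held fixed; the function names are hypothetical, and the numbers in the usage note are illustrative rather than measurements from the report.

```python
import math

def impedance_ratio(eps_before, eps_after):
    """Ratio Z_after / Z_before when only the relative dielectric constant changes.

    Assumes Z is proportional to 1 / sqrt(eps_r), as stated in the abstract.
    """
    return math.sqrt(eps_before / eps_after)

def eps_for_impedance_gain(eps_before, gain):
    """Effective dielectric constant needed to scale the impedance by `gain`."""
    return eps_before / gain ** 2
```

Under this assumption, `impedance_ratio(4.0, 2.25)` is about 1.33, so a Teflon-like region would present roughly a third more impedance than a polyimide-like region of the same geometry. Likewise, `eps_for_impedance_gain(4.0, 1.1)` is about 3.31, the effective dielectric constant that would raise the impedance by 10 percent starting from 4.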
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading based on pthreads (POSIX threads, where "POSIX" stands for Portable Operating System Interface) is also used in the diffraction propagation of light in MACOS. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
A parallel-pipelined architecture for a multi carrier demodulator

NASA Astrophysics Data System (ADS)

Kwatra, S. C.; Jamali, M. M.; Eugene, Linus P.

1991-03-01

Analog devices have been used for processing the information on board the satellites. Presently, digital devices are being used because they are economical and flexible as compared to their analog counterparts. Several schemes of digital transmission can be used depending on the data rate requirement of the user. An economical scheme of transmission for small earth stations uses single channel per carrier/frequency division multiple access (SCPC/FDMA) on the uplink and time division multiplexing (TDM) on the downlink. This is a typical communication service offered to low data rate users in commercial mass market. These channels usually pertain to either voice or data transmission. An efficient digital demodulator architecture is provided for a large number of low data rate users. A demodulator primarily consists of carrier, clock, and data recovery modules. This design uses principles of parallel processing, pipelining, and time sharing schemes to process large numbers of voice or data channels. It maintains the optimum throughput which is derived from the designed architecture and from the use of high speed components. The design is optimized for reduced power and area requirements. This is essential for satellite applications. The design is also flexible in processing a group of a varying number of channels. The algorithms that are used are verified by the use of a computer aided software engineering (CASE) tool called the Block Oriented System Simulator. The data flow, control circuitry, and interface of the hardware design are simulated in C language. Also, a multiprocessor approach is provided to map, model, and simulate the demodulation algorithms mainly from a speed viewpoint. A hypercube based architecture implementation is provided for such a scheme of operation. The hypercube structure and the demodulation models on hypercubes are simulated in Ada.

A parallel-pipelined architecture for a multi carrier demodulator. M.S. Thesis Final Technical Report, Jan. 1989 - Aug. 1990

NASA Technical Reports Server (NTRS)

Kwatra, S. C.; Jamali, M. M.; Eugene, Linus P.

1991-01-01

Analog devices have been used for processing the information on board the satellites. Presently, digital devices are being used because they are economical and flexible as compared to their analog counterparts. Several schemes of digital transmission can be used depending on the data rate requirement of the user. An economical scheme of transmission for small earth stations uses single channel per carrier/frequency division multiple access (SCPC/FDMA) on the uplink and time division multiplexing (TDM) on the downlink. This is a typical communication service offered to low data rate users in commercial mass market. These channels usually pertain to either voice or data transmission. An efficient digital demodulator architecture is provided for a large number of low data rate users. A demodulator primarily consists of carrier, clock, and data recovery modules. This design uses principles of parallel processing, pipelining, and time sharing schemes to process large numbers of voice or data channels. It maintains the optimum throughput which is derived from the designed architecture and from the use of high speed components. The design is optimized for reduced power and area requirements. This is essential for satellite applications. The design is also flexible in processing a group of a varying number of channels. The algorithms that are used are verified by the use of a computer aided software engineering (CASE) tool called the Block Oriented System Simulator. The data flow, control circuitry, and interface of the hardware design are simulated in C language. Also, a multiprocessor approach is provided to map, model, and simulate the demodulation algorithms mainly from a speed viewpoint. A hypercube based architecture implementation is provided for such a scheme of operation. The hypercube structure and the demodulation models on hypercubes are simulated in Ada.
Water liquid-vapor interface subjected to various electric fields: A molecular dynamics study.

PubMed

Nikzad, Mohammadreza; Azimian, Ahmad Reza; Rezaei, Majid; Nikzad, Safoora

2017-11-28

Investigation of the effects of E-fields on the liquid-vapor interface is essential for the study of floating water bridge and wetting phenomena. The present study employs the molecular dynamics method to investigate the effects of parallel and perpendicular E-fields on the water liquid-vapor interface. For this purpose, density distribution, number of hydrogen bonds, molecular orientation, and surface tension are examined to gain a better understanding of the interface structure. Results indicate enhancements in parallel E-field decrease the interface width and number of hydrogen bonds, while the opposite holds true in the case of perpendicular E-fields. Moreover, perpendicular fields disturb the water structure at the interface. Given that water molecules tend to be parallel to the interface plane, it is observed that perpendicular E-fields fail to realign water molecules in the field direction while the parallel ones easily do so. It is also shown that surface tension rises with increasing strength of parallel E-fields, while it reduces in the case of perpendicular E-fields. Enhancement of surface tension in the parallel field direction demonstrates how the floating water bridge forms between the beakers. Finally, it is found that application of external E-fields to the liquid-vapor interface does not lead to uniform changes in surface tension and that the liquid-vapor interfacial tension term in Young's equation should be calculated near the triple-line of the droplet. This is attributed to the multi-directional nature of the droplet surface, indicating that no constant value can be assigned to a droplet's surface tension in the presence of large electric fields.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)

2000-01-01

HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

NASA Astrophysics Data System (ADS)

Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

2016-12-01

Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model (~6.5M 250 m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model (~4.5M 10 km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCR-GLOBWB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
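To make the combination of a Krylov accelerator and a domain decomposition preconditioner concrete, here is a serial, pure-Python sketch of preconditioned conjugate gradients on a 1D Laplacian. It is not PKS code. For brevity it swaps in a non-overlapping block preconditioner (block Jacobi) instead of the overlapping additive Schwarz scheme named in the abstract, and the test matrix, function names, and block count are all assumptions.

```python
import math

def matvec(x):
    """Apply the 1D Laplacian stencil [-1, 2, -1] with zero boundary values."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def solve_block(r):
    """Solve one subdomain's [-1, 2, -1] system exactly (Thomas algorithm)."""
    n = len(r)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = -0.5, r[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def precondition(r, blocks):
    """Block preconditioner: independent local solves, one per subdomain."""
    n, size = len(r), len(r) // blocks
    z = []
    for k in range(blocks):
        lo = k * size
        hi = n if k == blocks - 1 else lo + size
        z.extend(solve_block(r[lo:hi]))  # each solve could run on its own process
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(b, blocks=4, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precondition(r, blocks)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k + 1
        z = precondition(r, blocks)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A usage example: build a right-hand side from a known vector with `b = matvec(x_true)`, then `x, iters = pcg(b, blocks=4)` should return `x` close to `x_true`. The local solves inside `precondition` do not depend on one another, which is the property a distributed-memory implementation exploits by assigning one subdomain per process.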
High-resolution, high-throughput imaging with a multibeam scanning electron microscope.

PubMed

Eberle, A L; Mikula, S; Schalek, R; Lichtman, J; Knothe Tate, M L; Zeidler, D

2015-08-01

Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers.
Laser velocimeter (autocovariance) buffer interface

NASA Technical Reports Server (NTRS)

Clemmons, J. I., Jr.

1981-01-01

A laser velocimeter (autocovariance) buffer interface (LVABI) was developed to serve as the interface between three laser velocimeter high speed burst counters and a minicomputer. A functional description is presented of the instrument and its unique features which allow the studies of flow velocity vector analysis, turbulence power spectra, and conditional sampling of other phenomena. Typical applications of the laser velocimeter using the LVABI are presented to illustrate its various capabilities.
Electron acceleration by surface plasma waves in double metal surface structure

NASA Astrophysics Data System (ADS)

Liu, C. S.; Kumar, Gagan; Singh, D. B.; Tripathi, V. K.

2007-12-01

Two parallel metal sheets, separated by a vacuum region, support a surface plasma wave whose amplitude is maximum on the two parallel interfaces and minimum in the middle. This mode can be excited by a laser using a glass prism. An electron beam launched into the middle region experiences a longitudinal ponderomotive force due to the surface plasma wave and gets accelerated to velocities of the order of phase velocity of the surface wave. The scheme is viable to achieve beams of tens of keV energy. In the case of a surface plasma wave excited on a single metal-vacuum interface, the field gradient normal to the interface pushes the electrons away from the high field region, limiting the acceleration process. The acceleration energy thus achieved is in agreement with the experimental observations.
Acceleration of low-energy ions at parallel shocks with a focused transport model

DOE PAGES

Zuo, Pingbing; Zhang, Ming; Rassoul, Hamid K.

2013-04-10

Here, we present a test particle simulation on the injection and acceleration of low-energy suprathermal particles by parallel shocks with a focused transport model. The focused transport equation contains all necessary physics of shock acceleration, but avoids the limitation of diffusive shock acceleration (DSA) that requires a small pitch angle anisotropy. This simulation verifies that the particles with speeds of a fraction of to a few times the shock speed can indeed be directly injected and accelerated into the DSA regime by parallel shocks. At higher energies starting from a few times the shock speed, the energy spectrum of accelerated particles is a power law with the same spectral index as the solution of standard DSA theory, although the particles are highly anisotropic in the upstream region. The intensity, however, is different from that predicted by DSA theory, indicating a different level of injection efficiency. It is found that the shock strength, the injection speed, and the intensity of an electric cross-shock potential (CSP) jump can affect the injection efficiency of the low-energy particles. A stronger shock has a higher injection efficiency. In addition, if the speed of injected particles is above a few times the shock speed, the produced power-law spectrum is consistent with the prediction of standard DSA theory in both its intensity and spectrum index with an injection efficiency of 1. CSP can increase the injection efficiency through direct particle reflection back upstream, but it has little effect on the energetic particle acceleration once the speed of injected particles is beyond a few times the shock speed. This test particle simulation proves that the focused transport theory is an extension of DSA theory with the capability of predicting the efficiency of particle injection.
BiCMOS circuit technology for a 704 MHz ATM switch LSI

NASA Astrophysics Data System (ADS)

Ohtomo, Yusuke; Yasuda, Sadayuki; Togashi, Minoru; Ino, Masayuki; Tanabe, Yasuyuki; Inoue, Jun-Ichi; Nogawa, Masafumi; Hino, Shigeki

1994-05-01

This paper describes BiCMOS level-converter circuits and clock circuits that increase VLSI interface speed to 1 GHz, and their application to a 704 MHz ATM switch LSI. An LSI with high speed interface requires a BiCMOS multiplexer/demultiplexer (MUX/DEMUX) on the chip to reduce internal operation speed. A MUX/DEMUX with minimum power dissipation and a minimum pattern area can be designed using the proposed converter circuits. The converter circuits, using weakly cross-coupled CMOS inverters and a voltage regulator circuit, can convert signal levels between LCML and positive CMOS at a speed of 500 MHz. Data synchronization in the high speed region is ensured by a new BiCMOS clock circuit consisting of a pure ECL path and retiming circuits. The clock circuit reduces the chip latency fluctuation of the clock signal and absorbs the delay difference between the ECL clock and data through the CMOS circuits. A rerouting-Banyan (RRB) ATM switch, employing both the proposed converter circuits and the clock circuits, has been fabricated with 0.5 micron BiCMOS technology. The LSI, composed of CMOS 15 K gate LOGIC, 8 Kb RAM, 1 Kb FIFO and ECL 1.6 K gate LOGIC, achieved an operation speed of 704-MHz with power dissipation of 7.2 W.
Experimental investigation of the displacement dynamics during biphasic flow in porous media

NASA Astrophysics Data System (ADS)

Ayaz, Monem; Toussaint, Renaud; Måløy, Knut-Jørgen; Schafer, Gerhard

2016-04-01

We experimentally study the interface dynamics of an immiscible fluid as it displaces a fully saturated porous medium. The system is confined by a vertically oriented Hele-Shaw cell, with piezoelectric type acoustic sensors mounted along the centerline. During drainage, potential surface energy is stored at the interface up to a given pressure threshold, at which an instability occurs: as new pores are invaded and the radius of curvature of the interface increases locally, the energy is released, and part of this energy is detectable as acoustic emission. By detecting pore-scale events emanating from the interface at various points, we look to develop techniques for localizing the displacement front. To assess the quality, optical monitoring is done using a high speed camera. In our study we also aim to gain further insight into the interface dynamics by varying parameters such as the effective gravity and the invasion speed, and by using other methods of probing the system, such as active tomography. We here present our preliminary results of this study.
Pulsed particle beam vacuum-to-air interface

DOEpatents

Cruz, G.E.; Edwards, W.F.

1987-06-18

A vacuum-to-air interface is provided for a high-powered, pulsed particle beam accelerator. The interface comprises a pneumatic high speed gate valve, from which extends a vacuum-tight duct, that terminates in an aperture. Means are provided for periodically advancing a foil strip across the aperture at the repetition rate of the particle pulses. A pneumatically operated hollow sealing band urges foil strip, when stationary, against and into the aperture. Gas pressure means periodically lift off and separate foil strip from aperture, so that it may be readily advanced. 5 figs.
High Speed A/D DSP Interface for Carrier Doppler Tracking

NASA Technical Reports Server (NTRS)

Baggett, Timothy

1998-01-01

As on-board satellite systems continue to increase in ability to perform self diagnostic checks, it will become more important for satellites to initiate ground communications contact. Currently, the NASA Space Network requires users to prearrange times for satellite communications links through the Tracking and Data Relay Satellite (TDRS). One of the challenges in implementing an on-demand access protocol into the Space Network is the fact that a low Earth orbiting (LEO) satellite's communications will be subject to a doppler shift which is outside the capability of the NASA ground station to lock onto. In a prearranged system, the satellite's doppler is known a priori, and the ground station is able to lock onto the satellite's signal. This paper describes the development of a high speed analog to digital interface into a Digital Signal Processor (DSP). This system will be used for identifying the doppler shift of a LEO satellite through the Space Network, and aiding the ground station equipment in locking onto the signal. Although this interface is specific to one application, it can be used as a basis for interfacing other devices with a DSP.
System and method to allow a synchronous motor to successfully synchronize with loads that have high inertia and/or high torque

DOEpatents

Melfi, Michael J.

2015-10-20

A mechanical soft-start type coupling is used as an interface between a line start, synchronous motor and a heavy load to enable the synchronous motor to bring the heavy load up to or near synchronous speed. The soft-start coupling effectively isolates the synchronous motor from the load for enough time to enable the synchronous motor to come up to full speed. The soft-start coupling then brings the load up to or near synchronous speed.
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

2010-01-01

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
Effects of changes in size, speed and distance on the perception of curved 3D trajectories

PubMed Central

Zhang, Junjun; Braunstein, Myron L.; Andersen, George J.

2012-01-01

Previous research on the perception of 3D object motion has considered time to collision, time to passage, collision detection and judgments of speed and direction of motion, but has not directly studied the perception of the overall shape of the motion path. We examined the perception of the magnitude of curvature and sign of curvature of the motion path for objects moving at eye level in a horizontal plane parallel to the line of sight. We considered two sources of information for the perception of motion trajectories: changes in angular size and changes in angular speed. Three experiments examined judgments of relative curvature for objects moving at different distances. At the closest distance studied, accuracy was high with size information alone but near chance with speed information alone. At the greatest distance, accuracy with size information alone decreased sharply but accuracy for displays with both size and speed information remained high. We found similar results in two experiments with judgments of sign of curvature. Accuracy was higher for displays with both size and speed information than with size information alone, even when the speed information was based on parallel projections and was not informative about sign of curvature. For both magnitude of curvature and sign of curvature judgments, information indicating that the trajectory was curved increased accuracy, even when this information was not directly relevant to the required judgment. PMID:23007204
Common data buffer

NASA Technical Reports Server (NTRS)

Byrne, F.

1981-01-01

Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

NASA Technical Reports Server (NTRS)

Datta, Anubhav; Johnson, Wayne R.

2009-01-01

This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
An accurate, fast, and scalable solver for high-frequency wave propagation

NASA Astrophysics Data System (ADS)

Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.

2017-12-01

In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straight-forward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. 
We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand-sides.
Resonant tunnelling diode based high speed optoelectronic transmitters

NASA Astrophysics Data System (ADS)

Wang, Jue; Rodrigues, G. C.; Al-Khalidi, Abdullah; Figueiredo, José M. L.; Wasige, Edward

2017-08-01

Integrating a resonant tunneling diode (RTD) with a photodetector (PD) at the epi-layer design stage shows great potential for combining a terahertz (THz) RTD electronic source with high-speed optical modulation. With an optimized layer structure, the RTD-PD presented in the paper shows a high stationary responsivity of 5 A/W at a wavelength of 1310 nm. High-power microwave/mm-wave RTD-PD optoelectronic oscillators are proposed. The circuitry employs two RTD-PD devices in parallel. The oscillation frequencies range from 20 to 44 GHz, with a maximum attainable power of about 1 mW at 34/37/44 GHz.
Massively parallel information processing systems for space applications

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1979-01-01

NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.

Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
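The implicit parallelism of bitwise operators mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the classifier `candidate`, the helper names, and the toy data are all hypothetical. Each fitness case occupies one bit position of an integer, so a single pass of `&`, `|`, `^` evaluates the candidate on every case at once.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, with case i stored in bit i."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

def candidate(x0, x1, x2):
    """Hypothetical evolved classifier (x0 AND x1) XOR x2, applied bitwise."""
    return (x0 & x1) ^ x2

def fitness(inputs, targets, n_cases):
    """Count correctly classified cases using one bitwise evaluation."""
    mask = (1 << n_cases) - 1
    out = candidate(*inputs) & mask
    agree = ~(out ^ targets) & mask      # bit i is 1 where output matches target
    return bin(agree).count("1")

# Toy data (assumed for illustration): six cases with three binary features each.
cases = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
labels = [0, 1, 1, 0, 0, 1]
features = [pack([c[k] for c in cases]) for k in range(3)]
score = fitness(features, pack(labels), len(cases))   # 5 of the 6 cases match
```

On a machine word of 32 or 64 bits, the same idea evaluates 32 or 64 fitness cases per instruction on a sequential processor.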
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of the modern and common techniques for achieving high-performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The overall performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% on the VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Limits to high-speed simulations of spiking neural networks using general-purpose computers.

PubMed

Zenke, Friedemann; Gerstner, Wulfram

2014-01-01

To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
A 12-bit high-speed column-parallel two-step single-slope analog-to-digital converter (ADC) for CMOS image sensors.

PubMed

Lyu, Tao; Yao, Suying; Nie, Kaiming; Xu, Jiangtao

2014-11-17

A 12-bit high-speed column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for CMOS image sensors is proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme is proposed to correct errors caused by offsets among the reference voltages. The digital-to-analog converter (DAC) used for the ramp generator is based on the split-capacitor array with an attenuation capacitor. Analysis of the DAC's linearity performance versus capacitor mismatch and parasitic capacitance is presented. A prototype 1024 × 32 Time Delay Integration (TDI) CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion rate 6 times higher than that of the conventional SS ADC. A high-quality image, captured at the line rate of 15.5 k lines/s, shows that the proposed ADC is suitable for high-speed CMOS image sensors.
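Why splitting the conversion into a coarse and a fine phase raises the conversion rate can be seen from an idealized cycle count. The Python sketch below is a behavioral model under stated assumptions, not the circuit from the paper: the 4-bit/8-bit split, the function names, and the ideal (offset-free) references are hypothetical, and calibration and settling overhead are ignored.

```python
def single_slope_cycles(n_bits):
    """Worst-case clock cycles for a plain single-slope ramp: one per code."""
    return 2 ** n_bits

def two_step_cycles(n_bits, coarse_bits):
    """Worst-case cycles when a coarse ramp is followed by a fine ramp."""
    return 2 ** coarse_bits + 2 ** (n_bits - coarse_bits)

def two_step_convert(v_in, v_ref, n_bits, coarse_bits):
    """Ideal two-step conversion of v_in in [0, v_ref) to an n_bits code."""
    fine_bits = n_bits - coarse_bits
    lsb = v_ref / 2 ** n_bits
    coarse_step = lsb * 2 ** fine_bits
    # Coarse phase: find which coarse segment contains the input.
    coarse = min(int(v_in // coarse_step), 2 ** coarse_bits - 1)
    # Fine phase: resolve the residue inside that segment.
    residue = v_in - coarse * coarse_step
    fine = min(int(residue // lsb), 2 ** fine_bits - 1)
    return (coarse << fine_bits) | fine

# Assumed split for illustration: 4 coarse bits + 8 fine bits.
plain = single_slope_cycles(12)        # 4096 cycles
split = two_step_cycles(12, 4)         # 16 + 256 = 272 cycles
```

The measured speed-up reported in the abstract is lower than this ideal ratio would suggest, which is consistent with a real design spending time on things this model leaves out.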
The paradigm compiler: Mapping a functional language for the connection machine

NASA Technical Reports Server (NTRS)

Dennis, Jack B.

1989-01-01

The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
A High-Speed Design of Montgomery Multiplier

NASA Astrophysics Data System (ADS)

Fan, Yibo; Ikenaga, Takeshi; Goto, Satoshi

With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost and a parallel radix-16 design of data path is proposed to accelerate the speed. By using HHNEC 0.25μm standard cell library, the implementation results show that the total cost of Montgomery multiplier is 130 KGates, the clock frequency is 180MHz and the throughput of 1024-bit RSA encryption is 352kbps. This design is suitable to be used in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of on-chip memory.
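For readers unfamiliar with the operation being accelerated, a digit-serial high-radix Montgomery multiplication can be sketched in a few lines of Python. This is the textbook algorithm, not the modified scalable variant or dataflow proposed in the paper; the function name and parameters are hypothetical, and radix 16 (`w=4`) is chosen only to echo the abstract. It needs Python 3.8 or later for `pow(n, -1, m)`.

```python
def mont_mul_radix(a, b, n, num_digits, w=4):
    """Return a * b * R^-1 mod n, with R = 2**(w * num_digits).

    Assumes n is odd, b < n, and a < R. Each loop iteration consumes one
    radix-2**w digit of a and removes w low-order bits by adding a
    multiple of n that makes those bits zero.
    """
    base = 1 << w
    digit_mask = base - 1
    n_prime = (-pow(n, -1, base)) % base      # n * n_prime == -1 (mod 2**w)
    s = 0
    for i in range(num_digits):
        a_i = (a >> (w * i)) & digit_mask     # next digit of a
        s += a_i * b
        q = ((s & digit_mask) * n_prime) & digit_mask
        s = (s + q * n) >> w                  # exact division by 2**w
    return s - n if s >= n else s             # single final subtraction

# Usage: multiply 45 * 76 mod 97 via Montgomery form (R = 16**2 = 256).
n, d = 97, 2
r = 1 << (4 * d)
a_bar, b_bar = (45 * r) % n, (76 * r) % n     # convert to Montgomery form
prod_bar = mont_mul_radix(a_bar, b_bar, n, d)
result = mont_mul_radix(prod_bar, 1, n, d)    # convert back
```

In modular exponentiation the conversions in and out of Montgomery form happen once, so the per-step cost is dominated by the loop, which contains no trial division by `n`.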
Image Understanding. Proceedings of a Workshop Held in Pittsburgh, Pennsylvania on 11-13 September, 1990

DTIC Science & Technology

1990-09-01

performed some preliminary longest piers are about three times the length of a de- experiments to detect the ships in the high resolution stroyer...statistics, and these are coordinates then shipped via a high - speed interface to a host where the stereo triangulation and kinematic control algorithms Grasp...Design: Perception research includes the design of new sensor technologies, such as this hybrid analog/digital chip for a high - speed light-stripe
Research and development of a NYNEX switched multi-megabit data service prototype system

NASA Astrophysics Data System (ADS)

Maman, K. H.; Haines, Robert; Chatterjee, Samir

1991-02-01

Switched Multi-megabit Data Service (SMDS) is a proposed high-speed packet-switched service which will support broadband applications such as Local Area Network (LAN) interconnections across a metropolitan area and beyond. This service is designed to take advantage of evolving Metropolitan Area Network (MAN) standards and technology which will provide customers with 45-Mbps and 1.5-Mbps access to high-speed public data communications networks. This paper will briefly discuss SMDS and review its architecture, including the Subscriber Network Interface (SNI) and the SMDS Interface Protocol (SIP). It will review the fundamental features of SMDS such as address screening, the addressing scheme, and access classes. Then it will describe the SMDS prototype system developed in-house by NYNEX Science Technology.
Heat Transfer in the Turbulent Boundary Layer of a Compressible Gas at High Speeds

NASA Technical Reports Server (NTRS)

Frankl, F.

1942-01-01

The Reynolds law of heat transfer from a wall to a turbulent stream is extended to the case of flow of a compressible gas at high speeds. The analysis is based on the modern theory of the turbulent boundary layer with laminar sublayer. The investigation is carried out for the case of a plate situated in a parallel stream. The results are obtained independently of the velocity distribution in the turbulent boundary layer.
A Comparison of Parallelism in Interface Designs for Computer-Based Learning Environments

ERIC Educational Resources Information Center

Min, Rik; Yu, Tao; Spenkelink, Gerd; Vos, Hans

2004-01-01

In this paper we discuss an experiment that was carried out with a prototype, designed in conformity with the concept of parallelism and the Parallel Instruction theory (the PI theory). We designed this prototype with five different interfaces, and ran an empirical study in which 18 participants completed an abstract task. The five basic designs…
MUTILS - a set of efficient modeling tools for multi-core CPUs implemented in MEX

NASA Astrophysics Data System (ADS)

Krotkiewski, Marcin; Dabrowski, Marcin

2013-04-01

The need for computational performance is common in scientific applications, and in particular in numerical simulations, where high resolution models require efficient processing of large amounts of data. Especially in the context of geological problems, the need to increase the model resolution to resolve physical and geometrical complexities seems to have no limits. Alas, the performance of new generations of CPUs no longer improves simply through increased clock speeds. Current industrial trends are to increase the number of computational cores. As a result, parallel implementations are required in order to fully utilize the potential of new processors, and to study more complex models. We target simulations on small to medium scale shared memory computers: from laptops and desktop PCs with ~8 CPU cores and up to tens of GB of memory to high-end servers with ~50 CPU cores and hundreds of GB of memory. In this setting MATLAB is often the environment of choice for scientists who want to implement their own models with little effort. It is a useful general purpose mathematical software package, but due to its versatility some of its functionality is not as efficient as it could be. In particular, the challenges of modern multi-core architectures are not fully addressed. We have developed MILAMIN 2 - an efficient FEM modeling environment written in native MATLAB. Amongst others, MILAMIN provides functions to define model geometry, generate and convert structured and unstructured meshes (also through interfaces to external mesh generators), compute element and system matrices, apply boundary conditions, solve the system of linear equations, address non-linear and transient problems, and perform post-processing. MILAMIN strives to combine ease of code development with computational efficiency. Where possible, the code is optimized and/or parallelized within the MATLAB framework. 
Native MATLAB is augmented with the MUTILS library - a set of MEX functions that implement the computationally intensive, performance critical parts of the code, which we have identified to be bottlenecks. Here, we discuss the functionality and performance of the MUTILS library. Currently, it includes: (1) time and memory efficient assembly of sparse matrices for FEM simulations; (2) parallel sparse matrix-vector product with optimizations specific to symmetric matrices and multiple degrees of freedom per node; (3) parallel point in triangle location and point in tetrahedron location for unstructured, adaptive 2D and 3D meshes (useful for 'marker in cell' type of methods); (4) parallel FEM interpolation for 2D and 3D meshes of elements of different types and orders, and for different numbers of degrees of freedom per node; (5) a stand-alone MEX implementation of the Conjugate Gradients iterative solver; (6) an interface to METIS graph partitioning and a fast implementation of RCM reordering.
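One kind of optimization available for symmetric matrices in a sparse matrix-vector product is to store only the upper triangle and use each off-diagonal entry twice, which roughly halves the memory traffic. The Python sketch below illustrates that idea serially on a CSR layout; it is an assumption about the general technique, not the MUTILS implementation or its MEX interface, and the function name is hypothetical.

```python
def sym_csr_matvec(n, indptr, indices, data, x):
    """Compute y = A @ x for symmetric A stored as its upper triangle in CSR.

    indptr, indices, data follow the usual CSR convention, but only
    entries with column >= row are stored.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            a = data[k]
            y[i] += a * x[j]          # contribution of A[i][j]
            if j != i:
                y[j] += a * x[i]      # mirrored contribution of A[j][i]
    return y

# Upper triangle of [[4, 1, 0], [1, 3, 2], [0, 2, 5]].
indptr = [0, 2, 4, 5]
indices = [0, 1, 1, 2, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
y = sym_csr_matvec(3, indptr, indices, data, [1.0, 2.0, 3.0])   # [6.0, 13.0, 19.0]
```

The mirrored update writes to `y[j]` for rows other than `i`, so a parallel version has to guard against two threads updating the same entry, for example with per-thread result vectors or a suitable row partitioning.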
Programmable logic construction kits for hyper-real-time neuronal modeling.

PubMed

Guerrero-Rivera, Ruben; Morrison, Abigail; Diesmann, Markus; Pearce, Tim C

2006-11-01

Programmable logic designs are presented that achieve exact integration of leaky integrate-and-fire soma and dynamical synapse neuronal models and incorporate spike-time dependent plasticity and axonal delays. Highly accurate numerical performance has been achieved by modifying simpler forward-Euler-based circuitry requiring minimal circuit allocation, which, as we show, behaves equivalently to exact integration. These designs have been implemented and simulated at the behavioral and physical device levels, demonstrating close agreement with both numerical and analytical results. By exploiting finely grained parallelism and single clock cycle numerical iteration, these designs achieve simulation speeds at least five orders of magnitude faster than the nervous system, termed here hyper-real-time operation, when deployed on commercially available field-programmable gate array (FPGA) devices. Taken together, our designs form a programmable logic construction kit of commonly used neuronal model elements that supports the building of large and complex architectures of spiking neuron networks for real-time neuromorphic implementation, neurophysiological interfacing, or efficient parameter space investigations.
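How a forward-Euler-style update can coincide with exact integration is easy to show for a leaky integrator with constant input over the step. The Python sketch below is one textbook way to do it, replacing the factor h/tau with a precomputed constant; whether the paper's circuitry uses this particular modification is not stated in the abstract, and the function names and parameter values are hypothetical.

```python
import math

def euler_step(v, v_inf, h, tau):
    """Plain forward Euler for tau * dv/dt = v_inf - v."""
    return v + (h / tau) * (v_inf - v)

def exact_step(v, v_inf, h, tau):
    """Exact solution over one step of length h with constant v_inf."""
    return v_inf + (v - v_inf) * math.exp(-h / tau)

def modified_euler_step(v, v_inf, h, tau):
    """Euler-shaped update whose gain is chosen to reproduce the exact step."""
    k = 1.0 - math.exp(-h / tau)      # constant, can be precomputed once
    return v + k * (v_inf - v)

# Assumed values for illustration: 10 ms membrane time constant, 1 ms step.
v0, v_inf, h, tau = 0.0, 1.0, 1e-3, 10e-3
err_euler = abs(euler_step(v0, v_inf, h, tau) - exact_step(v0, v_inf, h, tau))
err_modified = abs(modified_euler_step(v0, v_inf, h, tau) - exact_step(v0, v_inf, h, tau))
```

The modified update needs the same single multiply-accumulate as forward Euler, which is the property that matters when each neuron update must fit in one clock cycle.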
A streaming multi-GPU implementation of image simulation algorithms for scanning transmission electron microscopy

DOE PAGES

Pryor, Alan; Ophus, Colin; Miao, Jianwei

2017-10-25

Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. In this paper, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000× for PRISM and 15× for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.
A streaming multi-GPU implementation of image simulation algorithms for scanning transmission electron microscopy.

PubMed

Pryor, Alan; Ophus, Colin; Miao, Jianwei

2017-01-01

Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000× for PRISM and 15× for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.
A streaming multi-GPU implementation of image simulation algorithms for scanning transmission electron microscopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pryor, Alan; Ophus, Colin; Miao, Jianwei

Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. In this paper, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000× for PRISM and 15× for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.
Kinetic Effects on Self-Assembly and Function of Protein-Polymer Bioconjugates in Thin Films Prepared by Flow Coating.

PubMed

Chang, Dongsook; Huang, Aaron; Olsen, Bradley D

2017-01-01

The self-assembly of nanostructured globular protein arrays in thin films is demonstrated using protein-polymer block copolymers based on a model protein mCherry and the polymer poly(oligoethylene glycol acrylate) (POEGA). Conjugates are flow coated into thin films on a poly(ethylene oxide) grafted Si surface, forming self-assembled cylindrical nanostructures with POEGA domains selectively segregating to the air-film interface. Long-range order and preferential arrangement of parallel cylinders templated by selective surfaces are demonstrated by controlling relative humidity. Long-range order increases with coating speed when the film thicknesses are kept constant, due to reduced nucleation per unit area of drying film. Fluorescence emission spectra of mCherry in films prepared at <25% relative humidity shows a small shift suggesting that proteins are more perturbed at low humidity than high humidity or the solution state. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

PubMed

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

PubMed Central

Sharma, Parichit; Mantri, Shrikant S.

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

NASA Astrophysics Data System (ADS)

Yan, Hui; Wang, K. G.; Jones, Jim E.

2016-06-01

A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics in phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual run time from numerical tests.
Fast Whole-Engine Stirling Analysis

NASA Technical Reports Server (NTRS)

Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

2006-01-01

This presentation discusses the simulation approach to whole-engine for physical consistency, REV regenerator modeling, grid layering for smoothness, and quality, conjugate heat transfer method adjustment, high-speed low cost parallel cluster, and debugging.

Software interface for high-speed readout of particle detectors based on the CoaXPress communication standard

NASA Astrophysics Data System (ADS)

Hejtmánek, M.; Neue, G.; Voleš, P.

2015-06-01

This article is devoted to the software design and development of a high-speed readout application used for interfacing particle detectors via the CoaXPress communication standard. CoaXPress provides an asymmetric high-speed serial connection over a single coaxial cable. It uses a widely available 75 Ω BNC standard and can operate in various modes with a data throughput ranging from 1.25 Gbps up to 25 Gbps. Moreover, it supports a low-speed uplink with a fixed bit rate of 20.833 Mbps, which can be used to control and upload configuration data to the particle detector. The CoaXPress interface is an upcoming standard in medical imaging; therefore, its usage promises long-term compatibility and versatility. This work presents an example of how to develop a DAQ system for a pixel detector. For this purpose, a flexible DAQ card was developed using the XILINX Spartan 6 FPGA. The DAQ card is connected to the FireBird CXP6 Quad framegrabber, which is plugged into the PCI Express bus of a standard PC. The data transmission was performed between the FPGA and the framegrabber card via a standard coaxial cable in a communication mode with a bit rate of 3.125 Gbps. Using the Medipix2 Quad pixel detector, a framerate of 100 fps was achieved. The front-end application makes use of the FireBird framegrabber software development kit and is suitable for data acquisition as well as control of the detector through the registers implemented in the FPGA.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
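As a reference point for the discussion above, the classical form of Amdahl's Law can be evaluated with a short Python sketch. It shows only the traditional bound that the note sets out to amend; the amended law and the activity set model are not given in the abstract and are not reproduced here. The function name and the 90% parallel fraction are assumptions for illustration.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Classical Amdahl speed-up: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Assumed workload: 90% of the run time parallelizes perfectly.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))

# The speed-up can never exceed 1 / (1 - p), here 10, however many
# processors are added.
limit = 1.0 / (1.0 - 0.9)
```

This formula assumes a perfectly balanced workload on the parallel part, which is exactly the assumption the abstract identifies as unrealistic.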
Closha: bioinformatics workflow system for the analysis of massive sequencing data.

PubMed

Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

2018-02-19

While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

NASA Astrophysics Data System (ADS)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared by their performance on benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. 
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
Real-time SHVC software decoding with multi-threaded parallel processing

NASA Astrophysics Data System (ADS)

Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

2014-09-01

This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.
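The pipeline structure described above, in which each stage runs in its own thread and a group of CTUs is the unit passed between stages, can be sketched as follows. This is a structural illustration only: the stage functions are placeholders that tag a list, not real SHVC decoding steps, and Python threads are used to show the hand-off pattern rather than to demonstrate a speed-up.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(ctu_groups, stages):
    """Run one thread per stage, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for group in ctu_groups:
        queues[0].put(group)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Placeholder stages (hypothetical stand-ins for the real decoder modules).
def entropy_decode(group):
    return group + ["parsed"]


def reconstruct(group):
    return group + ["reconstructed"]


def in_loop_filter(group):
    return group + ["filtered"]


DECODED = run_pipeline(
    [["group0"], ["group1"]],
    [entropy_decode, reconstruct, in_loop_filter],
)
```

Larger groups mean fewer queue hand-offs (less synchronization) but coarser parallelism, which is the trade-off the abstract refers to.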
Implementation of a high-speed face recognition system that uses an optical parallel correlator.

PubMed

Watanabe, Eriko; Kodate, Kashiko

2005-02-10

We implement a fully automatic fast face recognition system by using a 1000 frame/s optical parallel correlator designed and assembled by us. The operational speed for the 1:N (i.e., matching one image against N, where N refers to the number of images in the database) identification experiment (4000 face images) amounts to less than 1.5 s, including the preprocessing and postprocessing times. The binary real-only matched filter is devised for the sake of face recognition, and the system is optimized by the false-rejection rate (FRR) and the false-acceptance rate (FAR), according to 300 samples selected by the biometrics guideline. From trial 1:N identification experiments with the optical parallel correlator, we acquired low error rates of 2.6% FRR and 1.3% FAR. Facial images of people wearing thin glasses or heavy makeup that rendered identification difficult were identified with this system.
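The two error rates used to tune the system can be computed from match scores as in the sketch below. The scores, the threshold, and the convention that a higher score means a better match are illustrative assumptions, not values from the paper.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FRR, FAR) for a similarity threshold.

    FRR: fraction of genuine comparisons rejected (score below threshold).
    FAR: fraction of impostor comparisons accepted (score at or above it).
    """
    false_rejects = sum(s < threshold for s in genuine_scores)
    false_accepts = sum(s >= threshold for s in impostor_scores)
    return (
        false_rejects / len(genuine_scores),
        false_accepts / len(impostor_scores),
    )


# Made-up correlation scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.35, 0.62, 0.20]
FRR, FAR = error_rates(genuine, impostor, threshold=0.5)
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the operating point is a compromise between the two.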
Repeatability of high-speed migration of tremor along the Nankai subduction zone, Japan

NASA Astrophysics Data System (ADS)

Kato, A.; Tsuruoka, H.; Nakagawa, S.; Hirata, N.

2015-12-01

Tectonic tremors have been considered to be a swarm or superimposed pulses of low-frequency earthquakes (LFEs). To systematically analyze the high-speed migration of tremor [e.g., Shelly et al., 2007], we here focus on an intensive cluster hosting many low-frequency earthquakes located at the western part of Shikoku Island. We relocated ~770 hypocenters of LFEs identified by the JMA, which took place from Jan. 2008 to Dec. 2013, applying the double-difference relocation algorithm [e.g., Waldhauser and Ellsworth, 2000] to arrival times picked by the JMA and those obtained by waveform cross-correlation measurements. The epicentral distributions show a clear alignment parallel to the subduction of the Philippine Sea plate, resembling slip-parallel streaking. Then, we applied a matched-filter technique to continuous seismograms recorded near the source region, using the relocated template LFEs over 6 years (between Jan. 2008 and Dec. 2013). We newly detected about 60 times as many events as there were templates, considerably more than are obtained by the conventional envelope cross-correlation method. Interestingly, we identified many repeated sequences of tremor migrations along the slip-parallel streaking (~350 sequences). The front of each individual or stacked tremor migration can be modeled by a parabolic envelope, indicating a diffusion process. The diffusivity of the parabolic envelope is estimated to be around 10^5 m^2/s, which is categorized as a high-speed migration feature (~100 km/hour). Most of the rapid migrations took place during occurrences of short-term slow slip events (SSEs), and seem to be triggered by ocean and solid Earth tides. The most plausible explanation of the high-speed propagation is a diffusion process of a stress pulse concentrated within a cluster of strong brittle patches on the ductile shear zone [Ando et al., 2012]. The viscosity of the ductile shear zone within the streaking is at least one order of magnitude smaller than that of the slow-speed migration. 
This discrepancy in viscosity indicates that the streaking has a different rheology from the background main tremor/SSE belt. In addition, the diffusivity did not show any significant change before and after the Tohoku-Oki M9.0 Earthquake, suggesting that the high-speed propagation of tremors is stable against external stress perturbations.
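A parabolic migration front means that the distance travelled grows with the square root of elapsed time. The sketch below assumes the front position is r = sqrt(D t); the prefactor convention (for example sqrt(D t) versus sqrt(4 D t)) is an assumption and is not stated in the abstract, and the diffusivity of 1e5 m^2/s is used only as an order-of-magnitude example.

```python
import math


def front_distance(diffusivity_m2_s, elapsed_s):
    """Front position in metres, assuming r = sqrt(D * t)."""
    return math.sqrt(diffusivity_m2_s * elapsed_s)


def front_speed(diffusivity_m2_s, elapsed_s):
    """Instantaneous front speed dr/dt = 0.5 * sqrt(D / t), in m/s."""
    return 0.5 * math.sqrt(diffusivity_m2_s / elapsed_s)


D = 1e5  # m^2/s, assumed order of magnitude
distance_after_one_hour = front_distance(D, 3600.0)
speed_after_one_hour = front_speed(D, 3600.0)
```

Because the speed falls off as 1/sqrt(t), a diffusive front is fastest at onset and slows as it spreads, which distinguishes it from a front moving at constant velocity.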
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without needing to rewrite those programs.
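Embarrassingly parallel work needs no communication between tasks, so each process only has to know which tasks are its own. The sketch below shows one common way to split a task list by rank; it simulates the ranks in a single process so that it runs without an MPI installation. The function names are hypothetical and this is not PyPele's actual dispatch scheme, which the abstract does not describe. In a real MPI program the rank and size would come from the communicator, for example `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()` in mpi4py.

```python
def tasks_for_rank(tasks, rank, size):
    """Round-robin split: rank r takes tasks r, r + size, r + 2*size, ..."""
    return tasks[rank::size]


def run_all(tasks, size, work):
    """Simulate `size` ranks sequentially; each applies `work` to its share."""
    return [
        [work(task) for task in tasks_for_rank(tasks, rank, size)]
        for rank in range(size)
    ]


# Seven independent tasks spread over three simulated ranks.
RESULTS = run_all(list(range(7)), size=3, work=lambda x: x * x)
```

Every task is assigned to exactly one rank, and no rank needs to know what the others are doing until results are gathered.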
Arranging computer architectures to create higher-performance controllers

NASA Technical Reports Server (NTRS)

Jacklin, Stephen A.

1988-01-01

Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
Novel wavelength diversity technique for high-speed atmospheric turbulence compensation

NASA Astrophysics Data System (ADS)

Arrasmith, William W.; Sullivan, Sean F.

2010-04-01

The defense, intelligence, and homeland security communities are driving a need for software-dominant, real-time or near-real-time atmospheric turbulence compensated imagery. Parallel processing capabilities are finding application in diverse areas including image processing, target tracking, pattern recognition, and image fusion, to name a few. A novel approach to the computationally intensive case of software-dominant optical and near-infrared imaging through atmospheric turbulence is addressed in this paper. Previously, the somewhat conventional wavelength diversity method has been used to compensate for atmospheric turbulence with great success. We apply a new correlation-based approach to the wavelength diversity methodology using a parallel processing architecture, enabling high-speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.
Multidisciplinary Design Optimization (MDO) Methods: Their Synergy with Computer Technology in Design Process

NASA Technical Reports Server (NTRS)

Sobieszczanski-Sobieski, Jaroslaw

1998-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Multidisciplinary Design Optimisation (MDO) Methods: Their Synergy with Computer Technology in the Design Process

NASA Technical Reports Server (NTRS)

Sobieszczanski-Sobieski, Jaroslaw

1999-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Analysis of fast and slow responses in AC conductance curves for p-type SiC MOS capacitors

NASA Astrophysics Data System (ADS)

Karamoto, Yuki; Zhang, Xufang; Okamoto, Dai; Sometani, Mitsuru; Hatakeyama, Tetsuo; Harada, Shinsuke; Iwamuro, Noriyuki; Yano, Hiroshi

2018-06-01

We used a conductance method to investigate the interface characteristics of a SiO2/p-type 4H-SiC MOS structure fabricated by dry oxidation. It was found that the measured equivalent parallel conductance–frequency (G p/ω–f) curves were not symmetric, showing that there existed both high- and low-frequency signals. We attributed high-frequency responses to fast interface states and low-frequency responses to near-interface oxide traps. To analyze the fast interface states, Nicollian’s standard conductance method was applied in the high-frequency range. By extracting the high-frequency responses from the measured G p/ω–f curves, the characteristics of the low-frequency responses were reproduced by Cooper’s model, which considers the effect of near-interface traps on the G p/ω–f curves. The corresponding density distribution of slow traps as a function of energy level was estimated.
CFD Analysis and Design Optimization Using Parallel Computers

NASA Technical Reports Server (NTRS)

Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James

1997-01-01

A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
High Performance Fortran for Aerospace Applications

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Zima, Hans; Bushnell, Dennis M. (Technical Monitor)

2000-01-01

This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications. HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications, while delegating to the compiler/runtime system the task of generating explicitly parallel message-passing programs. We begin by providing a short overview of the HPF language. This is followed by a detailed discussion of the efficient use of HPF for applications involving multiple structured grids such as multiblock and adaptive mesh refinement (AMR) codes as well as unstructured grid codes. We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.
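The high-level data mapping that HPF delegates to the compiler can be illustrated with the arithmetic behind a one-dimensional BLOCK distribution, where each processor owns one contiguous chunk of ceil(n/p) elements. The sketch below is Python rather than Fortran, uses 0-based half-open ranges, and the function name is hypothetical; it shows only the index bookkeeping, not what any particular HPF compiler generates.

```python
def block_bounds(n, p, rank):
    """Half-open index range [start, stop) owned by `rank` when n elements
    are distributed in contiguous blocks of ceil(n / p) over p processors."""
    block = -(-n // p)  # ceiling division without floating point
    start = min(rank * block, n)
    stop = min(start + block, n)
    return start, stop


# Ten elements over four processors: blocks of 3, with a short last block.
OWNERSHIP = [block_bounds(10, 4, rank) for rank in range(4)]
```

With this mapping, a stencil update on a structured grid only needs to exchange values at block boundaries, which is the kind of message passing the compiler and runtime are expected to generate.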
Blade row dynamic digital compressor program. Volume 1: J85 clean inlet flow and parallel compressor models

NASA Technical Reports Server (NTRS)

Tesch, W. A.; Steenken, W. G.

1976-01-01

The results are presented of a one-dimensional dynamic digital blade row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade row characteristics used to simulate the clean inlet performance are given. A stability criterion based upon the self developing unsteady internal flows near surge provided an accurate determination of the clean inlet surge line. The basic model was modified to include an arbitrary extent multi-sector parallel compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds and gave accurate predictions at high corrected speeds where the slope of the speed lines approached the vertical.
Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

NASA Technical Reports Server (NTRS)

Krasteva, Denitza T.

1998-01-01

Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.). This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
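Random polling combined with a global task count can be sketched with a small sequential simulation. The round-based loop, the steal-half policy, and all names below are illustrative assumptions rather than the implementation evaluated in this work; a real version would run the workers as separate processes exchanging messages.

```python
import random
from collections import deque


def simulate_random_polling(initial_tasks, num_workers, seed=0):
    """Round-based simulation: a busy worker processes one task per round,
    an idle worker polls one randomly chosen victim and takes half its queue.
    The loop ends when the global count of unfinished tasks reaches zero."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(initial_tasks)   # all work starts on worker 0
    remaining = len(initial_tasks)    # global task count (termination test)
    done = [[] for _ in range(num_workers)]
    while remaining > 0:
        for worker in range(num_workers):
            if queues[worker]:
                done[worker].append(queues[worker].popleft())
                remaining -= 1
            elif num_workers > 1:
                others = [v for v in range(num_workers) if v != worker]
                victim = rng.choice(others)
                # Taking floor(half) never empties the victim's queue.
                for _ in range(len(queues[victim]) // 2):
                    queues[worker].append(queues[victim].pop())
    return done


DONE = simulate_random_polling(list(range(20)), num_workers=4)
```

Token passing would replace the shared `remaining` counter with a token circulated among the workers, which avoids a single global counter at the cost of extra messages.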
Parallel computing in experimental mechanics and optical measurement: A review (II)

NASA Astrophysics Data System (ADS)

Wang, Tianyi; Kemao, Qian

2018-05-01

With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to the measurement of various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications across a broad scope, including digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
Sputnik: ad hoc distributed computation.

PubMed

Völkel, Gunnar; Lausser, Ludwig; Schmid, Florian; Kraus, Johann M; Kestler, Hans A

2015-04-15

In bioinformatic applications, computationally demanding algorithms are often parallelized to speed up computation. Nevertheless, setting up computational environments for distributed computation is often tedious. The aims of this project were lightweight ad hoc setup and fault-tolerant computation that require only a Java runtime and no administrator rights, while utilizing all CPU cores as effectively as possible. The Sputnik framework provides ad hoc distributed computation on the Java Virtual Machine and makes full use of all supplied CPU cores. It provides a graphical user interface for deployment setup and a web user interface displaying the status of current computation jobs. Neither a permanent setup nor administrator privileges are required. We demonstrate the utility of our approach on feature selection of microarray data. The Sputnik framework is available on GitHub http://github.com/sysbio-bioinf/sputnik under the Eclipse Public License. Contact: hkestler@fli-leibniz.de or hans.kestler@uni-ulm.de. Supplementary data are available at Bioinformatics online.
FPGA-accelerated adaptive optics wavefront control

NASA Astrophysics Data System (ADS)

Mauch, S.; Reger, J.; Reinlein, C.; Appelfelder, M.; Goy, M.; Beckert, E.; Tünnermann, A.

2014-03-01

The speed of real-time adaptive optical systems is primarily restricted by the data processing hardware and computational aspects. Furthermore, the application of mirror layouts with increasing numbers of actuators reduces the bandwidth (speed) of the system and, thus, the number of applicable control algorithms. This burden turns out to be a key impediment for deformable mirrors with a continuous mirror surface and highly coupled actuator influence functions. In this regard, specialized hardware is necessary for high-performance real-time control applications. Our approach to overcoming this challenge is an adaptive optics system based on a Shack-Hartmann wavefront sensor (SHWFS) with a CameraLink interface. The data processing is based on a high-performance Intel Core i7 quad-core hard real-time Linux system. A custom-developed PCIe card employing a Xilinx Kintex-7 FPGA is outlined, which accelerates the analysis of the Shack-Hartmann wavefront sensor data. A recently developed real-time capable spot detection algorithm evaluates the wavefront. The main features of the presented system are the reduction of latency and the acceleration of computation. For example, matrix multiplications, which in general are of complexity O(n^3), are accelerated by using the DSP48 slices of the field-programmable gate array (FPGA) as well as a novel hardware implementation of the SHWFS algorithm. A further benefit comes from the Streaming SIMD Extensions (SSE), which exploit the parallelization capability of the processor to further reduce the latency and increase the bandwidth of the closed loop. Due to this approach, up to 64 actuators of a deformable mirror can be handled and controlled without noticeable restriction from computational burdens.
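The per-sub-aperture work in a Shack-Hartmann sensor is easy to parallelize because each spot can be located independently. The sketch below uses the plain intensity-weighted centroid (centre of gravity) as the spot estimate; this is a standard textbook choice and an assumption here, since the abstract does not specify the spot detection algorithm used on the FPGA.

```python
def spot_centroid(subaperture):
    """Intensity-weighted centroid (x, y), in pixel units, of one
    sub-aperture given as a 2-D list of non-negative pixel values."""
    total = sum(sum(row) for row in subaperture)
    if total == 0:
        raise ValueError("sub-aperture contains no light")
    x = sum(col * value
            for row in subaperture
            for col, value in enumerate(row)) / total
    y = sum(r * value
            for r, row in enumerate(subaperture)
            for value in row) / total
    return x, y


# A symmetric spot centred on the middle pixel of a 3x3 sub-aperture.
CENTRED = spot_centroid([[0, 1, 0],
                         [1, 4, 1],
                         [0, 1, 0]])
# A single bright pixel shifted one column to the right.
SHIFTED = spot_centroid([[0, 0, 0],
                         [0, 0, 4],
                         [0, 0, 0]])
```

The displacement of each centroid from its reference position is proportional to the local wavefront slope, and the resulting slope vector is what the reconstruction matrix multiplication operates on.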

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
High Speed All-Optical Data Distribution Network

NASA Astrophysics Data System (ADS)

Braun, Steve; Hodara, Henri

2017-11-01

This article describes the performance and capabilities of an all-optical network featuring low latency, high speed file transfer between serially connected optical nodes. A basic component of the network is a network interface card (NIC) implemented through a unique planar lightwave circuit (PLC) that performs add/drop data and optical signal amplification. The network uses a linear bus topology with nodes in a "T" configuration, as described in the text. The signal is sent optically (hence, no latency) to all nodes via wavelength division multiplexing (WDM), with each node receiver tuned to wavelength of choice via an optical de-multiplexer. Each "T" node routes a portion of the signal to/from the bus through optical couplers, embedded in the network interface card (NIC), to each of the 1 through n computers.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
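Population-based optimizers fit the map/reduce pattern because each candidate's fitness can be evaluated independently and the results then combined. The sketch below shows that shape with Python built-ins only; it is not Hadoop code, and the toy objective, the target values, and the function names are hypothetical stand-ins for the gene-network model used in the paper.

```python
from functools import reduce


def fitness(candidate):
    """Toy objective: squared distance from hypothetical target parameters.
    Lower is better."""
    target = (1.0, -2.0)
    return sum((c - t) ** 2 for c, t in zip(candidate, target))


def map_phase(population):
    """Score every candidate independently (the parallelizable part)."""
    return [(fitness(candidate), candidate) for candidate in population]


def reduce_phase(scored):
    """Combine the scored candidates into the single best one."""
    return reduce(
        lambda best, item: item if item[0] < best[0] else best,
        scored,
    )


population = [(0.0, 0.0), (1.0, -1.5), (2.0, -2.0)]
BEST_SCORE, BEST = reduce_phase(map_phase(population))
```

In a distributed run the map phase would be spread across worker nodes, and the best candidate found by the reduce phase would feed the next GA or PSO update.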
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background: To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results: This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions: Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
HPCC Methodologies for Structural Design and Analysis on Parallel and Distributed Computing Platforms

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1998-01-01

In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.
A parallel coordinates style interface for exploratory volume visualization.

PubMed

Tory, Melanie; Potts, Simeon; Möller, Torsten

2005-01-01

We present a user interface, based on parallel coordinates, that facilitates exploration of volume data. By explicitly representing the visualization parameter space, the interface provides an overview of rendering options and enables users to easily explore different parameters. Rendered images are stored in an integrated history bar that facilitates backtracking to previous visualization options. Initial usability testing showed clear agreement between users and experts of various backgrounds (usability, graphic design, volume visualization, and medical physics) that the proposed user interface is a valuable data exploration tool.
Applications considerations in the system design of highly concurrent multiprocessors

NASA Technical Reports Server (NTRS)

Lundstrom, Stephen F.

1987-01-01

A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
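The concern with inherently serial portions in the record above is commonly quantified with Amdahl's law. The worked example below is an illustration only; the serial fraction $s$ and processor count $N$ are assumed values, not figures from the report.

```latex
% Amdahl's law: speedup on N processors when a fraction s of the work is serial.
\[
  S(N) = \frac{1}{s + \dfrac{1 - s}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{s}
\]
% Assumed example: s = 0.05, N = 64
\[
  S(64) = \frac{1}{0.05 + 0.95/64} \approx 15.4, \qquad \text{upper bound } \frac{1}{0.05} = 20
\]
```

Under these assumed numbers, even a 5% serial portion limits 64 processors to roughly a 15-fold speedup, which is why fast synchronization and fast individual processors matter in such designs.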
Toward Computational Design of High-Efficiency Photovoltaics from First-Principles

DTIC Science & Technology

2016-08-15

dependence of exciton diffusion in conjugated small molecules, Applied Physics Letters, (04 2014): 0. doi: 10.1063/1.4871303 Guangfen Wu, Zi Li, Xu...principle approach based on the time- dependent density functional theory (TDDFT) to describe exciton states, including energy levels and many-body wave... depends more sensitively on the dimension and crystallinity of the acceptor parallel to the interface than normal to the interface. Reorganization
Modular time division multiplexer: Efficient simultaneous characterization of fast and slow transients in multiple samples

NASA Astrophysics Data System (ADS)

Kim, Stephan D.; Luo, Jiajun; Buchholz, D. Bruce; Chang, R. P. H.; Grayson, M.

2016-09-01

A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples with both fast and slow decay transients spanning from millisecond to month-long time scales. This is achieved by dedicating a single high-speed measurement instrument for rapid data collection at the start of a transient, and by multiplexing a second low-speed measurement instrument for slow data collection of several samples in parallel for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples, and the low cost implementation here allows up to 16 samples to be measured in parallel over several months, reducing the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many areas of research that manifest response times spanning many orders of magnitude, such as photovoltaics, rechargeable batteries, amorphous semiconductors such as silicon and amorphous indium-gallium-zinc-oxide.
Modular time division multiplexer: Efficient simultaneous characterization of fast and slow transients in multiple samples.

PubMed

Kim, Stephan D; Luo, Jiajun; Buchholz, D Bruce; Chang, R P H; Grayson, M

2016-09-01

A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples with both fast and slow decay transients spanning from millisecond to month-long time scales. This is achieved by dedicating a single high-speed measurement instrument for rapid data collection at the start of a transient, and by multiplexing a second low-speed measurement instrument for slow data collection of several samples in parallel for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples, and the low cost implementation here allows up to 16 samples to be measured in parallel over several months, reducing the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many areas of research that manifest response times spanning many orders of magnitude, such as photovoltaics, rechargeable batteries, amorphous semiconductors such as silicon and amorphous indium-gallium-zinc-oxide.
Real-Time X-Ray Transmission Microscopy of Solidifying Al-In Alloys

NASA Technical Reports Server (NTRS)

Curreri, Peter A.; Kaukler, William F.

1997-01-01

Real-time observations of transparent analog materials have provided insight, yet the results of these observations are not necessarily representative of opaque metallic systems. In order to study the detailed dynamics of the solidification process, we develop the technologies needed for real-time X-ray microscopy of solidifying metallic systems, which has not previously been feasible with the necessary resolution, speed, and contrast. In initial studies of Al-In monotectic alloys unidirectionally solidified in an X-ray transparent furnace, in situ records of the evolution of interface morphologies, interfacial solute accumulation, and formation of the monotectic droplets were obtained for the first time. A radiomicrograph of Al-30In grown during aircraft parabolic maneuvers is presented, showing the volumetric phase distribution in this specimen. The benefits of using X-ray microscopy for postsolidification metallography include ease of specimen preparation, increased sensitivity, and three-dimensional analysis of phase distribution. Imaging of the solute boundary layer revealed that the isoconcentration lines are not parallel (as is often assumed) to the growth interface. Striations in the solidified crystal did not accurately decorate the interface position and shape. The monotectic composition alloy under some conditions grew in an uncoupled manner.
RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

NASA Astrophysics Data System (ADS)

Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

2017-03-01

Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
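As a software reference point for the classification step named in the record above, here is a minimal k-nearest-neighbor sketch. It is not the RRAM crossbar implementation: the Manhattan distance, the value of k, and the function name `knn_classify` are assumptions chosen for illustration, since the abstract does not state them.

```python
from collections import Counter


def knn_classify(query, examples, labels, k=3):
    """Return the majority label among the k stored examples nearest to query.

    Distance is Manhattan (sum of absolute differences), an assumption made
    for this sketch. Ties in the vote go to the label encountered first among
    the nearest neighbors.
    """
    if not 0 < k <= len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    # Pair each distance with the example index so labels can be looked up.
    distances = sorted(
        (sum(abs(q - x) for q, x in zip(query, example)), index)
        for index, example in enumerate(examples)
    )
    votes = Counter(labels[index] for _, index in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-class data set (hypothetical values).
training = [(0, 0), (0, 1), (5, 5), (6, 5)]
classes = ["a", "a", "b", "b"]
print(knn_classify((1, 0), training, classes, k=3))
```

Every distance computation is independent of the others, which is the property a parallel architecture can exploit by evaluating all stored examples at once.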
Development of a novel parallel-spool pilot operated high-pressure solenoid valve with high flow rate and high speed

NASA Astrophysics Data System (ADS)

Dong, Dai; Li, Xiaoning

2015-03-01

High-pressure solenoid valve with high flow rate and high speed is a key component in an underwater driving system. However, traditional single spool pilot operated valve cannot meet the demands of both high flow rate and high speed simultaneously. A new structure for a high pressure solenoid valve is needed to meet the demand of the underwater driving system. A novel parallel-spool pilot operated high-pressure solenoid valve is proposed to overcome the drawback of the current single spool design. Mathematical models of the opening process and flow rate of the valve are established. Opening response time of the valve is subdivided into 4 parts to analyze the properties of the opening response. Corresponding formulas to solve 4 parts of the response time are derived. Key factors that influence the opening response time are analyzed. According to the mathematical model of the valve, a simulation of the opening process is carried out by MATLAB. Parameters are chosen based on theoretical analysis to design the test prototype of the new type of valve. Opening response time of the designed valve is tested by verifying response of the current in the coil and displacement of the main valve spool. The experimental results are in agreement with the simulated results, which verifies the validity of the theoretical analysis. Experimental opening response time of the valve is 48.3 ms at working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm² and the largest air flow rate is 2320 L/s. According to the result of the load driving test, the valve can meet the demands of the driving system. The proposed valve with parallel spools provides a new method for the design of a high-pressure valve with fast response and large flow rate.
A similitude method and the corresponding blade design of a low-speed large-scale axial compressor rotor

NASA Astrophysics Data System (ADS)

Yu, Chenghai; Ma, Ning; Wang, Kai; Du, Juan; Van den Braembussche, R. A.; Lin, Feng

2014-04-01

A similitude method to model the tip clearance flow in a high-speed compressor with a low-speed model is presented in this paper. The first step of this method is the derivation of similarity criteria for tip clearance flow, on the basis of an inviscid model of tip clearance flow. The aerodynamic parameters needed for the model design are then obtained from a numerical simulation of the target high-speed compressor rotor. According to the aerodynamic and geometric parameters of the target compressor rotor, a large-scale low-speed rotor blade is designed with an inverse blade design program. In order to validate the similitude method, the features of tip clearance flow in the low-speed model compressor are compared with the ones in the high-speed compressor at both design and small flow rate points. It is found that not only the trajectory of the tip leakage vortex but also the interface between the tip leakage flow and the incoming main flow in the high-speed compressor match well with that of its low speed model. These results validate the effectiveness of the similitude method for the tip clearance flow proposed in this paper.
Improving aircraft conceptual design - A PHIGS interactive graphics interface for ACSYNT

NASA Technical Reports Server (NTRS)

Wampler, S. G.; Myklebust, A.; Jayaram, S.; Gelhausen, P.

1988-01-01

A CAD interface has been created for the 'ACSYNT' aircraft conceptual design code that permits the execution and control of the design process via interactive graphics menus. This CAD interface was coded entirely with the new three-dimensional graphics standard, the Programmer's Hierarchical Interactive Graphics System. The CAD/ACSYNT system is designed for use by state-of-the-art high-speed imaging work stations. Attention is given to the approaches employed in modeling, data storage, and rendering.
Essential slow degrees of freedom in protein-surface simulations: A metadynamics investigation.

PubMed

Prakash, Arushi; Sprenger, K G; Pfaendtner, Jim

2018-03-29

Many proteins exhibit strong binding affinities to surfaces, with binding energies much greater than thermal fluctuations. When modelling these protein-surface systems with classical molecular dynamics (MD) simulations, the large forces that exist at the protein/surface interface generally confine the system to a single free energy minimum. Exploring the full conformational space of the protein, especially finding other stable structures, becomes prohibitively expensive. Coupling MD simulations with metadynamics (enhanced sampling) has fast become a common method for sampling the adsorption of such proteins. In this paper, we compare three different flavors of metadynamics, specifically well-tempered, parallel-bias, and parallel-tempering in the well-tempered ensemble, to exhaustively sample the conformational surface-binding landscape of model peptide GGKGG. We investigate the effect of mobile ions and ion charge, as well as the choice of collective variable (CV), on the binding free energy of the peptide. We make the case for explicitly biasing ions to sample the true binding free energy of biomolecules when the ion concentration is high and the binding free energies of the solute and ions are similar. We also make the case for choosing CVs that apply bias to all atoms of the solute to speed up calculations and obtain the maximum possible amount of information about the system. Copyright © 2017 Elsevier Inc. All rights reserved.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Emissions of Transport Refrigeration Units with CARB Diesel, Gas-to-Liquid Diesel, and Emissions Control Devices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barnitt, R. A.; Chernich, D.; Burnitzki, M.

2010-05-01

A novel in situ method was used to measure emissions and fuel consumption of transport refrigeration units (TRUs). The test matrix included two fuels, two exhaust configurations, and two TRU engine operating speeds. Test fuels were California ultra low sulfur diesel and gas-to-liquid (GTL) diesel. Exhaust configurations were a stock muffler and a Thermo King pDPF diesel particulate filter. The TRU engine operating speeds were high and low, controlled by the TRU user interface. Results indicate that GTL diesel fuel reduces all regulated emissions at high and low engine speeds. Application of a Thermo King pDPF reduced regulated emissions, sometimes almost entirely. The application of both GTL diesel and a Thermo King pDPF reduced regulated emissions at high engine speed, but showed an increase in oxides of nitrogen at low engine speed.
Initiation of and distributed deformation at and around stylolite interfaces: Insights from detailed microstructural analysis

NASA Astrophysics Data System (ADS)

Ebner, M.; Piazolo, S.; Koehn, D.

2009-04-01

In the present contribution we investigate the microstructure of bedding parallel and bedding normal stylolites in carbonate rocks. We focused our study on micro-stylolites which represent an initial stage of this localised pressure solution process as stylolite roughness amplitude is a function of strain. We use electron backscatter diffraction analysis (EBSD) and orientation contrast imaging to address the following issues: (i) What causes the initiation of stylolite interfaces at a submicroscopic scale, (ii) is there distributed deformation around the stylolite interface and (iii) what is the role of the interface (residuum)? Our findings demonstrate that the characteristic stylolite teeth are initiated at a pre-existing heterogeneity in the host-rock. This quenched noise in carbonate rocks is typically composed of clay particles in the submicron scale. In addition, qtz-grains are present along especially pronounced stylolite peaks. The stylolite interface evolves with increasing strain from individual clay particles separated by grain-grain contacts of calcite along the interface to a continuous layer of clay and oxides. Thickness variation of the residuum along the interface is inferred to be strongly influenced by the pre-existing distribution of pinning particles that are more resistant to dissolution. Another important observation is that a shaped preferred orientation (SPO) exists in a halo around the stylolite. This SPO increases with proximity to the stylolite interface. Within this halo, crystal plastic deformation is expressed by subgrain formation with subgrain boundaries usually aligned parallel to shortening direction. Bedding normal (tectonic) stylolites which overprint already compacted beds i.e. with a pre-existing sedimentary SPO parallel to the bedding plane exhibit a SPO at a high angle to the sedimentary SPO. 
We conclude that stylolite roughness is primarily caused by pre-existing heterogeneities in the host-rock which are more resistant to dissolution e.g. clay particles and/or qtz grains. Secondly, we demonstrate that stylolite formation is not a process that is restricted to the stylolite interface itself but a process that is active in a broader zone around the actual interface.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.

Random number generators for large-scale parallel Monte Carlo simulations on FPGA

NASA Astrophysics Data System (ADS)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field programmable gate array (FPGA) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGA presents both new constraints and new opportunities for the implementations of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application based tests, this study evaluates all of the four RNGs used in previous FPGA based MC studies and newly proposed FPGA implementations for two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations: a parallel version of additive lagged Fibonacci generator (Parallel ALFG) is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
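For readers unfamiliar with the generator named in the record above, the following is a minimal sequential sketch of an additive lagged Fibonacci recurrence, x[n] = (x[n-j] + x[n-k]) mod 2^m. It is not the paper's FPGA design or its parallelization scheme, which the abstract does not describe; the class name, the default lags (24, 55), and the 32-bit word size are assumptions for illustration.

```python
from collections import deque


class AdditiveLaggedFibonacci:
    """Sequential ALFG: x[n] = (x[n-j] + x[n-k]) mod 2**m, with 0 < j < k."""

    def __init__(self, seed_values, j=24, k=55, m=32):
        if not 0 < j < k:
            raise ValueError("lags must satisfy 0 < j < k")
        if len(seed_values) != k:
            raise ValueError("exactly k seed values are required")
        if not any(value & 1 for value in seed_values):
            # With all-even seeds every output stays even.
            raise ValueError("at least one seed value must be odd")
        self.j = j
        self.mask = (1 << m) - 1
        # The deque keeps the last k outputs, oldest first.
        self.state = deque((value & self.mask for value in seed_values), maxlen=k)

    def next(self):
        # state[-j] is x[n-j]; state[0] is x[n-k].
        value = (self.state[-self.j] + self.state[0]) & self.mask
        self.state.append(value)  # maxlen drops the oldest entry
        return value


# Tiny lags make the recurrence easy to check by hand (plain Fibonacci mod 256).
generator = AdditiveLaggedFibonacci([1, 1], j=1, k=2, m=8)
print([generator.next() for _ in range(5)])
```

Each output needs only one addition and two reads of a small state buffer, which suggests why this family of generators maps well onto hardware.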
Protein denaturants at aqueous-hydrophobic interfaces: self-consistent correlation between induced interfacial fluctuations and denaturant stability at the interface.

PubMed

Cui, Di; Ou, Shu-Ching; Patel, Sandeep

2015-01-08

The notion of direct interaction between denaturing cosolvent and protein residues has been proposed in dialogue relevant to molecular mechanisms of protein denaturation. Here we consider the correlation between free energetic stability and induced fluctuations of an aqueous-hydrophobic interface between a model hydrophobically associating protein, HFBII, and two common protein denaturants, guanidinium cation (Gdm(+)) and urea. We compute potentials of mean force along an order parameter that brings the solute molecule close to the known hydrophobic region of the protein. We assess potentials of mean force for different relative orientations between the protein and denaturant molecule. We find that in both cases of guanidinium cation and urea relative orientations of the denaturant molecule that are parallel to the local protein-water interface exhibit greater stability compared to edge-on or perpendicular orientations. This behavior has been observed for guanidinium/methylguanidinium cations at the liquid-vapor interface of water, and thus the present results further corroborate earlier findings. Further analysis of the induced fluctuations of the aqueous-hydrophobic interface upon approach of the denaturant molecule indicates that the parallel orientation, displaying a greater stability at the interface, also induces larger fluctuations of the interface compared to the perpendicular orientations. The correlation of interfacial stability and induced interface fluctuation is a recurring theme for interface-stable solutes at hydrophobic interfaces. Moreover, observed correlations between interface stability and induced fluctuations recapitulate connections to local hydration structure and patterns around solutes as evidenced by experiment (Cooper et al., J. Phys. Chem. A 2014, 118, 5657.) and high-level ab initio/DFT calculations (Baer et al., Faraday Discuss 2013, 160, 89).
Protein Denaturants at Aqueous–Hydrophobic Interfaces: Self-Consistent Correlation between Induced Interfacial Fluctuations and Denaturant Stability at the Interface

PubMed Central

2015-01-01

The notion of direct interaction between denaturing cosolvent and protein residues has been proposed in dialogue relevant to molecular mechanisms of protein denaturation. Here we consider the correlation between free energetic stability and induced fluctuations of an aqueous–hydrophobic interface between a model hydrophobically associating protein, HFBII, and two common protein denaturants, guanidinium cation (Gdm+) and urea. We compute potentials of mean force along an order parameter that brings the solute molecule close to the known hydrophobic region of the protein. We assess potentials of mean force for different relative orientations between the protein and denaturant molecule. We find that in both cases of guanidinium cation and urea relative orientations of the denaturant molecule that are parallel to the local protein–water interface exhibit greater stability compared to edge-on or perpendicular orientations. This behavior has been observed for guanidinium/methylguanidinium cations at the liquid–vapor interface of water, and thus the present results further corroborate earlier findings. Further analysis of the induced fluctuations of the aqueous–hydrophobic interface upon approach of the denaturant molecule indicates that the parallel orientation, displaying a greater stability at the interface, also induces larger fluctuations of the interface compared to the perpendicular orientations. The correlation of interfacial stability and induced interface fluctuation is a recurring theme for interface-stable solutes at hydrophobic interfaces. Moreover, observed correlations between interface stability and induced fluctuations recapitulate connections to local hydration structure and patterns around solutes as evidenced by experiment (Cooper et al., J. Phys. Chem. A 2014, 118, 5657.) and high-level ab initio/DFT calculations (Baer et al., Faraday Discuss 2013, 160, 89). PMID:25536388
DOE Office of Scientific and Technical Information (OSTI.GOV)

Welcome, Michael L.; Bell, Christian S.

GASNet (Global-Address Space Networking) is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages such as UPC and Titanium. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet is designed specifically to support high-performance, portable implementations of global address space languages on modern high-end communication networks. The interface provides the flexibility and extensibility required to express a wide variety of communication patterns without sacrificing performance by imposing large computational overheads in the interface. The design of the GASNet interface is partitioned into two layers to maximize porting ease without sacrificing performance: the lower level is a narrow but very general interface called the GASNet core API - the design is based heavily on Active Messages, and is implemented directly on top of each individual network architecture. The upper level is a wider and more expressive interface called GASNet extended API, which provides high-level operations such as remote memory access and various collective operations. This release implements GASNet over MPI, the Quadrics "elan" API, the Myrinet "GM" API and the "LAPI" interface to the IBM SP switch. A template is provided for adding support for additional network interfaces.
Optical methods in fault dynamics

NASA Astrophysics Data System (ADS)

Uenishi, K.; Rossmanith, H. P.

2003-10-01

The Rayleigh pulse interaction with a pre-stressed, partially contacting interface between similar and dissimilar materials is investigated experimentally as well as numerically. This study is intended to obtain an improved understanding of the interface (fault) dynamics during the earthquake rupture process. Using dynamic photoelasticity in conjunction with high-speed cinematography, snapshots of time-dependent isochromatic fringe patterns associated with Rayleigh pulse-interface interaction are experimentally recorded. It is shown that interface slip (instability) can be triggered dynamically by a pulse which propagates along the interface at the Rayleigh wave speed. For the numerical investigation, the finite difference wave simulator SWIFD is used for solving the problem under different combinations of contacting materials. The effect of acoustic impedance ratio of the two contacting materials on the wave patterns is discussed. The results indicate that upon interface rupture, Mach (head) waves, which carry a relatively large amount of energy in a concentrated form, can be generated and propagated from the interface contact region (asperity) into the acoustically softer material. Such Mach waves can cause severe damage onto a particular region inside an adjacent acoustically softer area. This type of damage concentration might be a possible reason for the generation of the "damage belt" in Kobe, Japan, on the occasion of the 1995 Hyogo-ken Nanbu (Kobe) Earthquake.
A Parallel Numerical Micromagnetic Code Using FEniCS

NASA Astrophysics Data System (ADS)

Nagy, L.; Williams, W.; Mitchell, L.

2013-12-01

Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However the computational cost rises exceedingly quickly with respect to the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that the modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently to better exploit modern computational resources our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, The Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software; in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/Parmetis, etc.) and exposing these so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users to not only run micromagnetic models in parallel, but also to perform pre/post processing of data.
Full range line-field parallel swept source imaging utilizing digital refocusing

NASA Astrophysics Data System (ADS)

Fechtig, Daniel J.; Kumar, Abhishek; Drexler, Wolfgang; Leitgeb, Rainer A.

2015-12-01

We present geometric optics-based refocusing applied to a novel off-axis line-field parallel swept source imaging (LPSI) system. LPSI is an imaging modality based on line-field swept source optical coherence tomography, which permits 3-D imaging at acquisition speeds of up to 1 MHz. The digital refocusing algorithm applies a defocus-correcting phase term to the Fourier representation of complex-valued interferometric image data, which is based on the geometrical optics information of the LPSI system. We introduce the off-axis LPSI system configuration, the digital refocusing algorithm and demonstrate the effectiveness of our method for refocusing volumetric images of technical and biological samples. An increase of effective in-focus depth range from 255 μm to 4.7 mm is achieved. The recovery of the full in-focus depth range might be especially valuable for future high-speed and high-resolution diagnostic applications of LPSI in ophthalmology.
Designing intuitive dialog boxes in Windows environments

NASA Astrophysics Data System (ADS)

Souetova, Natalia

2000-01-01

Several approaches to user interface design were analyzed. Most existing interfaces appear difficult for newcomers to understand and learn. Ways of designing interfaces were defined, based on the psychology of computer image perception and on experience gained while working with artists and designers who have no specialized technical education. Based on these results, several applications with standard Windows interfaces were developed. The Windows environment was chosen because it is currently very popular. The approach increased the quality and speed of users' work and reduced the number of problems and mistakes. Highly qualified staff no longer spend their working time on explanations and help.
Performance of a 300 Mbps 1:16 serial/parallel optoelectronic receiver module

NASA Technical Reports Server (NTRS)

Richard, M. A.; Claspy, P. C.; Bhasin, K. B.; Bendett, M. B.

1990-01-01

Optical interconnects are being considered for the high speed distribution of multiplexed control signals in GaAs monolithic microwave integrated circuit (MMIC) based phased array antennas. The performance of a hybrid GaAs optoelectronic integrated circuit (OEIC) is described, as well as its design and fabrication. The OEIC converts a 16-bit serial optical input to a 16 parallel line electrical output using an on-board 1:16 demultiplexer and operates at data rates as high as 300 Mbps. The performance characteristics and potential applications of the device are presented.
Parallel multiscale simulations of a brain aneurysm

PubMed Central

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2012-01-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
PMID:23734066
Parallel multiscale simulations of a brain aneurysm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu

2013-07-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Human-Computer Interface Controlled by Horizontal Directional Eye Movements and Voluntary Blinks Using AC EOG Signals

NASA Astrophysics Data System (ADS)

Kajiwara, Yusuke; Murata, Hiroaki; Kimura, Haruhiko; Abe, Koji

Eye-gaze human-computer interfaces have been actively researched as communication support tools for people with amyotrophic lateral sclerosis (ALS). However, because these interfaces cannot distinguish voluntary from involuntary eye movements, their performance is still not sufficient for practical use. This paper presents a high-performance human-computer interface system that combines high-quality recognition of horizontal directional eye movements with recognition of voluntary blinks. Experimental results show that, compared with an existing system that recognizes horizontal and vertical directional eye movements in addition to voluntary blinks, the number of incorrect inputs decreases by 35.1% and character input is 17.4% faster.
Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.

PubMed

Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J

2013-12-06

Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequate high time resolution data, 7000 double layers were observed in an interval of 1 min to produce a 230,000 V net parallel potential drop crossing the spacecraft. Lower resolution data show that this event lasted for 6 min and that more than 1,000,000 volts of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes to accelerate them to energies where they serve as the seed population for relativistic acceleration by coherent, large amplitude whistler mode waves. Because the double-layer speed of 3100 km/s is the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in planetary radiation belts such as Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
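The step from the per-minute observation to "tens of kilovolts" along one field line can be checked with simple arithmetic. The sketch below is one reading of the abstract's numbers, not the authors' stated calculation: it assumes the double layers are evenly spaced in time and that each carries an equal share of the net potential drop.

```python
# Figures quoted in the abstract.
layers_per_minute = 7000          # double layers observed in 1 min
net_potential_volts = 230_000.0   # net parallel potential drop over that minute
transit_time_s = 15.0             # time for one double layer to traverse a field line

# Assumption: every double layer contributes an equal share of the net drop.
volts_per_layer = net_potential_volts / layers_per_minute

# Assumption: layers arrive at a steady rate, so the number present on one
# field line at any instant is the arrival rate times the transit time.
layers_on_field_line = layers_per_minute / 60.0 * transit_time_s

instantaneous_potential_volts = volts_per_layer * layers_on_field_line
print(f"{volts_per_layer:.1f} V per layer, "
      f"{layers_on_field_line:.0f} layers on the line, "
      f"{instantaneous_potential_volts / 1000:.1f} kV instantaneous")
```

Under these assumptions the estimate lands in the tens of kilovolts, consistent with the order of magnitude stated in the abstract.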
Performance of gigabit FDDI

NASA Technical Reports Server (NTRS)

Game, David; Maly, Kurt J.

1990-01-01

Great interest exists in developing high speed protocols which will be able to support data rates at gigabit speeds. Hardware currently exists which can experimentally transmit at data rates exceeding a gigabit per second, but it is not clear as to what types of protocols will provide the best performance. One possibility is to examine current protocols and their extensibility to these speeds. Scaling of Fiber Distributed Data Interface (FDDI) to gigabit speeds is studied. More specifically, delay statistics are included to provide insight as to which parameters (network length, packet length or number of nodes) have the greatest effect on performance.
Performance comparison analysis library communication cluster system using merge sort

NASA Astrophysics Data System (ADS)

Wulandari, D. A. R.; Ramadhan, M. E.

2018-04-01

Computing began with single processors; multiprocessors were introduced to reduce computing time. This second paradigm is known as parallel computing, of which a cluster is one example. A cluster needs a communication protocol for processing, one of which is the Message Passing Interface (MPI). MPI has many library implementations, two of which are OpenMPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library match the characteristics of the problem, so this study aims to compare the performance of the libraries in handling parallel computing processes. The case studies in this research are MPICH2 and OpenMPI. The study runs a sorting problem, using the merge sort method, to determine the performance of the cluster system. The research method is to implement OpenMPI and MPICH2 on a Linux-based cluster of five virtual computers and then analyze system performance under different test scenarios using three parameters: execution time, speedup and efficiency. The results show that, for both OpenMPI and MPICH2, average speedup and efficiency tend to increase as data size increases but decrease at large data sizes. Increasing the data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100000. OpenMPI has an execution time greater than MPICH2; for example, at a data size of 1000 the average execution time is 0.009721 with MPICH2 and 0.003895 with OpenMPI. OpenMPI can customize communication needs.
Using the cloud to speed-up calibration of watershed-scale hydrologic models (Invited)

NASA Astrophysics Data System (ADS)

Goodall, J. L.; Ercan, M. B.; Castronova, A. M.; Humphrey, M.; Beekwilder, N.; Steele, J.; Kim, I.

2013-12-01

This research focuses on using the cloud to address computational challenges associated with hydrologic modeling. One example is calibration of a watershed-scale hydrologic model, which can take days of execution time on typical computers. While parallel algorithms for model calibration exist and some researchers have used multi-core computers or clusters to run these algorithms, these solutions do not fully address the challenge because (i) calibration can still be too time consuming even on multicore personal computers and (ii) few in the community have the time and expertise needed to manage a compute cluster. Given this, another option for addressing this challenge that we are exploring through this work is the use of the cloud for speeding-up calibration of watershed-scale hydrologic models. The cloud used in this capacity provides a means for renting a specific number and type of machines for only the time needed to perform a calibration model run. The cloud allows one to precisely balance the duration of the calibration with the financial costs so that, if the budget allows, the calibration can be performed more quickly by renting more machines. Focusing specifically on the SWAT hydrologic model and a parallel version of the DDS calibration algorithm, we show significant speed-up time across a range of watershed sizes using up to 256 cores to perform a model calibration. The tool provides a simple web-based user interface and the ability to monitor the calibration job submission process during the calibration process. Finally this talk concludes with initial work to leverage the cloud for other tasks associated with hydrologic modeling including tasks related to preparing inputs for constructing place-based hydrologic models.
Current implementation and future plans on new code architecture, programming language and user interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brun, B.

1997-07-01

Computer technology has improved tremendously during the last years with larger media capacity, memory and more computational power. Visual computing with high-performance graphic interface and desktop computational power have changed the way engineers accomplish everyday tasks, development and safety studies analysis. The emergence of parallel computing will permit simulation over a larger domain. In addition, new development methods, languages and tools have appeared in the last several years.
Construction of a parallel processor for simulating manipulators and other mechanical systems

NASA Technical Reports Server (NTRS)

Hannauer, George

1991-01-01

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

NASA Technical Reports Server (NTRS)

Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

1999-01-01

The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.

Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)

NASA Technical Reports Server (NTRS)

Dalton, Shelly D.; Daley, Philip C.

1988-01-01

As the hardware trends for artificial intelligence (AI) involve more and more complexity, the process of optimizing the computer system design for a particular problem will also increase in complexity. Space applications of knowledge based systems (KBS) will often require an ability to perform both numerically intensive vector computations and real time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, if the application itself is not highly parallel, the machine's power cannot be utilized. A scheme is presented which will provide the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scalar, and multiprocessors. High speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to make comparisons between existing computer systems. It is an open question whether or not, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
A parallel implementation of a multisensor feature-based range-estimation method

NASA Technical Reports Server (NTRS)

Suorsa, Raymond E.; Sridhar, Banavar

1993-01-01

There are many proposed vision based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and shared-memory parallel computer.
Full-Scale Measurement and Prediction of the Dynamics of High-Speed Helicopter Tow Cables

DTIC Science & Technology

2014-02-14

fairing at tow speeds up to 17 knots. The technique for measuring vibration amplitudes along the cable is based on fiber Bragg grating (FBG) sensors...cm long. As light propagates through a FBG, it is partially reflected at each interface between the bands of high and low refractive index. If the...slightly, which can be measured by a change in the Bragg wavelength. State-of-the-art FBG interrogators can resolve Bragg wavelength shifts down to 0.001 nm
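To give a sense of what a 0.001 nm wavelength resolution means for strain sensing, the sketch below uses the standard Bragg condition (wavelength = 2 × effective index × grating period) and the common first-order strain relation Δλ/λ = (1 − p_e)·ε. The 1550 nm operating wavelength and the photoelastic coefficient p_e ≈ 0.22 (a typical value for silica fiber) are assumptions, not figures from the report, and temperature effects are ignored.

```python
def bragg_wavelength_nm(n_eff, grating_period_nm):
    """Bragg condition: reflected wavelength = 2 * n_eff * grating period."""
    return 2.0 * n_eff * grating_period_nm

def strain_from_shift(shift_nm, bragg_nm, photoelastic_coeff=0.22):
    """Axial strain implied by a Bragg wavelength shift (temperature held constant).

    Uses d(lambda)/lambda = (1 - p_e) * strain; p_e = 0.22 is an assumed,
    typical value for silica fiber.
    """
    return shift_nm / (bragg_nm * (1.0 - photoelastic_coeff))

# Smallest resolvable shift quoted in the text, at an assumed 1550 nm grating.
strain = strain_from_shift(0.001, 1550.0)
microstrain = strain * 1e6
print(f"Resolvable strain is roughly {microstrain:.2f} microstrain")
```

Under these assumptions the interrogator resolution corresponds to a strain of slightly under one microstrain.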
Stokesian swimming of a helical swimmer across an interface

NASA Astrophysics Data System (ADS)

Godinez, Francisco; Ramos, Armando; Zenit, Roberto

2016-11-01

Microorganisms swim in flows dominated by viscous effects but in many instances the motion occurs across heterogeneous environments where the fluid properties may vary. To our knowledge, the effect of such in-homogeneity has not been addressed in depth. We conduct experiments in which a magnetic self-propelled helical swimmer displaces across the interface between two immiscible density stratified fluids. As the swimmer crosses the interface, at a fixed rotation rate, its speed is reduced and a certain volume of the lower fluid is dragged across. We quantify the drift volume and the change of swimming speed for different swimming speeds and different fluid combinations. We relate the reduction of the swimming speed with the interfacial tension of the interface. We also compare the measurements of the drift volume with some recent calculations found in the literature.
The Solar Wind and Geomagnetic Activity as a Function of Time Relative to Corotating Interaction Regions

NASA Technical Reports Server (NTRS)

McPherron, Robert L.; Weygand, James

2006-01-01

Corotating interaction regions during the declining phase of the solar cycle are the cause of recurrent geomagnetic storms and are responsible for the generation of high fluxes of relativistic electrons. These regions are produced by the collision of a high-speed stream of solar wind with a slow-speed stream. The interface between the two streams is easily identified with plasma and field data from a solar wind monitor upstream of the Earth. The properties of the solar wind and interplanetary magnetic field are systematic functions of time relative to the stream interface. Consequently the coupling of the solar wind to the Earth's magnetosphere produces a predictable sequence of events. Because the streams persist for many solar rotations it should be possible to use terrestrial observations of past magnetic activity to predict future activity. Also the high-speed streams are produced by large unipolar magnetic regions on the Sun so that empirical models can be used to predict the velocity profile of a stream expected at the Earth. In either case knowledge of the statistical properties of the solar wind and geomagnetic activity as a function of time relative to a stream interface provides the basis for medium term forecasting of geomagnetic activity. In this report we use lists of stream interfaces identified in solar wind data during the years 1995 and 2004 to develop probability distribution functions for a variety of different variables as a function of time relative to the interface. The results are presented as temporal profiles of the quartiles of the cumulative probability distributions of these variables. We demonstrate that the storms produced by these interaction regions are generally very weak. Despite this the fluxes of relativistic electrons produced during those storms are the highest seen in the solar cycle. We attribute this to the specific sequence of events produced by the organization of the solar wind relative to the stream interfaces. 
We also show that there are large quantitative differences in various parameters between the two cycles.
Parallel optoelectronic trinary signed-digit division

NASA Astrophysics Data System (ADS)

Alam, Mohammad S.

1999-03-01

The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Mercier, C.W.

The Network File System (NFS) will be the user interface to a High-Performance Data System (HPDS) being developed at Los Alamos National Laboratory (LANL). HPDS will manage high-capacity, high-performance storage systems connected directly to a high-speed network from distributed workstations. NFS will be modified to maximize performance and to manage massive amounts of data. 6 refs., 3 figs.
ARC-2007-ACD07-0184-003

NASA Image and Video Library

2007-09-26

From left: Data Parallel Line Relaxation (DPLR) software team members Kerry Trumble, Deepak Bose and David Hash analyze and predict the extreme environments NASA's space shuttle experiences during its super high-speed reentry into Earth's atmosphere.
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Lanteri, S.; Gumaste, U.; Ronaghi, M.

1994-01-01

Applications of high-performance parallel computation to the analysis of complete jet engines, treated as a multidisciplinary coupled problem, are described. The coupled problem involves interaction of structures with gas dynamics, heat conduction and heat transfer in aircraft engines. The methodology issues addressed include: consistent discrete formulation of coupled problems with emphasis on coupling phenomena; effect of partitioning strategies, augmentation and temporal solution procedures; sensitivity of response to problem parameters; and methods for interfacing multiscale discretizations in different single fields. The computer implementation issues addressed include: parallel treatment of coupled systems; domain decomposition and mesh partitioning strategies; data representation in object-oriented form and mapping to hardware driven representation, and tradeoff studies between partitioning schemes and fully coupled treatment.
Holographic memory for high-density data storage and high-speed pattern recognition

NASA Astrophysics Data System (ADS)

Gu, Claire

2002-09-01

As computers and the internet become faster and faster, more and more information is transmitted, received, and stored everyday. The demand for high density and fast access time data storage is pushing scientists and engineers to explore all possible approaches including magnetic, mechanical, optical, etc. Optical data storage has already demonstrated its potential in the competition against other storage technologies. CD and DVD are showing their advantages in the computer and entertainment market. What motivated the use of optical waves to store and access information is the same as the motivation for optical communication. Light or an optical wave has an enormous capacity (or bandwidth) to carry information because of its short wavelength and parallel nature. In optical storage, there are two types of mechanism, namely localized and holographic memories. What gives the holographic data storage an advantage over localized bit storage is the natural ability to read the stored information in parallel, therefore, meeting the demand for fast access. Another unique feature that makes the holographic data storage attractive is that it is capable of performing associative recall at an incomparable speed. Therefore, volume holographic memory is particularly suitable for high-density data storage and high-speed pattern recognition. In this paper, we review previous works on volume holographic memories and discuss the challenges for this technology to become a reality.
Highly accelerated cardiovascular MR imaging using many channel technology: concepts and clinical applications

PubMed Central

Sodickson, Daniel K.

2010-01-01

Cardiovascular magnetic resonance imaging (CVMRI) is of proven clinical value in the non-invasive imaging of cardiovascular diseases. CVMRI requires rapid image acquisition, but acquisition speed is fundamentally limited in conventional MRI. Parallel imaging provides a means for increasing acquisition speed and efficiency. However, signal-to-noise (SNR) limitations and the limited number of receiver channels available on most MR systems have in the past imposed practical constraints, which dictated the use of moderate accelerations in CVMRI. High levels of acceleration, which were unattainable previously, have become possible with many-receiver MR systems and many-element, cardiac-optimized RF-coil arrays. The resulting imaging speed improvements can be exploited in a number of ways, ranging from enhancement of spatial and temporal resolution to efficient whole heart coverage to streamlining of CVMRI work flow. In this review, examples of these strategies are provided, following an outline of the fundamentals of the highly accelerated imaging approaches employed in CVMRI. Topics discussed include basic principles of parallel imaging; key requirements for MR systems and RF-coil design; practical considerations of SNR management, supported by multi-dimensional accelerations, 3D noise averaging and high field imaging; highly accelerated clinical state-of-the art cardiovascular imaging applications spanning the range from SNR-rich to SNR-limited; and current trends and future directions. PMID:17562047
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
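The Virtual Time and rollback idea mentioned above can be sketched with a single toy logical process. This is a generic illustration of optimistic event processing, not the simulator described in the report: the class, its order-dependent toy state update, and the omission of anti-messages and global virtual time are all simplifying assumptions.

```python
class LogicalProcess:
    """Toy optimistic simulator node with state saving and rollback."""

    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0        # toy state whose value depends on event order
        self.processed = []   # (timestamp, value) events applied so far, in time order
        self.snapshots = []   # state saved just before each processed event
        self.rollbacks = 0

    def _apply(self, timestamp, value):
        self.snapshots.append(self.state)
        self.state = self.state * 2 + value  # order-dependent update
        self.processed.append((timestamp, value))
        self.lvt = timestamp

    def receive(self, timestamp, value):
        if timestamp >= self.lvt:
            self._apply(timestamp, value)
            return
        # Straggler: undo every event later than it, apply it, then replay.
        self.rollbacks += 1
        redo = []
        while self.processed and self.processed[-1][0] > timestamp:
            redo.append(self.processed.pop())
            self.state = self.snapshots.pop()
        self.lvt = self.processed[-1][0] if self.processed else 0
        self._apply(timestamp, value)
        for ts, val in reversed(redo):
            self._apply(ts, val)

# Events arrive out of timestamp order: the event at time 2 is a straggler.
optimistic = LogicalProcess()
for ts, val in [(1, 1), (3, 3), (5, 5), (2, 2)]:
    optimistic.receive(ts, val)

# Reference run that sees the same events in timestamp order.
in_order = LogicalProcess()
for ts, val in [(1, 1), (2, 2), (3, 3), (5, 5)]:
    in_order.receive(ts, val)
```

The optimistic run reaches the same final state as the in-order run at the cost of one rollback, which is the trade that lets separate processors advance without waiting on each other.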
New Laboratory Observations of Thermal Pressurization Weakening

NASA Astrophysics Data System (ADS)

Badt, N.; Tullis, T. E.; Hirth, G.

2017-12-01

Dynamic frictional weakening due to pore fluid thermal pressurization has been studied under elevated confining pressure in the laboratory, using a rotary-shear apparatus having a sample with independent pore pressure and confining pressure systems. Thermal pressurization is directly controlled by the permeability of the rocks, not only for the initiation of high-speed frictional weakening but also for a subsequent sequence of high-speed sliding events. First, the permeability is evaluated at different effective pressures using a method where the pore pressure drop and the flow-through rate are compared using Darcy's Law as well as a pore fluid oscillation method, the latter method also permitting measurement of the storage capacity. Then, the samples undergo a series of high-speed frictional sliding segments at a velocity of 2.5 mm/s, under an applied confining pressure and normal stress of 45 MPa and 50 MPa, respectively, and an initial pore pressure of 25 MPa. Finally, the rock permeability and storage capacity are measured again to assess the evolution of the rock's pore fluid properties. For samples with a permeability of 10⁻²⁰ m², thermal pressurization promotes a 40% decrease in strength. However, after a sequence of three high-speed sliding events, the magnitude of weakening diminishes progressively from 40% to 15%. The weakening events coincide with dilation of the sliding interface. Moreover, the decrease in the weakening degree with progressive fast-slip events suggests that the hydraulic diffusivity may increase locally near the sliding interface during thermal pressurization-enhanced slip. This could result from stress- or thermally-induced damage to the host rock, which would perhaps increase both permeability and storage capacity, and so possibly decrease the susceptibility of dynamic weakening due to thermal pressurization in subsequent high-speed sliding events.
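As a worked illustration of the quantities quoted in this abstract (not taken from the abstract itself): the steady-state flow-through method rests on Darcy's law, and the stated pressures fix the initial effective stresses if one assumes the simple effective-stress law with a unit coefficient. The symbols Q (volumetric flow rate), μ (fluid viscosity), L and A (sample length and cross-sectional area) and ΔP (pore pressure drop across the sample) are notation introduced here, and the effective-stress law is an assumption.

```latex
\begin{align}
  k &= \frac{Q\,\mu\,L}{A\,\Delta P}
      && \text{(Darcy's law solved for permeability)} \\
  P_{\mathrm{eff}} &= P_c - P_p = 45\ \mathrm{MPa} - 25\ \mathrm{MPa} = 20\ \mathrm{MPa}
      && \text{(initial effective confining pressure)} \\
  \sigma_{n,\mathrm{eff}} &= \sigma_n - P_p = 50\ \mathrm{MPa} - 25\ \mathrm{MPa} = 25\ \mathrm{MPa}
      && \text{(initial effective normal stress)}
\end{align}
```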
Pushbroom Stereo for High-Speed Navigation in Cluttered Environments

DTIC Science & Technology

2014-09-01

inertial measurement sensors such as Achtelik et al.’s implementation of PTAM (parallel tracking and mapping) [15] with a barometric altimeter, stable flights...in indoor and outdoor environments are possible [1]. With a full vision-aided inertial navigation system (VINS), Li et al. have shown remarkable...avoidance on small UAVs. Stereo systems suffer from a similar speed issue, with most modern systems running at or below 30 Hz [8], [27]. Honegger et
Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research

DTIC Science & Technology

2016-11-13

these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to...such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been...Defense for Research and Engineering. such as pMatlab [4], or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing. In
Interfacial Dynamics of Condensing Vapor Bubbles in an Ultrasonic Acoustic Field

NASA Astrophysics Data System (ADS)

Boziuk, Thomas; Smith, Marc; Glezer, Ari

2016-11-01

Enhancement of vapor condensation in quiescent subcooled liquid using ultrasonic actuation is investigated experimentally. The vapor bubbles are formed by direct injection from a pressurized steam reservoir through nozzles of varying characteristic diameters, and are advected within an acoustic field of programmable intensity. While kHz-range acoustic actuation typically couples to capillary instability of the vapor-liquid interface, ultrasonic (MHz-range) actuation leads to the formation of a liquid spout that penetrates into the vapor bubble and significantly increases its surface area and therefore condensation rate. Focusing of the ultrasonic beam along the spout leads to ejection of small-scale droplets that are propelled towards the vapor-liquid interface and result in localized acceleration of the condensation. High-speed video of Schlieren images is used to investigate the effects of the ultrasonic actuation on the thermal boundary layer on the liquid side of the vapor-liquid interface and its effect on the condensation rate, and the liquid motion during condensation is investigated using high-magnification PIV measurements. High-speed image processing is used to assess the effect of the actuation on the dynamics and temporal variation in characteristic scale (and condensation rate) of the vapor bubbles.
Parallel Solver for Diffuse Optical Tomography on Realistic Head Models With Scattering and Clear Regions.

PubMed

Placati, Silvio; Guermandi, Marco; Samore, Andrea; Scarselli, Eleonora Franchi; Guerrieri, Roberto

2016-09-01

Diffuse optical tomography is an imaging technique, based on evaluation of how light propagates within the human head to obtain the functional information about the brain. Precision in reconstructing such an optical properties map is highly affected by the accuracy of the light propagation model implemented, which needs to take into account the presence of clear and scattering tissues. We present a numerical solver based on the radiosity-diffusion model, integrating the anatomical information provided by a structural MRI. The solver is designed to run on parallel heterogeneous platforms based on multiple GPUs and CPUs. We demonstrate how the solver provides a 7 times speed-up over an isotropic-scattered parallel Monte Carlo engine based on a radiative transport equation for a domain composed of 2 million voxels, along with a significant improvement in accuracy. The speed-up greatly increases for larger domains, allowing us to compute the light distribution of a full human head ( ≈ 3 million voxels) in 116 s for the platform used.
Precision Parameter Estimation and Machine Learning

NASA Astrophysics Data System (ADS)

Wandelt, Benjamin D.

2008-12-01

I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe) that can vastly accelerate parameter estimation in high-dimensional parameter spaces and costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov-Chain Monte Carlo techniques to explore a likelihood function, posterior distribution or χ²-surface efficiently. This strategy is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo; and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to be able to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the Cosmology@Home project.
Numerical investigation of electromagnetic pulse welded interfaces between dissimilar metals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Wei; Sun, Xin

Electromagnetic pulse welding (EMPW), an innovative high-speed joining technique, is a potential method for the automotive industry in joining and assembly of dissimilar lightweight metals with drastically different melting temperatures and other thermal physical properties, such as thermal conductivity and thermal expansion coefficients. The weld quality of EMPW is significantly affected by a variety of interacting physical phenomena including large plastic deformation, materials mixing, localized heating and rapid cooling, possible localized melting and subsequent diffusion and solidification, micro-cracking and void, etc. In the present study, a thermo-mechanically coupled dynamic model has been developed to quantitatively resolve the high-speed impact joining interface characteristics as well as the process-induced interface temperature evolution, defect formation and possible microstructural composition variation. Reasonably good agreement has been obtained between the predicted results and experimental measurements in terms of interfacial morphology characteristics. The modeling framework is expected to provide further understanding of the hierarchical interfacial features of the non-equilibrium material joining process and weld formation mechanisms involved in the EMPW operation, thus accelerating future development and deployment of this advanced joining technology.

Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

NASA Astrophysics Data System (ADS)

Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

2011-10-01

A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
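The link between load balancing and speed-up described in this abstract can be illustrated with a minimal sketch. It is not the tRIBS partitioner: the sub-basin costs are hypothetical numbers, communication across sub-basin boundaries is ignored, and the balanced assignment uses a generic greedy heuristic (largest cost first, onto the least-loaded processor) chosen here only for illustration. Speed-up is taken as total work divided by the load of the busiest processor.

```python
import heapq

def speedup(loads):
    """Ideal speed-up over a serial run: total work / busiest processor's work."""
    return sum(loads) / max(loads)

def round_robin(costs, n_procs):
    """Unbalanced baseline: deal sub-basins to processors in listing order."""
    loads = [0.0] * n_procs
    for i, cost in enumerate(costs):
        loads[i % n_procs] += cost
    return loads

def greedy_balanced(costs, n_procs):
    """Balanced assignment: largest cost first, onto the least-loaded processor."""
    heap = [(0.0, proc) for proc in range(n_procs)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, proc = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, proc))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]

# Hypothetical per-sub-basin computational costs (arbitrary units).
SUB_BASIN_COSTS = [9, 1, 8, 1, 7, 1, 6, 1]
N_PROCS = 2

unbalanced = round_robin(SUB_BASIN_COSTS, N_PROCS)
balanced = greedy_balanced(SUB_BASIN_COSTS, N_PROCS)
print(f"round-robin loads {unbalanced}, speed-up {speedup(unbalanced):.2f}")
print(f"balanced loads    {balanced}, speed-up {speedup(balanced):.2f}")
```

With these made-up costs the same work on the same two processors gives very different speed-ups depending only on how the sub-basins are assigned, which is the effect the abstract attributes to load balancing.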
Dynamics modeling for parallel haptic interfaces with force sensing and control.

PubMed

Bernstein, Nicholas; Lawrence, Dale; Pao, Lucy

2013-01-01

Closed-loop force control can be used on haptic interfaces (HIs) to mitigate the effects of mechanism dynamics. A single multidimensional force-torque sensor is often employed to measure the interaction force between the haptic device and the user's hand. The parallel haptic interface at the University of Colorado (CU) instead employs smaller 1D force sensors oriented along each of the five actuating rods to build up a 5D force vector. This paper shows that a particular manipulandum/hand partition in the system dynamics is induced by the placement and type of force sensing, and discusses the implications on force and impedance control for parallel haptic interfaces. The details of a "squaring down" process are also discussed, showing how to obtain reduced degree-of-freedom models from the general six degree-of-freedom dynamics formulation.
GSRP/David Marshall: Fully Automated Cartesian Grid CFD Application for MDO in High Speed Flows

NASA Technical Reports Server (NTRS)

2003-01-01

With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grids solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers, viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.
The structure of the electron diffusion region during asymmetric anti-parallel magnetic reconnection

NASA Astrophysics Data System (ADS)

Swisdak, M.; Drake, J. F.; Price, L.; Burch, J. L.; Cassak, P.

2017-12-01

The structure of the electron diffusion region during asymmetric magnetic reconnection is explored with high-resolution particle-in-cell simulations that focus on a magnetopause event observed by the Magnetospheric Multiscale Mission (MMS). A major surprise is the development of a standing, oblique whistler-like structure with regions of intense positive and negative dissipation. This structure arises from high-speed electrons that flow along the magnetosheath magnetic separatrices, converge in the dissipation region and jet across the x-line into the magnetosphere. The jet produces a region of negative charge and generates intense parallel electric fields that eject the electrons downstream along the magnetospheric separatrices. The ejected electrons produce the parallel velocity-space crescents documented by MMS.
Embedded system of image storage based on fiber channel

NASA Astrophysics Data System (ADS)

Chen, Xiaodong; Su, Wanxin; Xing, Zhongbao; Wang, Hualong

2008-03-01

In domains such as aerospace, aviation, aiming, and optical measurement, an embedded system for imaging, processing, and recording is essential; it must combine small volume, high processing speed, and high resolution. Embedded storage technology, however, has developed slowly and has become the system bottleneck. RAID is commonly used to raise storage speed, but its large volume makes it unsuitable for embedded systems. Fiber Channel (FC) technology offers a new way to build a high-speed, portable storage system. To let the storage subsystem meet the demand for high storage rates, we propose an embedded digital image storage system that uses a Virtex-4 FPGA and high-speed Fiber Channel and is based on the Xilinx Fiber Channel Arbitrated Loop LogiCORE. The design uses Virtex-4 RocketIO MGT transceivers to transmit data serially and can optionally connect multiple Fiber Channel hard drives through an Arbitrated Loop. It achieves a storage rate of 400 MBps, breaks through the bottleneck of the PCI interface, and offers high speed, real-time operation, portability, and massive capacity.
Tribology of Si/SiO2 in humid air: transition from severe chemical wear to wearless behavior at nanoscale.

PubMed

Chen, Lei; He, Hongtu; Wang, Xiaodong; Kim, Seong H; Qian, Linmao

2015-01-13

Wear at sliding interfaces of silicon is a main cause for material loss in nanomanufacturing and device failure in microelectromechanical system (MEMS) applications. However, a comprehensive understanding of the nanoscale wear mechanisms of silicon in ambient conditions is still lacking. Here, we report the chemical wear of single crystalline silicon, a material used for micro/nanoscale devices, in humid air under the contact pressure lower than the material hardness. A transmission electron microscopy (TEM) analysis of the wear track confirmed that the wear of silicon in humid conditions originates from surface reactions without significant subsurface damages such as plastic deformation or fracture. When rubbed with a SiO2 ball, the single crystalline silicon surface exhibited transitions from severe wear in intermediate humidity to nearly wearless states at two opposite extremes: (a) low humidity and high sliding speed conditions and (b) high humidity and low speed conditions. These transitions suggested that at the sliding interfaces of Si/SiO2 at least two different tribochemical reactions play important roles. One would be the formation of a strong "hydrogen bonding bridge" between hydroxyl groups of two sliding interfaces and the other the removal of hydroxyl groups from the SiO2 surface. The experimental data indicated that the dominance of each reaction varies with the ambient humidity and sliding speed.
Low-power grating detection system chip for high-speed low-cost length and angle precision measurement

NASA Astrophysics Data System (ADS)

Hou, Ligang; Luo, Rengui; Wu, Wuchen

2006-11-01

This paper presents a low-power grating detection chip (EYAS) for precision length and angle measurement. Traditional grating detection methods, such as resistor-chain division or phase-locked division circuits, are difficult to design and tune. The need for an additional CPU for control and display makes these methods more complex and costly to implement. Traditional methods also suffer from low sampling speed because of the complex division circuitry and CPU software compensation. EYAS is an application-specific integrated circuit (ASIC). It integrates a microcontroller unit (MCU), a power management unit (PMU), an LCD controller, a keyboard interface, a grating detection unit, and other peripherals. Working at 10 MHz, EYAS supports a 5 MHz internal sampling rate and can handle a 1.25 MHz orthogonal signal from the grating sensor. Through a simple keyboard control interface, the sensor parameters, data processing, and system working mode can be configured. Two LCD controllers, which form the output interface, support dot-array or segment LCDs. The PMU switches the system between working and standby modes using clock gating to save power. In test mode (where system actions are more frequent than in real-world use) EYAS consumes 0.9 mW, compared with 0.2 mW in real-world use. EYAS implements the whole grating detection system, including high-speed orthogonal signal handling, in a single chip with very low power consumption.
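The sampling figures in this abstract imply four samples per signal period (5 MHz / 1.25 MHz), which is one sample per state of a two-channel orthogonal (quadrature) signal. The sketch below shows generic quadrature decoding under that reading; it is not the EYAS circuit, and the channel names A and B, the state order assumed for forward motion, and the function name are all assumptions made for illustration.

```python
# Assumed order of (A, B) states for forward motion; reverse motion walks it backwards.
FORWARD_SEQUENCE = [(0, 0), (1, 0), (1, 1), (0, 1)]

SAMPLE_RATE_HZ = 5_000_000   # internal sampling rate quoted in the abstract
SIGNAL_RATE_HZ = 1_250_000   # maximum orthogonal signal frequency quoted in the abstract
SAMPLES_PER_PERIOD = SAMPLE_RATE_HZ // SIGNAL_RATE_HZ  # one sample per quadrature state

def decode_quadrature(samples):
    """Return the signed count of quarter-period steps in a list of (A, B) samples."""
    index = {state: i for i, state in enumerate(FORWARD_SEQUENCE)}
    count = 0
    previous = index[samples[0]]
    for state in samples[1:]:
        current = index[state]
        step = (current - previous) % 4
        if step == 1:        # advanced one state: forward motion
            count += 1
        elif step == 3:      # went back one state: reverse motion
            count -= 1
        elif step == 2:      # skipped a state: direction cannot be determined
            raise ValueError("signal too fast for the sampling rate")
        previous = current   # step == 0 means no movement between samples
    return count

# Two full periods forward, then the same samples played in reverse.
forward = FORWARD_SEQUENCE * 2 + [FORWARD_SEQUENCE[0]]
print(SAMPLES_PER_PERIOD)
print(decode_quadrature(forward))
print(decode_quadrature(forward[::-1]))
```

The `ValueError` branch marks the limit of this scheme: if the signal ran faster than one state per sample, a state would be skipped and the direction would be ambiguous.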
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Estimation of vibration frequency of loudspeaker diaphragm by parallel phase-shifting digital holography

NASA Astrophysics Data System (ADS)

Kakue, T.; Endo, Y.; Shimobaba, T.; Ito, T.

2014-11-01

We report frequency estimation of a loudspeaker diaphragm vibrating at high speed by parallel phase-shifting digital holography, which is a technique of single-shot phase-shifting interferometry. This technique records the multiple phase-shifted holograms required for phase-shifting interferometry by using space-division multiplexing. We constructed a parallel phase-shifting digital holography system that includes a high-speed polarization-imaging camera. This camera has a micro-polarizer array which selects four linear polarization axes for 2 × 2 pixels. We set a loudspeaker as the object and recorded the vibration of its diaphragm with the constructed system. With the constructed system, we demonstrated observation of the vibration displacement of the loudspeaker diaphragm. In this paper, we aim to estimate the vibration frequency of the loudspeaker diaphragm by applying frequency analysis to the experimental results. Holograms consisting of 128 × 128 pixels were recorded at a frame rate of 262,500 frames per second by the camera. A sinusoidal wave was input to the loudspeaker via a phone connector. We observed the displacement of the vibrating loudspeaker diaphragm with the system. We also succeeded in estimating the vibration frequency of the loudspeaker diaphragm by applying frequency analysis to the experimental results.
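The abstract does not say which frequency analysis was used. As a minimal sketch of one common choice, the code below takes a displacement time series sampled at the quoted frame rate and returns the frequency of the largest discrete Fourier transform (DFT) peak. The test signal (a 5 kHz tone over 525 frames) and the function name are hypothetical and stand in for measured displacement data.

```python
import cmath
import math

FRAME_RATE_HZ = 262_500  # frame rate quoted in the abstract

def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]      # remove the static offset
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):              # bins up to the Nyquist frequency
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n

# Hypothetical displacement record: a pure tone sampled once per hologram frame.
TONE_HZ = 5_000
N_FRAMES = 525           # gives a frequency resolution of 262,500 / 525 = 500 Hz
displacement = [
    math.sin(2 * math.pi * TONE_HZ * i / FRAME_RATE_HZ) for i in range(N_FRAMES)
]

estimate = dominant_frequency(displacement, FRAME_RATE_HZ)
print(f"estimated vibration frequency: {estimate:.0f} Hz")
```

At 262,500 frames per second the highest frequency that can be resolved without aliasing is half the frame rate, 131,250 Hz, and the resolution of the estimate is the frame rate divided by the number of frames analysed.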
Distributed parallel messaging for multiprocessor systems

DOEpatents

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on Problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
A Comparative Propulsion System Analysis for the High-Speed Civil Transport

NASA Technical Reports Server (NTRS)

Berton, Jeffrey J.; Haller, William J.; Senick, Paul F.; Jones, Scott M.; Seidel, Jonathan A.

2005-01-01

Six of the candidate propulsion systems for the High-Speed Civil Transport are the turbojet, turbine bypass engine, mixed flow turbofan, variable cycle engine, Flade engine, and the inverting flow valve engine. A comparison of these propulsion systems by NASA's Glenn Research Center, paralleling studies within the aircraft industry, is presented. This report describes the Glenn Aeropropulsion Analysis Office's contribution to the High-Speed Research Program's 1993 and 1994 propulsion system selections. A parametric investigation of each propulsion cycle's primary design variables is analytically performed. Performance, weight, and geometric data are calculated for each engine. The resulting engines are then evaluated on two airframer-derived supersonic commercial aircraft for a 5000 nautical mile, Mach 2.4 cruise design mission. The effects of takeoff noise, cruise emissions, and cycle design rules are examined.
(abstract) A High Throughput 3-D Inner Product Processor

NASA Technical Reports Server (NTRS)

Daud, Tuan

1996-01-01

A particularly challenging image processing application is real-time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine-grain parallelism and, when implemented in hardware, offer orders-of-magnitude speed-up. However, neural networks implemented on a VLSI chip are planar architectures capable of efficient processing of linear vector signals rather than 2-D images. Therefore, for processing of images, a 3-D stack of neural-net ICs receiving planar inputs and consuming minimal power is required. Details of the circuits and chip architectures will be described, along with the need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing will be illustrated.
On Multiple AER Handshaking Channels Over High-Speed Bit-Serial Bidirectional LVDS Links With Flow-Control and Clock-Correction on Commercial FPGAs for Scalable Neuromorphic Systems.

PubMed

Yousefzadeh, Amirreza; Jablonski, Miroslaw; Iakymchuk, Taras; Linares-Barranco, Alejandro; Rosado, Alfredo; Plana, Luis A; Temple, Steve; Serrano-Gotarredona, Teresa; Furber, Steve B; Linares-Barranco, Bernabe

2017-10-01

Address event representation (AER) is a widely employed asynchronous technique for interchanging "neural spikes" between different hardware elements in neuromorphic systems. Each neuron or cell in a chip or a system is assigned an address (or ID), which is typically communicated through a high-speed digital bus, thus time-multiplexing a high number of neural connections. Conventional AER links use parallel physical wires together with a pair of handshaking signals (request and acknowledge). In this paper, we present a fully serial implementation using bidirectional SATA connectors with a pair of low-voltage differential signaling (LVDS) wires for each direction. The proposed implementation can multiplex a number of conventional parallel AER links for each physical LVDS connection. It uses flow control, clock correction, and byte alignment techniques to transmit 32-bit address events reliably over multiplexed serial connections. The setup has been tested using commercial Spartan6 FPGAs attaining a maximum event transmission speed of 75 Meps (Mega events per second) for 32-bit events at a line rate of 3.0 Gbps. Full HDL codes (vhdl/verilog) and example demonstration codes for the SpiNNaker platform will be made available.
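The throughput figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the three numbers stated above (75 Meps, 32-bit events, 3.0 Gbps line rate); the function names are introduced here for illustration.

```python
# Figures quoted in the abstract.
EVENT_RATE_EPS = 75e6     # 75 mega events per second
EVENT_BITS = 32           # bits per address event
LINE_RATE_BPS = 3.0e9     # serial line rate

def payload_rate_bps(event_rate_eps, event_bits):
    """Useful event payload carried per second, in bits."""
    return event_rate_eps * event_bits

def link_utilization(event_rate_eps, event_bits, line_rate_bps):
    """Fraction of the raw line rate occupied by event payload."""
    return payload_rate_bps(event_rate_eps, event_bits) / line_rate_bps

payload = payload_rate_bps(EVENT_RATE_EPS, EVENT_BITS)
utilization = link_utilization(EVENT_RATE_EPS, EVENT_BITS, LINE_RATE_BPS)
print(f"payload rate: {payload / 1e9:.2f} Gbps")
print(f"share of line rate: {utilization:.0%}")
```

The event payload comes to 2.4 Gbps, or 80% of the line rate. The remaining 20% would be consistent with overhead from line coding, flow control, and clock correction, although the abstract does not break this down.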
Argonne simulation framework for intelligent transportation systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ewing, T.; Doss, E.; Hanebutte, U.

1996-04-01

A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distributed (networked) computer systems; however, a version for a stand-alone workstation is also available. The ITS simulator includes an Expert Driver Model (EDM) of instrumented "smart" vehicles with in-vehicle navigation units. The EDM is capable of performing optimal route planning and communicating with Traffic Management Centers (TMC). A dynamic road map database is used for optimum route planning, where the data is updated periodically to reflect any changes in road or weather conditions. The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide 2-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces that include human-factors studies to support safety and operational research. Realistic modeling of variations of the posted driving speed is based on human factor studies that take into consideration weather, road conditions, driver's personality and behavior, and vehicle type. The simulator has been developed on a distributed system of networked UNIX computers, but is designed to run on ANL's IBM SP-X parallel computer system for large-scale problems. A novel feature of the developed simulator is that vehicles will be represented by autonomous computer processes, each with a behavior model which performs independent route selection and reacts to external traffic events much like real vehicles. Vehicle processes interact with each other and with ITS components by exchanging messages. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.
Distributed Memory Parallel Computing with SEAWAT

NASA Astrophysics Data System (ADS)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model (10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.
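The Recursive Coordinate Bisection step named in the abstract can be illustrated with a minimal sketch. This is not the PKS code: the function name is hypothetical, all cells are weighted equally, and the number of parts is assumed to be a power of two.

```python
def rcb(points, n_parts):
    """Recursively bisect points into n_parts groups of similar size.

    n_parts is assumed to be a power of two, and every point counts as
    one unit of work (an equal-weight simplification).
    """
    if n_parts == 1:
        return [points]
    dims = len(points[0])
    # Cut along the coordinate direction in which the points extend furthest.
    axis = max(
        range(dims),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return rcb(ordered[:mid], n_parts // 2) + rcb(ordered[mid:], n_parts // 2)


# An 8 x 4 grid of cell centres split over four subdomains.
cells = [(x, y) for x in range(8) for y in range(4)]
parts = rcb(cells, 4)
print([len(p) for p in parts])
```

Each resulting group would become one subdomain, held in the local memory of one MPI process.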
First-principles based calculation of the macroscopic α/β interface in titanium

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Dongdong; Key Lab of Nonferrous Materials of Ministry of Education, Central South University, Changsha 410083; Zhu, Lvqi

2016-06-14

The macroscopic α/β interface in titanium and titanium alloys consists of a ledge interface (112)_β/(01-10)_α and a side interface (11-1)_β/(2-1-10)_α in a zig-zag arrangement. Here, we report a first-principles study for predicting the atomic structure and the formation energy of the α/β-Ti interface. Both component interfaces were calculated using supercell models within a restrictive relaxation approach, with various stacking sequences and high-symmetry parallel translations being considered. The ledge interface energy was predicted as 0.098 J/m² and the side interface energy as 0.811 J/m². By projecting the zig-zag interface area onto the macroscopic broad face, the macroscopic α/β interface energy was estimated to be as low as ∼0.12 J/m², which, however, is almost double the ad hoc value used in previous phase-field simulations.
Optical encryption interface

NASA Technical Reports Server (NTRS)

Jackson, Deborah J. (Inventor)

1998-01-01

An analog optical encryption system based on phase scrambling of two-dimensional optical images and holographic transformation for achieving large encryption keys and high encryption speed. An enciphering interface uses a spatial light modulator for converting a digital data stream into a two-dimensional optical image. The optical image is further transformed into a hologram with a random phase distribution. The hologram is converted into digital form for transmission over a shared information channel. A respective deciphering interface at a receiver reverses the encrypting process by using a phase-conjugate reconstruction of the phase-scrambled hologram.
High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

NASA Astrophysics Data System (ADS)

Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

2017-01-01

With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time-demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we discuss the pros and cons of each method in terms of usability, complexity, infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.

Dual Super-Systolic Core for Real-Time Reconstructive Algorithms of High-Resolution Radar/SAR Imaging Systems

PubMed Central

Atoche, Alejandro Castillo; Castillo, Javier Vázquez

2012-01-01

A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed into their parallel representation and then mapped into an efficient high performance embedded computing (HPEC) architecture in reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such a dual SSA core drastically reduces the computational load of complex RS regularization techniques, achieving the required real-time operational mode. PMID:22736964
FastQuery: A Parallel Indexing System for Scientific Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chou, Jerry; Wu, Kesheng; Prabhat

2011-07-29

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model but the scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores.
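The mismatch between the array and relational data models can be illustrated with a small sketch. It is not FastQuery's or FastBit's API: the function names are hypothetical, each array element is treated as one relational row whose id is its row-major position, and plain Python integers stand in for the compressed bitmaps a real index would use.

```python
def unravel(row_id, shape):
    """Convert a flat row id back to array coordinates (row-major order)."""
    coords = []
    for extent in reversed(shape):
        coords.append(row_id % extent)
        row_id //= extent
    return tuple(reversed(coords))


def build_bitmap_index(values, bin_edges):
    """One integer bitmask per bin; bit i is set if values[i] falls in that bin."""
    bitmaps = [0] * (len(bin_edges) + 1)
    for row_id, v in enumerate(values):
        b = sum(v >= edge for edge in bin_edges)  # index of the bin holding v
        bitmaps[b] |= 1 << row_id
    return bitmaps


shape = (2, 3)
energy = [0.1, 2.5, 0.7, 3.2, 1.4, 0.2]  # a 2 x 3 array, flattened row-major
index = build_bitmap_index(energy, bin_edges=[1.0, 2.0])

# Query "energy >= 2.0": the last bin answers it without touching raw data.
hits = index[2]
rows = [i for i in range(len(energy)) if (hits >> i) & 1]
print([unravel(i, shape) for i in rows])
```

The query is answered in row-id space and only then translated back to array coordinates, which is the role a mapping layer between the two models has to play.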
Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
A parallel input composite transimpedance amplifier.

PubMed

Kim, D J; Kim, C

2018-01-01

A new approach to high performance current to voltage preamplifier design is presented. The design using multiple operational amplifiers (op-amps) has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low-noise performance. The input stage consisting of parallel-linked JFET op-amps and a high-speed bipolar junction transistor (BJT) gain stage driving the output in the composite amplifier topology, cooperating with the capacitance compensation feedback network, ensures wide bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high impedance transport and scanning tunneling microscopy measurements.
A parallel input composite transimpedance amplifier

NASA Astrophysics Data System (ADS)

Kim, D. J.; Kim, C.

2018-01-01

A new approach to high performance current to voltage preamplifier design is presented. The design using multiple operational amplifiers (op-amps) has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low-noise performance. The input stage consisting of parallel-linked JFET op-amps and a high-speed bipolar junction transistor (BJT) gain stage driving the output in the composite amplifier topology, cooperating with the capacitance compensation feedback network, ensures wide bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high impedance transport and scanning tunneling microscopy measurements.
Surface tension dominates insect flight on fluid interfaces.

PubMed

Mukundarajan, Haripriya; Bardon, Thibaut C; Kim, Dong Hyun; Prakash, Manu

2016-03-01

Flight on the 2D air-water interface, with body weight supported by surface tension, is a unique locomotion strategy well adapted for the environmental niche on the surface of water. Although previously described in aquatic insects like stoneflies, the biomechanics of interfacial flight has never been analysed. Here, we report interfacial flight as an adapted behaviour in waterlily beetles (Galerucella nymphaeae) which are also dexterous airborne fliers. We present the first quantitative biomechanical model of interfacial flight in insects, uncovering an intricate interplay of capillary, aerodynamic and neuromuscular forces. We show that waterlily beetles use their tarsal claws to attach themselves to the interface, via a fluid contact line pinned at the claw. We investigate the kinematics of interfacial flight trajectories using high-speed imaging and construct a mathematical model describing the flight dynamics. Our results show that non-linear surface tension forces make interfacial flight energetically expensive compared with airborne flight at the relatively high speeds characteristic of waterlily beetles, and cause chaotic dynamics to arise naturally in these regimes. We identify the crucial roles of capillary-gravity wave drag and oscillatory surface tension forces which dominate interfacial flight, showing that the air-water interface presents a radically modified force landscape for flapping wing flight compared with air. © 2016. Published by The Company of Biologists Ltd.
Surface tension dominates insect flight on fluid interfaces

PubMed Central

Mukundarajan, Haripriya; Bardon, Thibaut C.; Kim, Dong Hyun; Prakash, Manu

2016-01-01

ABSTRACT Flight on the 2D air–water interface, with body weight supported by surface tension, is a unique locomotion strategy well adapted for the environmental niche on the surface of water. Although previously described in aquatic insects like stoneflies, the biomechanics of interfacial flight has never been analysed. Here, we report interfacial flight as an adapted behaviour in waterlily beetles (Galerucella nymphaeae) which are also dexterous airborne fliers. We present the first quantitative biomechanical model of interfacial flight in insects, uncovering an intricate interplay of capillary, aerodynamic and neuromuscular forces. We show that waterlily beetles use their tarsal claws to attach themselves to the interface, via a fluid contact line pinned at the claw. We investigate the kinematics of interfacial flight trajectories using high-speed imaging and construct a mathematical model describing the flight dynamics. Our results show that non-linear surface tension forces make interfacial flight energetically expensive compared with airborne flight at the relatively high speeds characteristic of waterlily beetles, and cause chaotic dynamics to arise naturally in these regimes. We identify the crucial roles of capillary–gravity wave drag and oscillatory surface tension forces which dominate interfacial flight, showing that the air–water interface presents a radically modified force landscape for flapping wing flight compared with air. PMID:26936640
Experimental and Computational Sonic Boom Assessment of Lockheed-Martin N+2 Low Boom Models

NASA Technical Reports Server (NTRS)

Cliff, Susan E.; Durston, Donald A.; Elmiligui, Alaa A.; Walker, Eric L.; Carter, Melissa B.

2015-01-01

Flight at speeds greater than the speed of sound is not permitted over land, primarily because of the noise and structural damage caused by sonic boom pressure waves of supersonic aircraft. Mitigation of sonic boom is a key focus area of the High Speed Project under NASA's Fundamental Aeronautics Program. The project is focusing on technologies to enable future civilian aircraft to fly efficiently with reduced sonic boom, engine and aircraft noise, and emissions. A major objective of the project is to improve both computational and experimental capabilities for design of low-boom, high-efficiency aircraft. NASA and industry partners are developing improved wind tunnel testing techniques and new pressure instrumentation to measure the weak sonic boom pressure signatures of modern vehicle concepts. In parallel, computational methods are being developed to provide rapid design and analysis of supersonic aircraft with improved meshing techniques that provide efficient, robust, and accurate on- and off-body pressures at several body lengths from vehicles with very low sonic boom overpressures. The maturity of these critical parallel efforts is necessary before low-boom flight can be demonstrated and commercial supersonic flight can be realized.
Development of a VR-based Treadmill Control Interface for Gait Assessment of Patients with Parkinson’s Disease

PubMed Central

Park, Hyung-Soon; Yoon, Jung Won; Kim, Jonghyun; Iseki, Kazumi; Hallett, Mark

2013-01-01

Freezing of gait (FOG) is a commonly observed phenomenon in Parkinson’s disease, but its causes and mechanisms are not fully understood. This paper presents the development of a virtual reality (VR)-based body-weight supported treadmill interface (BWSTI) designed and applied to investigate FOG. The BWSTI provides a safe and controlled walking platform which allows investigators to assess gait impairments under various conditions that simulate real life. In order to be able to evoke FOG, our BWSTI employed a novel speed adaptation controller, which allows patients to drive the treadmill speed. Our interface responsively follows the subject’s intention of changing walking speed by the combined use of feedback and feedforward controllers. To provide realistic visual stimuli, a three dimensional VR system is interfaced with the speed adaptation controller and synchronously displays realistic visual cues. The VR-based BWSTI was tested with three patients with PD who are known to have FOG. Visual stimuli that might cause FOG were shown to them while the speed adaptation controller adjusted treadmill speed to follow the subjects’ intention. Two of the three subjects showed FOG during the treadmill walking. PMID:22275661
A High-Level Symbolic Representation for Intelligent Agents Across Multiple Architectures

DTIC Science & Technology

2004-07-01

components of Soar that map to these concepts (instantiation support, selected operator). [OCR residue from a screenshot removed; the only recoverable identifiers are AnswerSpeedRequest and RequestSpeedChange.]
Adaptive optics parallel spectral domain optical coherence tomography for imaging the living retina

NASA Astrophysics Data System (ADS)

Zhang, Yan; Rha, Jungtae; Jonnal, Ravi S.; Miller, Donald T.

2005-06-01

Although optical coherence tomography (OCT) can axially resolve and detect reflections from individual cells, there are no reports of imaging cells in the living human retina using OCT. To supplement the axial resolution and sensitivity of OCT with the necessary lateral resolution and speed, we developed a novel spectral domain OCT (SD-OCT) camera based on a free-space parallel illumination architecture and equipped with adaptive optics (AO). Conventional flood illumination, also with AO, was integrated into the camera and provided confirmation of the focus position in the retina with an accuracy of ±10.3 μm. Short bursts of narrow B-scans (100x560 μm) of the living retina were subsequently acquired at 500 Hz during dynamic compensation (up to 14 Hz) that successfully corrected the most significant ocular aberrations across a dilated 6 mm pupil. Camera sensitivity (up to 94 dB) was sufficient for observing reflections from essentially all neural layers of the retina. Signal-to-noise of the detected reflection from the photoreceptor layer was highly sensitive to the level of ocular aberrations and defocus, with changes of 11.4 and 13.1 dB (single pass) observed when the ocular aberrations (astigmatism, 3rd order and higher) were corrected and when the focus was shifted by 200 μm (0.54 diopters) in the retina, respectively. The 3D resolution of the B-scans (3.0x3.0x5.7 μm) is the highest reported to date in the living human eye and was sufficient to observe the interface between the inner and outer segments of individual photoreceptor cells, resolved in both lateral and axial dimensions. However, high contrast speckle, which is intrinsic to OCT, was present throughout the AO parallel SD-OCT B-scans and obstructed correlating retinal reflections to cell-sized retinal structures.
Directly measuring of thermal pulse transfer in one-dimensional highly aligned carbon nanotubes.

PubMed

Zhang, Guang; Liu, Changhong; Fan, Shoushan

2013-01-01

Using a simple and precise instrument system, we directly measured the thermo-physical properties of one-dimensional highly aligned carbon nanotubes (CNTs). A CNT-based macroscopic material, super-aligned carbon nanotube (SACNT) buckypaper, was measured in our experiment. We defined a new one-dimensional parameter, the "thermal transfer speed", to characterize the thermal damping mechanisms in the SACNT buckypapers. Our results indicated that SACNT buckypapers with different densities have clearly different thermal transfer speeds. Furthermore, we found that the thermal transfer speed of high-density SACNT buckypapers may have a pronounced damping factor along the CNT alignment direction. The anisotropic thermal diffusivities of SACNT buckypapers could be calculated from the thermal transfer speeds. The thermal diffusivities increase markedly as the buckypaper density increases. For parallel SACNT buckypapers, the thermal diffusivity could be as high as 562.2 ± 55.4 mm²/s. The thermal conductivities of these SACNT buckypapers were also calculated by the equation k = C_p α ρ.
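The final relation, conductivity as the product of specific heat, diffusivity, and density, is easy to get wrong in its units, so a short sketch may help. The specific heat and density below are placeholder values chosen only for illustration; the abstract reports neither. Only the diffusivity of 562.2 mm²/s comes from the text.

```python
def thermal_conductivity(cp_j_per_kg_k, alpha_mm2_per_s, rho_kg_per_m3):
    """Thermal conductivity k = Cp * alpha * rho, returned in W/(m K)."""
    alpha_m2_per_s = alpha_mm2_per_s * 1e-6  # mm^2/s -> m^2/s
    return cp_j_per_kg_k * alpha_m2_per_s * rho_kg_per_m3


# Cp and rho are hypothetical inputs, not measured values from the paper.
k = thermal_conductivity(cp_j_per_kg_k=700.0, alpha_mm2_per_s=562.2, rho_kg_per_m3=500.0)
print(f"k = {k:.1f} W/(m K)")
```

Because conductivity scales linearly with density, the same diffusivity gives very different conductivities for buckypapers of different density.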
The R package "sperrorest": Parallelized spatial error estimation and variable importance assessment for geospatial machine learning

NASA Astrophysics Data System (ADS)

Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander

2017-04-01

Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalents. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation). The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and, depending on the platform, calls parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. The second new feature is a new way of partitioning the data in sperrorest, provided by partition.factor.cv(). This function gives the user the possibility to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial resampling strategies are already available and can be extended by the user.
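The idea of partitioning at the level of a grouping structure can be sketched as follows. This is a Python illustration of the concept, not the sperrorest R API; the function name and the field labels are hypothetical.

```python
import random


def partition_by_group(groups, n_folds, seed=0):
    """Return n_folds lists of sample indices; each group lands in exactly one fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


# One label per pixel: the agricultural field the pixel belongs to.
field_id = ["A", "A", "B", "B", "B", "C", "D", "D"]
folds = partition_by_group(field_id, n_folds=2)
for k, test_idx in enumerate(folds):
    train_idx = [i for i in range(len(field_id)) if i not in test_idx]
    test_fields = sorted({field_id[i] for i in test_idx})
    train_fields = sorted({field_id[i] for i in train_idx})
    print(k, "test fields:", test_fields, "train fields:", train_fields)
```

No field appears on both sides of a split, so near-identical pixels from one field cannot inflate the estimated accuracy.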
Compact holographic optical neural network system for real-time pattern recognition

NASA Astrophysics Data System (ADS)

Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.

1996-08-01

One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high-resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation-shift-scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
Parallel processing of genomics data

NASA Astrophysics Data System (ADS)

Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

2016-10-01

The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, and the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data while achieving good response times. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Performance evaluation of canny edge detection on a tiled multicore architecture

NASA Astrophysics Data System (ADS)

Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

2011-01-01

In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
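Domain decomposition as described above can be sketched with one stage of an edge detector. The sketch uses Python threads rather than OpenMP or the Tile64 toolchain, computes only a squared gradient magnitude (not the full Canny pipeline), and uses hypothetical function names. It shows the structure of the approach, not its performance, since Python threads do not speed up CPU-bound loops.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient_rows(image, row_start, row_end):
    """Squared central-difference gradient magnitude for rows [row_start, row_end)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = []
        for c in range(w):
            # Indices are clamped at the image border. Rows just outside the
            # strip are read from the shared image, which acts as a halo.
            gx = image[r][min(c + 1, w - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, h - 1)][c] - image[max(r - 1, 0)][c]
            row.append(gx * gx + gy * gy)
        out.append(row)
    return out


def decomposed_gradient(image, n_strips):
    """Split the image into horizontal strips and process each as its own task."""
    h = len(image)
    bounds = [(i * h // n_strips, (i + 1) * h // n_strips) for i in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        strips = list(pool.map(lambda b: gradient_rows(image, *b), bounds))
    return [row for strip in strips for row in strip]


image = [[(r * c) % 7 for c in range(8)] for r in range(6)]
same = decomposed_gradient(image, 3) == gradient_rows(image, 0, len(image))
print("decomposed result matches sequential result:", same)
```

Each strip needs one row of neighbouring data on either side; on a distributed-memory machine that halo would have to be copied explicitly, which is part of the data management the programmer controls in this strategy.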
Wavelet-space correlation imaging for high-speed MRI without motion monitoring or data segmentation.

PubMed

Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles

2015-12-01

This study aims to (i) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and (ii) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called "wavelet-space correlation imaging", is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI, and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. © 2014 Wiley Periodicals, Inc.
Distributed Large Data-Object Environments: End-to-End Performance Analysis of High Speed Distributed Storage Systems in Wide Area ATM Networks

NASA Technical Reports Server (NTRS)

Johnston, William; Tierney, Brian; Lee, Jason; Hoo, Gary; Thompson, Mary

1996-01-01

We have developed and deployed a distributed-parallel storage system (DPSS) in several high-speed asynchronous transfer mode (ATM) wide area network (WAN) testbeds to support several different types of data-intensive applications. Architecturally, the DPSS is a network striped disk array, but is fairly unique in that its implementation allows applications complete freedom to determine optimal data layout, replication and/or coding redundancy strategy, security policy, and dynamic reconfiguration. In conjunction with the DPSS, we have developed a 'top-to-bottom, end-to-end' performance monitoring and analysis methodology that has allowed us to characterize all aspects of the DPSS operating in high-speed ATM networks. In particular, we have run a variety of performance monitoring experiments involving the DPSS in the MAGIC testbed, which is a large-scale, high-speed ATM network, and we describe our experience using the monitoring methodology to identify and correct problems that limit the performance of high-speed distributed applications. Finally, the DPSS is part of an overall architecture for using high-speed WANs to enable the routine, location-independent use of large data-objects. Since this is part of the motivation for a distributed storage system, we describe this architecture.
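A network striped disk array spreads the blocks of one data object across several servers so that a single read is served by many disks and network paths at once. The sketch below assumes a simple round-robin layout, which is only one possibility: the abstract states that applications are free to choose the layout. The function names are hypothetical and are not part of the DPSS interface.

```python
def locate_block(block_no, n_servers):
    """Map a logical block number to (server, block offset on that server)."""
    return block_no % n_servers, block_no // n_servers


def servers_for_read(first_block, n_blocks, n_servers):
    """Group the blocks of one sequential read by the server that holds them."""
    plan = {}
    for b in range(first_block, first_block + n_blocks):
        server, offset = locate_block(b, n_servers)
        plan.setdefault(server, []).append(offset)
    return plan


# A six-block read starting at block 5, striped over four servers.
print(servers_for_read(first_block=5, n_blocks=6, n_servers=4))
```

With this layout, consecutive blocks always fall on different servers, so a sequential read keeps all servers busy in parallel.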
Wavelet-space Correlation Imaging for High-speed MRI without Motion Monitoring or Data Segmentation

PubMed Central

Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles

2014-01-01

Purpose This study aims to 1) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and 2) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Methods Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called “wavelet-space correlation imaging”, is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Results Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Conclusion Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. PMID:25470230
a Real-Time Computer Music Synthesis System

NASA Astrophysics Data System (ADS)

Lent, Keith Henry

A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.

SIMS: addressing the problem of heterogeneity in databases

NASA Astrophysics Data System (ADS)

Arens, Yigal

1997-02-01

The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.
A Component-Based FPGA Design Framework for Neuronal Ion Channel Dynamics Simulations

PubMed Central

Mak, Terrence S. T.; Rachmuth, Guy; Lam, Kai-Pui; Poon, Chi-Sang

2008-01-01

Neuron-machine interfaces such as dynamic clamp and brain-implantable neuroprosthetic devices require real-time simulations of neuronal ion channel dynamics. Field Programmable Gate Array (FPGA) has emerged as a high-speed digital platform ideal for such application-specific computations. We propose an efficient and flexible component-based FPGA design framework for neuronal ion channel dynamics simulations, which overcomes certain limitations of the recently proposed memory-based approach. A parallel processing strategy is used to minimize computational delay, and a hardware-efficient factoring approach for calculating exponential and division functions in neuronal ion channel models is used to conserve resource consumption. Performances of the various FPGA design approaches are compared theoretically and experimentally in corresponding implementations of the AMPA and NMDA synaptic ion channel models. Our results suggest that the component-based design framework provides a more memory economic solution as well as more efficient logic utilization for large word lengths, whereas the memory-based approach may be suitable for time-critical applications where a higher throughput rate is desired. PMID:17190033
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.
Low-power, transparent optical network interface for high bandwidth off-chip interconnects.

PubMed

Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren

2009-04-13

The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.
Visualization of hump formation in high-speed gas metal arc welding

NASA Astrophysics Data System (ADS)

Wu, C. S.; Zhong, L. M.; Gao, J. Q.

2009-11-01

The hump bead is a typical weld defect observed in high-speed welding. Its occurrence limits the improvement of welding productivity. Visualization of hump formation during high-speed gas metal arc welding (GMAW) helps in understanding the humping phenomenon, so that effective measures can be taken to suppress or decrease the tendency of hump formation and achieve higher productivity welding. In this study, an experimental system was developed to implement vision-based observation of the weld pool behavior during high-speed GMAW. Considering the weld pool characteristics in high-speed welding, the CCD camera was equipped with a narrow band-pass and neutral density filter, a suitable exposure time was selected, and a side view orientation of the CCD camera was employed. The events that took place at the rear portion of the weld pools were imaged during welding processes with and without hump bead formation. It was found that the variation over time of the weld pool surface height and of the solid-liquid interface at the trailing end of the pool provides useful information for judging whether the humping phenomenon occurs.
Efficient abstract data type components for distributed and parallel systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bastani, F.; Hilal, W.; Iyengar, S.S.

1987-10-01

One way of improving a software system's comprehensibility and maintainability is to decompose it into several components, each of which encapsulates some information concerning the system. These components can be classified into four categories, namely, abstract data type, functional, interface, and control components. Such a classification underscores the need for different specification, implementation, and performance-improvement methods for different types of components. This article focuses on the development of high-performance abstract data type components for distributed and parallel environments.
Pulsed particle beam vacuum-to-air interface

DOEpatents

Cruz, Gilbert E.; Edwards, William F.

1988-01-01

A vacuum-to-air interface (10) is provided for a high-powered, pulsed particle beam accelerator. The interface comprises a pneumatic high speed gate valve (18), from which extends a vacuum-tight duct (26) that terminates in an aperture (28). Means (32, 34, 36, 38, 40, 42, 44, 46, 48) are provided for periodically advancing a foil strip (30) across the aperture (28) at the repetition rate of the particle pulses. A pneumatically operated hollow sealing band (62) urges foil strip (30), when stationary, against and into the aperture (28). Gas pressure means (68, 70) periodically lift off and separate foil strip (30) from aperture (28), so that it may be readily advanced.
Image sensor with high dynamic range linear output

NASA Technical Reports Server (NTRS)

Yadid-Pecht, Orly (Inventor); Fossum, Eric R. (Inventor)

2007-01-01

Designs and operational methods are described that increase the dynamic range of image sensors, and APS devices in particular, by achieving more than one integration time for each pixel. An APS system with more than one column-parallel signal chain for readout is described for maintaining a high frame rate in readout. Each active pixel is sampled multiple times during a single frame readout, resulting in multiple integration times. The operation methods can also be used to obtain multiple integration times for each pixel with an APS design having a single column-parallel signal chain for readout. Furthermore, analog-to-digital conversion of high speed and high resolution can be implemented.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Song

CFD (Computational Fluid Dynamics) is a widely used technique in the engineering design field. It uses mathematical methods to simulate and predict flow characteristics in a certain physical space. Since the numerical result of a CFD computation is very hard to understand, VR (virtual reality) and data visualization techniques are introduced into CFD post-processing to improve the understandability and functionality of CFD computation. In many cases CFD datasets are very large (multi-gigabytes), and more and more interactions between the user and the datasets are required. For the traditional VR application, limited computing power is a major factor preventing effective visualization of large datasets. This thesis presents a new system design that speeds up the traditional VR application by using parallel and distributed computing, together with the idea of using a hand held device to enhance the interaction between a user and the VR CFD application. Techniques from different research areas, including scientific visualization, parallel computing, distributed computing and graphical user interface design, are used in the development of the final system. As a result, the new system can be built flexibly on a heterogeneous computing environment and dramatically shortens the computation time.
28-Bit serial word simulator/monitor

NASA Technical Reports Server (NTRS)

Durbin, J. W.

1979-01-01

Modular interface unit transfers data at high speeds along four channels. Device expedites variable-word-length communication between computers. Operation eases exchange of bit information by automatically reformatting coded input data and status information to match requirements of output.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

PubMed

Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

2015-01-01

Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
The growth and breakdown of a vortex-pair in a stably stratified fluid

NASA Astrophysics Data System (ADS)

Advaith, S.; Tinaikar, Aashay; Manu, K. V.; Basu, Saptarshi

2017-11-01

Vortex interaction with density stratification is ubiquitous in nature and relevant to various engineering applications. The present study characterizes the spatial and temporal dynamics of the interaction between a vortex and a density stratified interface. The work is prompted by our research on single tank Thermal Energy Storage (TES) systems used in concentrated solar power (CSP) plants, where hot and cold fluids are separated by means of density stratification. Rigorous qualitative (high speed shadowgraph) and quantitative (high speed PIV) studies provide insight into vortex formation, propagation, interaction dynamics with the density stratified interface, and the resulting plume characteristics. We categorize the interaction into three cases based on its nature: non-penetrative, partially penetrative and extensively penetrative. We also propose a regime map in terms of non-dimensional parameters (Reynolds, Richardson and Atwood numbers) that predicts the occurrence of these cases.
A high-speed BCI based on code modulation VEP

NASA Astrophysics Data System (ADS)

Bin, Guangyu; Gao, Xiaorong; Wang, Yijun; Li, Yun; Hong, Bo; Gao, Shangkai

2011-04-01

Recently, electroencephalogram-based brain-computer interfaces (BCIs) have attracted much attention in the fields of neural engineering and rehabilitation due to their noninvasiveness. However, the low communication speed of current BCI systems greatly limits their practical application. In this paper, we present a high-speed BCI based on code modulation of visual evoked potentials (c-VEP). Thirty-two target stimuli were modulated by a time-shifted binary pseudorandom sequence. A multichannel identification method based on canonical correlation analysis (CCA) was used for target identification. The online system achieved an average information transfer rate (ITR) of 108 ± 12 bits min-1 on five subjects with a maximum ITR of 123 bits min-1 for a single subject.
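The ITR figures above can be related to accuracy and selection time with a short sketch. It assumes the commonly used Wolpaw definition of ITR; the abstract does not state which definition the authors used, and the accuracy and selection time passed to the functions are hypothetical inputs, not values from the study. Only the target count of 32 comes from the abstract.

```python
import math


def bits_per_selection(n_targets: int, accuracy: float) -> float:
    """Wolpaw bits per selection for n_targets equiprobable targets.

    Assumed definition; the abstract does not say how ITR was computed.
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    bits = math.log2(n_targets)
    if accuracy < 1.0:
        # Penalty for errors, assumed to be spread evenly over wrong targets.
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


def itr_bits_per_min(n_targets: int, accuracy: float,
                     seconds_per_selection: float) -> float:
    """Convert bits per selection into bits per minute."""
    return bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection
```

With 32 targets, a perfectly accurate selection carries log2(32) = 5 bits, so the selection time bounds the achievable rate.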
Template based parallel checkpointing in a massively parallel computer system

DOEpatents

Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
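The idea of saving only the blocks that differ from a template checkpoint can be sketched as follows. This is a simplified illustration with fixed-size blocks and SHA-256 checksums, not the rsync rolling-checksum variant or the broadcast mechanism the patent describes. The block size and all function names are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Checksum of every fixed-size block of the template checkpoint."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_delta(node_data: bytes, template_sums: list[str],
               block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Keep (compressed) only the blocks whose checksum differs from the template."""
    delta = {}
    for index, offset in enumerate(range(0, len(node_data), block_size)):
        block = node_data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(template_sums) or template_sums[index] != digest:
            delta[index] = zlib.compress(block)  # lossless compression
    return delta


def restore(template: bytes, delta: dict[int, bytes], length: int,
            block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild a node's checkpoint from the template plus its delta."""
    n_blocks = -(-length // block_size)  # ceiling division
    blocks = []
    for index in range(n_blocks):
        if index in delta:
            blocks.append(zlib.decompress(delta[index]))
        else:
            blocks.append(template[index * block_size:(index + 1) * block_size])
    return b"".join(blocks)[:length]
```

A node whose state matches the template in most blocks then transmits and stores only the few blocks that changed.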
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

PubMed

Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

2014-01-01

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.
Sidewall crystallization and saturation front formation in silicic magma chambers

NASA Astrophysics Data System (ADS)

Lake, E. T.

2012-12-01

The cooling and crystallization style of silicic magma bodies in the upper crust falls on a continuum between whole-chamber processes of convection, crystal settling, and cumulate formation and interface driven processes of conduction and crystallization front migration. In the former case, volatile saturation occurs uniformly chamber wide, in the latter volatile saturation occurs along an inward propagating front. Ambient thermal gradient primarily controls the propagation rate; warm (> 30 °C / km) geothermal gradients promote 1000m+ thick crystal mush zones but slow crystallization front propagation. Cold geothermal gradients support the opposite. Magma chamber geometry plays a second order role in controlling propagation rates; bodies with high surface to magma ratio and large Earth's surface parallel faces exhibit more rapid propagation and smaller mush zones. Crystallization front propagation occurs at speeds of up to 6 cm/year (rhyolitic magma, thin sill geometry, 10 °C / km geotherm), far faster than diffusion of volatiles in magma and faster than bubbles can nucleate and ascend under certain conditions. Saturation front propagation is fixed by pressure and magma crystal content; above certain modest initial water contents (4.4 wt% in a dacite) mobile magma above 10 km depth always contains a saturation front. Saturation fronts propagate down from the magma chamber roof at lower water contents (3.3 wt% in a dacite at 5 km depth), creating an upper saturated interface for most common (4 - 6 wt%) magma water contents. This upper interface promotes the production of a fluid pocket underneath the apex of the magma chamber. Magma de-densification by bubble nucleation promotes convection and homogenization in dacitic systems. If the fluid pocket grew rapidly without draining, hydro-fracturing and eruption would result. The combination of fluid escape pathways and metal scavenging would generate economic vein or porphyry deposits.
Running accuracy analysis of a 3-RRR parallel kinematic machine considering the deformations of the links

NASA Astrophysics Data System (ADS)

Wang, Liping; Jiang, Yao; Li, Tiemin

2014-09-01

Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with the consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with the consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful to increase the calculating speed. The accuracy of this machine when following a selected circle path is simulated. The influences of magnitude of the maximum acceleration and external loads on the running accuracy of the machine are investigated. The results show that the external loads will deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of the parallel kinematic machines and can also be used in their design optimization as well as selection of suitable running parameters.
Note: High-speed Z tip scanner with screw cantilever holding mechanism for atomic-resolution atomic force microscopy in liquid

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reza Akrami, Seyed Mohammad; Miyata, Kazuki; Asakawa, Hitoshi

High-speed atomic force microscopy has attracted much attention due to its unique capability of visualizing nanoscale dynamic processes at a solid/liquid interface. However, its usability and resolution have yet to be improved. As one of the solutions for this issue, here we present a design of a high-speed Z-tip scanner with screw holding mechanism. We perform detailed comparison between designs with different actuator size and screw arrangement by finite element analysis. Based on the design giving the best performance, we have developed a Z tip scanner and measured its performance. The measured frequency response of the scanner shows a flat response up to ∼10 kHz. This high frequency response allows us to achieve wideband tip-sample distance regulation. We demonstrate the applicability of the scanner to high-speed atomic-resolution imaging by visualizing atomic-scale calcite crystal dissolution process in water at 2 s/frame.
Method for transition prediction in high-speed boundary layers, phase 2

NASA Astrophysics Data System (ADS)

Herbert, T.; Stuckert, G. K.; Lin, N.

1993-09-01

The parabolized stability equations (PSE) are a new and more reliable approach to analyzing the stability of streamwise varying flows such as boundary layers. This approach has been previously validated for idealized incompressible flows. Here, the PSE are formulated for highly compressible flows in general curvilinear coordinates to permit the analysis of high-speed boundary-layer flows over fairly general bodies. Vigorous numerical studies are carried out to study convergence and accuracy of the linear-stability code LSH and the linear/nonlinear PSE code PSH. Physical interfaces are set up to analyze the M = 8 boundary layer over a blunt cone calculated by using a thin-layer Navier Stokes (TNLS) code and the flow over a sharp cone at angle of attack calculated using the AFWAL parabolized Navier-Stokes (PNS) code. While stability and transition studies at high speeds are far from routine, the method developed here is the best tool available to research the physical processes in high-speed boundary layers.
Sb7Te3/Ge multilayer films for low power and high speed phase-change memory

NASA Astrophysics Data System (ADS)

Chen, Shiyu; Wu, Weihua; Zhai, Jiwei; Song, Sannian; Song, Zhitang

2017-06-01

Phase-change memory has attracted enormous attention for its excellent properties compared with flash memory: high speed, high density, better data retention and low power consumption. Here we present Sb7Te3/Ge multilayer films prepared by magnetron sputtering. The 10-year data retention temperature is significantly increased compared with pure Sb7Te3. When the annealing temperature is above 250 °C, the Sb7Te3/Ge multilayer thin films have better interface properties, which give faster crystallization and high thermal stability. The decrease in density of the ST/Ge multilayer films is only around 5%, which is very suitable for phase change materials. Moreover, the low RESET power benefits from the high resistivity and better thermal stability in the PCM cells. This work demonstrates that multilayer thin films with tailored properties are beneficial for improving the stability and speed of phase change memory applications.

Perils of using speed zone data to assess real-world compliance to speed limits.

PubMed

Chevalier, Anna; Clarke, Elizabeth; Chevalier, Aran John; Brown, Julie; Coxon, Kristy; Ivers, Rebecca; Keay, Lisa

2017-11-17

Real-world driving studies, including those involving speeding alert devices and autonomous vehicles, can gauge an individual vehicle's speeding behavior by comparing measured speed with mapped speed zone data. However, there are complexities with developing and maintaining a database of mapped speed zones over a large geographic area that may lead to inaccuracies within the data set. When this approach is applied to large-scale real-world driving data or speeding alert device data to determine speeding behavior, these inaccuracies may result in invalid identification of speeding. We investigated speeding events based on service provider speed zone data. We compared service provider speed zone data (Speed Alert by Smart Car Technologies Pty Ltd., Ultimo, NSW, Australia) against a second set of speed zone data (Google Maps Application Programming Interface [API] mapped speed zones). We found a systematic error in the zones where speed limits of 50-60 km/h, typical of local roads, were allocated to high-speed motorways, which produced false speed limits in the speed zone database. The result was detection of false-positive high-range speeding. Through comparison of the service provider speed zone data against a second set of speed zone data, we were able to identify and eliminate data most affected by this systematic error, thereby establishing a data set of speeding events with a high level of sensitivity (a true positive rate of 92% or 6,412/6,960). Mapped speed zones can be a source of error in real-world driving when examining vehicle speed. We explored the types of inaccuracies found within speed zone data and recommend that a second set of speed zone data be utilized when investigating speeding behavior or developing mapped speed zone data to minimize inaccuracy in estimates of speeding.
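The cross-check between two speed zone sources and the reported true positive rate can be illustrated with a short sketch. The record fields and the rule of keeping only samples where both sources agree are assumptions made for illustration; the abstract does not give the study's exact elimination criteria. The counts 6,412 and 6,960 are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class SpeedSample:
    # Hypothetical record layout, not the study's data format.
    measured_kmh: float
    provider_limit_kmh: int   # limit from the service provider's speed zone data
    reference_limit_kmh: int  # limit from the second speed zone data set


def validated_speeding(samples: list[SpeedSample]) -> list[SpeedSample]:
    """Keep speeding samples only where both sources agree on the limit (assumed rule)."""
    return [s for s in samples
            if s.provider_limit_kmh == s.reference_limit_kmh
            and s.measured_kmh > s.provider_limit_kmh]


def true_positive_rate(true_positives: int, total: int) -> float:
    """Share of events that are true positives."""
    return true_positives / total


samples = [
    SpeedSample(105.0, 50, 110),  # motorway mislabelled as a 50 km/h local road
    SpeedSample(70.0, 60, 60),    # sources agree, vehicle is speeding
    SpeedSample(55.0, 60, 60),    # sources agree, vehicle is not speeding
]
kept = validated_speeding(samples)
rate = true_positive_rate(6412, 6960)
print(len(kept), f"{rate:.1%}")  # prints: 1 92.1%
```

The first sample shows the systematic error described above: against the provider limit alone it would count as high-range speeding, but the disagreement between sources excludes it.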
Runtime support for parallelizing data mining algorithms

NASA Astrophysics Data System (ADS)

Jin, Ruoming; Agrawal, Gagan

2002-03-01

With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
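Of the techniques listed, full replication is the simplest to sketch: each worker updates a private copy of the reduction object without locking, and the copies are merged at the end. The example below counts item occurrences across transactions. The function names and the use of a Counter as the reduction object are illustrative assumptions, not the authors' runtime interface, and Python threads show only the structure, not a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_reduce(chunk: list[list[str]]) -> Counter:
    """Update a private reduction object; no locking is needed."""
    counts = Counter()
    for transaction in chunk:
        for item in transaction:
            counts[item] += 1
    return counts


def full_replication_count(transactions: list[list[str]],
                           n_workers: int = 4) -> Counter:
    """Full replication: one reduction object per worker, merged after the pass."""
    chunks = [transactions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_reduce, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # global merge step
    return total
```

The locking variants named in the abstract instead share one reduction object and trade the memory and merge cost of replication for synchronization overhead.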
Microwave interferometry technique for obtaining gas interface velocity measurements in an expansion tube facility

NASA Technical Reports Server (NTRS)

Laney, C. C., Jr.

1974-01-01

A microwave interferometer technique to determine the front interface velocity of a high enthalpy gas flow is described. The system is designed to excite a standing wave in an expansion tube, and to measure the shift in this standing wave as it is moved by the test gas front. Data, in the form of a varying sinusoidal signal, are recorded on a high-speed drum camera-oscilloscope combination. Measurements of average and incremental velocities in excess of 6,000 meters per second were made.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
DMA shared byte counters in a parallel computer

DOEpatents

Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

2010-04-06

A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain the injection FIFO metadata memory locations, including their current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations, including their current head and tail. The injection byte counters and reception byte counters may be shared between messages.
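The idea of sharing one byte counter between several messages can be pictured with a small software model. This is only a sketch under stated assumptions: the class name, the decrement-per-packet behavior, and the "zero means all done" convention are illustrative guesses and are not taken from the patent, which describes hardware counters.

```python
class SharedByteCounter:
    """Hypothetical model of a byte counter shared by several messages."""

    def __init__(self):
        self.remaining = 0

    def add_message(self, nbytes):
        """Register a message by adding its length to the shared count."""
        self.remaining += nbytes

    def on_packet(self, nbytes):
        """Account for one transferred packet of any registered message."""
        if nbytes > self.remaining:
            raise ValueError("packet exceeds outstanding byte count")
        self.remaining -= nbytes

    @property
    def done(self):
        """True once every byte of every registered message has moved."""
        return self.remaining == 0


counter = SharedByteCounter()
counter.add_message(256)   # first message
counter.add_message(128)   # second message shares the same counter
for _ in range(3):         # three 128-byte packets cover both messages
    counter.on_packet(128)
print(counter.done)
```

With a shared counter, software polls a single value to learn that a whole group of messages has completed, rather than checking one counter per message.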
The Air-Sea Interface and Surface Stress under Tropical Cyclones

NASA Astrophysics Data System (ADS)

Soloviev, Alexander; Lukas, Roger; Donelan, Mark; Ginis, Isaac

2013-04-01

Air-sea interaction dramatically changes from moderate to very high wind speed conditions (Donelan et al. 2004). Unresolved physics of the air-sea interface are one of the weakest components in tropical cyclone prediction models. Rapid disruption of the air-water interface under very high wind speed conditions was reported in laboratory experiments (Koga 1981) and numerical simulations (Soloviev et al. 2012), which resembled the Kelvin-Helmholtz instability at an interface with very large density difference. Kelly (1965) demonstrated that the KH instability at the air-sea interface can develop through parametric amplification of waves. Farrell and Ioannou (2008) showed that gustiness results in the parametric KH instability of the air-sea interface, while the gusts are due to interacting waves and turbulence. The stochastic forcing enters multiplicatively in this theory and produces an exponential wave growth, augmenting the growth from the Miles (1959) theory as the turbulence level increases. Here we complement this concept by adding the effect of the two-phase environment near the mean interface, which introduces additional viscosity in the system (turning it into a rheological system). The two-phase environment includes air-bubbles and re-entering spray (spume), which eliminates a portion of the wind-wave wavenumber spectrum that is responsible for a substantial part of the air sea drag coefficient. The previously developed KH-type interfacial parameterization (Soloviev and Lukas 2010) is unified with two versions of the wave growth model. The unified parameterization in both cases exhibits the increase of the drag coefficient with wind speed until approximately 30 m/s. Above this wind speed threshold, the drag coefficient either nearly levels off or even slightly drops (for the wave growth model that accounts for the shear) and then starts again increasing above approximately 65 m/s wind speed. 
Remarkably, the unified parameterization reveals a local minimum of the drag coefficient wind speed dependence around 65 m/s. This minimum may contribute to the rapid intensification of storms to major tropical cyclones. The subsequent slow increase of the drag coefficient with wind above 65 m/s serves as an obstacle for further intensification of tropical cyclones. Such dependence may explain the observed bi-modal distribution of tropical cyclone intensity. Implementation of the new parameterization into operational models is expected to improve predictions of tropical cyclone intensity and the associated wave field. References: Donelan, M. A., B. K. Haus, N. Reul, W. Plant, M. Stiassnie, H. Graber, O. Brown, and E. Saltzman, 2004: On the limiting aerodynamic roughness of the ocean in very strong winds. Farrell, B. F., and P. J. Ioannou, 2008: The stochastic parametric mechanism for growth of wind-driven surface water waves. J. Phys. Oceanogr. 38, 862-879. Kelly, R. E., 1965: The stability of an unsteady Kelvin-Helmholtz flow. J. Fluid Mech. 22, 547-560. Koga, M., 1981: Direct production of droplets from breaking wind-waves - its observation by a multi-colored overlapping exposure technique. Tellus 33, 552-563. Miles, J. W., 1959: On the generation of surface waves by shear flows, part 3. J. Fluid Mech. 6, 583-598. Soloviev, A. V., and R. Lukas, 2010: Effects of bubbles and sea spray on air-sea exchanges in hurricane conditions. Boundary-Layer Meteorol. 136, 365-376. Soloviev, A., A. Fujimura, and S. Matt, 2012: Air-sea interface in hurricane conditions. J. Geophys. Res. 117, C00J34.
Structural Insights into the Quadruplex-Duplex 3' Interface Formed from a Telomeric Repeat: A Potential Molecular Target.

PubMed

Russo Krauss, Irene; Ramaswamy, Sneha; Neidle, Stephen; Haider, Shozeb; Parkinson, Gary N

2016-02-03

We report here on an X-ray crystallographic and molecular modeling investigation into the complex 3' interface formed between putative parallel stranded G-quadruplexes and a duplex DNA sequence constructed from the human telomeric repeat sequence TTAGGG. Our crystallographic approach provides a detailed snapshot of a telomeric 3' quadruplex-duplex junction: a junction that appears to have the potential to form a unique molecular target for small molecule binding and interference with telomere-related functions. This unique target is particularly relevant as current high-affinity compounds that bind putative G-quadruplex forming sequences only rarely have a high degree of selectivity for a particular quadruplex. Here DNA junctions were assembled using different putative quadruplex-forming scaffolds linked at the 3' end to a telomeric duplex sequence and annealed to a complementary strand. We successfully generated a series of G-quadruplex-duplex containing crystals, both alone and in the presence of ligands. The structures demonstrate the formation of a parallel folded G-quadruplex and a B-form duplex DNA stacked coaxially. Most strikingly, structural data reveals the consistent formation of a TAT triad platform between the two motifs. This triad allows for a continuous stack of bases to link the quadruplex motif with the duplex region. For these crystal structures formed in the absence of ligands, the TAT triad interface occludes ligand binding at the 3' quadruplex-duplex interface, in agreement with in silico docking predictions. However, with the rearrangement of a single nucleotide, a stable pocket can be produced, thus providing an opportunity for the binding of selective molecules at the interface.
Fiber optic interferometry for industrial process monitoring and control applications

NASA Astrophysics Data System (ADS)

Marcus, Michael A.

2002-02-01

Over the past few years we have been developing applications for a high-resolution (sub-micron accuracy) fiber optic coupled dual Michelson interferometer-based instrument. It is being utilized in a variety of applications including monitoring liquid layer thickness uniformity on coating hoppers, film base thickness uniformity measurement, digital camera focus assessment, optical cell path length assessment and imager and wafer surface profile mapping. The instrument includes both coherent and non-coherent light sources, custom application-dependent optical probes and sample interfaces, a Michelson interferometer, custom electronics, a Pentium-based PC with data acquisition cards, and LabWindows/CVI- or LabVIEW-based application-specific software. This paper describes the development and evolution of this instrument platform and its applications, highlighting robust instrument design and the development of hardware, software, and user interfaces. The talk concludes with a discussion of a new high-speed instrument configuration, which can be utilized for high-speed surface profiling and as an on-line web thickness gauge.
A functional video-based anthropometric measuring system

NASA Technical Reports Server (NTRS)

Nixon, J. H.; Cater, J. P.

1982-01-01

A high-speed anthropometric three-dimensional measurement system using the Selcom Selspot motion tracking instrument for visual data acquisition is discussed. A three-dimensional scanning system was created which collects video, audio, and performance data on a single standard video cassette recorder. Recording rates of 1 megabit per second for periods of up to two hours are possible with the system design. A high-speed off-the-shelf motion analysis system for collecting optical information was used. The video recording adapter (VRA) is interfaced to the Selspot data acquisition system.
Nearly Interactive Parabolized Navier-Stokes Solver for High Speed Forebody and Inlet Flows

NASA Technical Reports Server (NTRS)

Benson, Thomas J.; Liou, May-Fun; Jones, William H.; Trefny, Charles J.

2009-01-01

A system of computer programs is being developed for the preliminary design of high speed inlets and forebodies. The system comprises four functions: geometry definition, flow grid generation, flow solver, and graphics post-processor. The system runs on a dedicated personal computer using the Windows operating system and is controlled by graphical user interfaces written in MATLAB (The Mathworks, Inc.). The flow solver uses the Parabolized Navier-Stokes equations to compute millions of mesh points in several minutes. Sample two-dimensional and three-dimensional calculations are demonstrated in the paper.
High speed fiber optics local area networks: Design and implementation

NASA Technical Reports Server (NTRS)

Tobagi, Fouad A.

1988-01-01

The design of high speed local area networks (HSLAN) for communication among distributed devices requires solving problems in three areas: (1) the network medium and its topology; (2) the medium access control; and (3) the network interface. Considerable progress has been made in all areas. Accomplishments are divided into two groups according to their theoretical or experimental nature. A brief summary is given in Section 2, including references to papers which appeared in the literature, as well as to Ph.D. dissertations and technical reports published at Stanford University.
Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation.

DTIC Science & Technology

1986-03-01

...the proposed approaches 16, 16, 40 . 451. The conclusion most often reached is that the best scheme to use in a particular design depends highly upon... 76. 40. Siegel, H. J., McMillen, R. J., and Mueller, P. T., Jr. A survey of interconnection methods for reconfigurable parallel processing systems... addressing mechanism distributed in the network... reach gigabit/second speeds...
[Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

PubMed

Yokohama, Noriya

2013-07-01

This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, Wallace B.; DuBois, David H.

1998-08-11

A digital system provides sending and receiving gateways for HIPPI interfaces. Electronic logic circuitry formats data signals and overhead signals in a data frame that is suitable for transmission over a connecting fiber optic link. Multiplexers route the data and overhead signals to a framer module. The framer module allocates the data and overhead signals to a plurality of 9-byte words that are arranged in a selected protocol. The formatted words are stored in a storage register for output through the gateway.
Parallel computing using a Lagrangian formulation

NASA Technical Reports Server (NTRS)

Liou, May-Fun; Loh, Ching Yuen

1991-01-01

A new Lagrangian formulation of the Euler equation is adopted for the calculation of 2-D supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, a better than six times speed-up was achieved on an 8192-processor CM-2 over a single processor of a CRAY-2.
Parallel computing using a Lagrangian formulation

NASA Technical Reports Server (NTRS)

Liou, May-Fun; Loh, Ching-Yuen

1992-01-01

This paper adopts a new Lagrangian formulation of the Euler equation for the calculation of two-dimensional supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, we have achieved better than six times speed-up on an 8192-processor CM-2 over a single processor of a CRAY-2.
Next Generation Space Telescope Integrated Science Module Data System

NASA Technical Reports Server (NTRS)

Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.

1999-01-01

The data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument complement, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz PowerPC processors. All processors execute a commercially available real-time multi-tasking operating system supporting preemptive multi-tasking, file management, and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network that links the six processors. The final selection of processor buses, processor chips, network interfaces, and high-speed data interfaces will be made in mid-2002.
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

NASA Astrophysics Data System (ADS)

Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

2011-07-01

An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design uses standard CMOS logic available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.
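The decimation idea can be sketched in software. The example below assumes a 7-bit linear feedback shift register with taps on the two highest stages (often called PRBS7); the abstract does not state which sequence lengths or polynomials the chip supports, so this choice is purely illustrative. It splits the full-rate sequence into two half-rate lanes and shows that interleaving them restores the original, and that each lane of a maximal-length sequence decimated by two is itself a shifted copy of that sequence.

```python
def prbs7(length, seed=0x7F):
    """Generate PRBS bits from a 7-bit Fibonacci LFSR (illustrative choice)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two top stages
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def decimate(bits, lanes):
    """Split a full-rate sequence into `lanes` lower-rate parallel sequences."""
    return [bits[k::lanes] for k in range(lanes)]


def interleave(streams):
    """Multiplex parallel sequences back into one full-rate sequence."""
    return [b for group in zip(*streams) for b in group]


def is_cyclic_shift(a, b):
    """Check whether sequence a is a rotation of sequence b."""
    sa, sb = "".join(map(str, a)), "".join(map(str, b))
    return len(sa) == len(sb) and sa in sb + sb


PERIOD = 2**7 - 1                      # 127 bits for a maximal-length sequence
full_rate = prbs7(2 * PERIOD)          # two periods, so each lane spans one
lane0, lane1 = decimate(full_rate, 2)
print(interleave([lane0, lane1]) == full_rate)
print(is_cyclic_shift(lane0, full_rate[:PERIOD]))
```

In hardware, each lane can run at a fraction of the output bit rate, and only the final multiplexer has to operate at full speed.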

Implementation of Interaction Algorithm to Non-Matching Discrete Interfaces Between Structure and Fluid Mesh

NASA Technical Reports Server (NTRS)

Chen, Shu-Po

1999-01-01

This paper presents software for handling non-conforming fluid-structure interfaces in aeroelastic simulation. It reviews the interpolation and integration algorithm and highlights the flexibility and user-friendly features that allow the user to select existing structure and fluid packages, such as NASTRAN and CFL3D, to perform the simulation. The presented software is validated by computing the High Speed Civil Transport model.
Magnetospheric Multiscale Observations of the Electron Diffusion Region of Large Guide Field Magnetic Reconnection

NASA Technical Reports Server (NTRS)

Eriksson, S.; Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Cassak, P. A.; Burch, J. L.; Chen, Li-Jen; Torbert, R. B.; Phan, T. D.; Lavraud, B.;

2016-01-01

We report observations from the Magnetospheric Multiscale (MMS) satellites of a large guide field magnetic reconnection event. The observations suggest that two of the four MMS spacecraft sampled the electron diffusion region, whereas the other two spacecraft detected the exhaust jet from the event. The guide magnetic field amplitude is approximately 4 times that of the reconnecting field. The event is accompanied by a significant parallel electric field (E∥) that is larger than predicted by simulations. The high-speed (approximately 300 km/s) crossing of the electron diffusion region limited the data set to one complete electron distribution inside of the electron diffusion region, which shows significant parallel heating. The data suggest that E∥ is balanced by a combination of electron inertia and a parallel gradient of the gyrotropic electron pressure.

Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Component Technology for High-Performance Scientific Simulation Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Epperly, T; Kohn, S; Kumfert, G

2000-11-09

We are developing scientific software component technology to manage the complexity of modern, parallel simulation software and increase the interoperability and re-use of scientific software packages. In this paper, we describe a language interoperability tool named Babel that enables the creation and distribution of language-independent software libraries using interface definition language (IDL) techniques. We have created a scientific IDL that focuses on the unique interface description needs of scientific codes, such as complex numbers, dense multidimensional arrays, complicated data types, and parallelism. Preliminary results indicate that in addition to language interoperability, this approach provides useful tools for thinking about the design of modern object-oriented scientific software libraries. Finally, we also describe a web-based component repository called Alexandria that facilitates the distribution, documentation, and re-use of scientific components and libraries.
Continuous measurement of air-water gas exchange by underwater eddy covariance

NASA Astrophysics Data System (ADS)

Berg, Peter; Pace, Michael L.

2017-12-01

Exchange of gases, such as O2, CO2, and CH4, over the air-water interface is an important component in aquatic ecosystem studies, but exchange rates are typically measured or estimated with substantial uncertainties. This diminishes the precision of common ecosystem assessments associated with gas exchanges such as primary production, respiration, and greenhouse gas emission. Here, we used the aquatic eddy covariance technique - originally developed for benthic O2 flux measurements - right below the air-water interface (~4 cm) to determine gas exchange rates and coefficients. Using an acoustic Doppler velocimeter and a fast-responding dual O2-temperature sensor mounted on a floating platform, the 3-D water velocity, O2 concentration, and temperature were measured at high speed (64 Hz). By combining these data, concurrent vertical fluxes of O2 and heat across the air-water interface were derived, and gas exchange coefficients were calculated from the former. Proof-of-concept deployments at different river sites gave standard gas exchange coefficients (k600) in the range of published values. A 40 h long deployment revealed a distinct diurnal pattern in air-water exchange of O2 that was controlled largely by physical processes (e.g., diurnal variations in air temperature and associated air-water heat fluxes) and not by biological activity (primary production and respiration). This physical control of gas exchange can be prevalent in lotic systems and adds uncertainty to assessments of biological activity that are based on measured water column O2 concentration changes. For example, in the 40 h deployment, there was near-constant river flow and insignificant winds - two main drivers of lotic gas exchange - but we found gas exchange coefficients that varied by several fold.
This was presumably caused by the formation and erosion of vertical temperature-density gradients in the surface water driven by the heat flux into or out of the river that affected the turbulent mixing. This effect is unaccounted for in widely used empirical correlations for gas exchange coefficients and is another source of uncertainty in gas exchange estimates. The aquatic eddy covariance technique allows studies of air-water gas exchange processes and their controls at an unparalleled level of detail. A finding related to the new approach is that heat fluxes at the air-water interface can, contrary to those typically found in the benthic environment, be substantial and require correction of O2 sensor readings using high-speed parallel temperature measurements. Fast-responding O2 sensors are inherently sensitive to temperature changes, and if this correction is omitted, temperature fluctuations associated with the turbulent heat flux will mistakenly be recorded as O2 fluctuations and bias the O2 eddy flux calculation.
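The core calculation behind eddy covariance can be written in a few lines: the vertical flux is the mean product of the fluctuations of vertical velocity and concentration about their means. The sketch below is a minimal version under simplifying assumptions (no detrending, no time-lag or temperature correction, and made-up sample values); the function name and the numbers are illustrative and do not come from the study.

```python
from statistics import fmean


def eddy_flux(w, c):
    """Covariance of vertical velocity w and concentration c (mean of w'c')."""
    if len(w) != len(c) or not w:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean, c_mean = fmean(w), fmean(c)
    products = [(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)]
    return fmean(products)


# Made-up samples: vertical velocity in m/s, O2 concentration in mmol/m^3.
w = [0.01, -0.01, 0.01, -0.01]
c = [251.0, 249.0, 251.0, 249.0]
flux = eddy_flux(w, c)  # units: mmol m^-2 s^-1
print(flux)
```

A real record at 64 Hz would hold thousands of samples per averaging window, and the temperature correction of the O2 signal discussed in the abstract would be applied before this step.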
Remote gaze tracking system on a large display.

PubMed

Lee, Hyeon Chang; Lee, Won Oh; Cho, Chul Woo; Gwon, Su Yeong; Park, Kang Ryoung; Lee, Heekyung; Cha, Jihun

2013-10-07

We propose a new remote gaze tracking system as an intelligent TV interface. Our research is novel in the following three ways: first, because a user can sit at various positions in front of a large display, the capture volume of the gaze tracking system should be greater, so the proposed system includes two cameras which can be moved simultaneously by panning and tilting mechanisms, a wide view camera (WVC) for detecting eye position and an auto-focusing narrow view camera (NVC) for capturing enlarged eye images. Second, in order to remove the complicated calibration between the WVC and NVC and to enhance the capture speed of the NVC, these two cameras are combined in a parallel structure. Third, the auto-focusing of the NVC is achieved on the basis of both the user's facial width in the WVC image and a focus score calculated on the eye image of the NVC. Experimental results showed that the proposed system can be operated with a gaze tracking accuracy of ±0.737°~±0.775° and a speed of 5~10 frames/s.
Remote Gaze Tracking System on a Large Display

PubMed Central

Lee, Hyeon Chang; Lee, Won Oh; Cho, Chul Woo; Gwon, Su Yeong; Park, Kang Ryoung; Lee, Heekyung; Cha, Jihun

2013-01-01

We propose a new remote gaze tracking system as an intelligent TV interface. Our research is novel in the following three ways: first, because a user can sit at various positions in front of a large display, the capture volume of the gaze tracking system should be greater, so the proposed system includes two cameras which can be moved simultaneously by panning and tilting mechanisms, a wide view camera (WVC) for detecting eye position and an auto-focusing narrow view camera (NVC) for capturing enlarged eye images. Second, in order to remove the complicated calibration between the WVC and NVC and to enhance the capture speed of the NVC, these two cameras are combined in a parallel structure. Third, the auto-focusing of the NVC is achieved on the basis of both the user's facial width in the WVC image and a focus score calculated on the eye image of the NVC. Experimental results showed that the proposed system can be operated with a gaze tracking accuracy of ±0.737°∼±0.775° and a speed of 5∼10 frames/s. PMID:24105351
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
High-volume production of single and compound emulsions in a microfluidic parallelization arrangement coupled with coaxial annular world-to-chip interfaces.

PubMed

Nisisako, Takasi; Ando, Takuya; Hatsuzawa, Takeshi

2012-09-21

This study describes a microfluidic platform with coaxial annular world-to-chip interfaces for high-throughput production of single and compound emulsion droplets, having controlled sizes and internal compositions. The production module consists of two distinct elements: a planar square chip on which many copies of a microfluidic droplet generator (MFDG) are arranged circularly, and a cubic supporting module with coaxial annular channels for supplying fluids evenly to the inlets of the mounted chip, assembled from blocks with cylinders and holes. Three-dimensional flow was simulated to evaluate the distribution of flow velocity in the coaxial multiple annular channels. By coupling a 1.5 cm × 1.5 cm microfluidic chip with 144 parallelized MFDGs and a supporting module with two annular channels, for example, we could produce simple oil-in-water (O/W) emulsion droplets having a mean diameter of 90.7 μm and a coefficient of variation (CV) of 2.2% at a throughput of 180.0 mL h⁻¹. Furthermore, we successfully demonstrated high-throughput production of Janus droplets, double emulsions and triple emulsions, by coupling microfluidic chips ranging from 1.5 cm × 1.5 cm to 4.5 cm × 4.5 cm, carrying 32-128 parallelized MFDGs of various geometries, with supporting modules having 3-4 annular channels.
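The figures quoted above allow a quick back-of-the-envelope estimate of what each droplet generator contributes. The sketch below assumes that the 180.0 mL/h throughput is the dispersed-phase volume, that it is split evenly across the 144 generators, and that droplets are spheres of the mean diameter; these assumptions are not stated in the abstract.

```python
import math

total_flow_ml_per_h = 180.0   # throughput quoted in the abstract
generators = 144              # parallelized MFDGs on the chip
diameter_um = 90.7            # mean droplet diameter

# Even split of the flow across all generators (assumption).
per_generator_ml_per_h = total_flow_ml_per_h / generators

# Volume of one spherical droplet; 1 cubic micrometre equals 1e-6 nL.
droplet_volume_nl = math.pi / 6 * diameter_um**3 * 1e-6

# Convert mL/h to nL/s, then divide by droplet volume to get a rate.
per_generator_nl_per_s = per_generator_ml_per_h * 1e6 / 3600
droplets_per_s_per_generator = per_generator_nl_per_s / droplet_volume_nl

print(per_generator_ml_per_h)
print(round(droplet_volume_nl, 3))
print(round(droplets_per_s_per_generator))
```

Under these assumptions each generator handles 1.25 mL/h and forms droplets at a rate on the order of several hundred per second.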
Self Assembly and Interface Engineering of Organic Functional Materials for High Performance Polymer Solar Cells

NASA Astrophysics Data System (ADS)

Jen, Alex

2010-03-01

The performance of polymer solar cells is strongly dependent on the efficiency of light harvesting, exciton dissociation, charge transport, and charge collection at the metal/organic, metal/metal oxide, and organic/metal oxide interfaces. To improve the device performance, two parallel approaches were used: 1) developing novel low band gap conjugated polymers with good charge-transporting properties and 2) modifying the interfaces between the organic/metal oxide and organic/metal layers with functional self-assembling monolayers to tune their energy barriers. Moreover, a molecular engineering approach was also used to tune the energy level, charge mobility, and morphology of organic semiconductors.
Interface composition of InAs nanowires with Al2O3 and HfO2 thin films

NASA Astrophysics Data System (ADS)

Timm, R.; Hjort, M.; Fian, A.; Borg, B. M.; Thelander, C.; Andersen, J. N.; Wernersson, L.-E.; Mikkelsen, A.

2011-11-01

Vertical InAs nanowires (NWs) wrapped by a thin high-κ dielectric layer may be a key to the next generation of high-speed metal-oxide-semiconductor devices. Here, we have investigated the structure and chemical composition of the interface between InAs NWs and 2 nm thick Al2O3 and HfO2 films. The native oxide on the NWs is significantly reduced upon high-κ deposition, although less effective than for corresponding planar samples, resulting in a 0.8 nm thick interface layer with an In-/As-oxide composition of about 0.7/0.3. The exact oxide reduction and composition including As-suboxides and the role of the NW geometry are discussed in detail.
Parallel Fin ORU Thermal Interface for space applications. [Orbital Replaceable Unit]

NASA Technical Reports Server (NTRS)

Stobb, C. A.; Limardo, Jose G.

1992-01-01

The Parallel Fin Thermal Interface has been developed as an Orbital Replaceable Unit (ORU) interface. The interface transfers heat from an ORU baseplate to a Heat Acquisition Plate (HAP) through pairs of fins sandwiched between insert plates that press against the fins with uniform pressure. The insert plates are spread apart for ORU baseplate separation and replacement. Two prototype interfaces with different fin dimensions were built (Model 140 and 380). Interfacing surface samples were found to have roughnesses of 56 to 89 nm. Conductance values of 267 to 420 W/sq m C were obtained for the 140 model in vacuum with interface pressures of 131 to 262 kPa (19 to 38 psi). Vacuum conductances ranging from 176 to 267 W/sq m F were obtained for the 380 model at interface pressures of 97 to 152 kPa (14 and 22 psi). Correlations from several sources were found to agree with test data within 20 percent using thermal math models of the interfaces.
Multiplexed Oversampling Digitizer in 65 nm CMOS for Column-Parallel CCD Readout

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grace, Carl; Walder, Jean-Pierre; von der Lippe, Henrik

2012-04-10

A digitizer designed to read out column-parallel charge-coupled devices (CCDs) used for high-speed X-ray imaging is presented. The digitizer is included as part of the High-Speed Image Preprocessor with Oversampling (HIPPO) integrated circuit. The digitizer module comprises a multiplexed, oversampling, 12-bit, 80 MS/s pipelined Analog-to-Digital Converter (ADC) and a bank of four fast-settling sample-and-hold amplifiers to instrument four analog channels. The ADC multiplexes and oversamples to reduce its area to allow integration that is pitch-matched to the columns of the CCD. Novel design techniques are used to enable oversampling and multiplexing with a reduced power penalty. The ADC exhibits 188 μV-rms noise, which is less than 1 LSB at a 12-bit level. The prototype is implemented in a commercially available 65 nm CMOS process. The digitizer will lead to a proof-of-principle 2D 10 Gigapixel/s X-ray detector.
The numerical simulation of a high-speed axial flow compressor

NASA Technical Reports Server (NTRS)

Mulac, Richard A.; Adamczyk, John J.

1991-01-01

The advancement of high-speed axial-flow multistage compressors is impeded by a lack of detailed flow-field information. Recent developments in compressor flow modeling and numerical simulation have the potential to provide needed information in a timely manner. The development of a computer program is described to solve the viscous form of the average-passage equation system for multistage turbomachinery. Programming issues such as in-core versus out-of-core data storage and CPU utilization (parallelization, vectorization, and chaining) are addressed. Code performance is evaluated through the simulation of the first four stages of a five-stage, high-speed, axial-flow compressor. The second part addresses the flow physics which can be obtained from the numerical simulation. In particular, an examination of the endwall flow structure is made, and its impact on blockage distribution assessed.
High-speed massively parallel scanning

DOEpatents

Decker, Derek E [Byron, CA]

2010-07-06

A new technique for recording a series of images of a high-speed event (such as, but not limited to: ballistics, explosives, laser induced changes in materials, etc.) is presented. Such technique(s) makes use of a lenslet array to take image picture elements (pixels) and concentrate light from each pixel into a spot that is much smaller than the pixel. This array of spots illuminates a detector region (e.g., film, as one embodiment) which is scanned transverse to the light, creating tracks of exposed regions. Each track is a time history of the light intensity for a single pixel. By appropriately configuring the array of concentrated spots with respect to the scanning direction of the detection material, different tracks fit between pixels and sufficient lengths are possible which can be of interest in several high-speed imaging applications.
DynaSim: A MATLAB Toolbox for Neural Modeling and Simulation

PubMed Central

Sherfey, Jason S.; Soplata, Austin E.; Ardid, Salva; Roberts, Erik A.; Stanley, David A.; Pittman-Polletta, Benjamin R.; Kopell, Nancy J.

2018-01-01

DynaSim is an open-source MATLAB/GNU Octave toolbox for rapid prototyping of neural models and batch simulation management. It is designed to speed up and simplify the process of generating, sharing, and exploring network models of neurons with one or more compartments. Models can be specified by equations directly (similar to XPP or the Brian simulator) or by lists of predefined or custom model components. The higher-level specification supports arbitrarily complex population models and networks of interconnected populations. DynaSim also includes a large set of features that simplify exploring model dynamics over parameter spaces, running simulations in parallel using both multicore processors and high-performance computer clusters, and analyzing and plotting large numbers of simulated data sets in parallel. It also includes a graphical user interface (DynaSim GUI) that supports full functionality without requiring user programming. The software has been implemented in MATLAB to enable advanced neural modeling using MATLAB, given its popularity and a growing interest in modeling neural systems. The design of DynaSim incorporates a novel schema for model specification to facilitate future interoperability with other specifications (e.g., NeuroML, SBML), simulators (e.g., NEURON, Brian, NEST), and web-based applications (e.g., Geppetto) outside MATLAB. DynaSim is freely available at http://dynasimtoolbox.org. This tool promises to reduce barriers for investigating dynamics in large neural models, facilitate collaborative modeling, and complement other tools being developed in the neuroinformatics community. PMID:29599715
OpenMP parallelization of a gridded SWAT (SWATG)

NASA Astrophysics Data System (ADS)

Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

2017-12-01

Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km² watershed with 1 CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
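The reported speed-up can be put in context with a short calculation. The sketch below treats "up to nine times faster" as a speed-up of exactly 9 on 15 threads and assumes that Amdahl's law (which ignores scheduling and memory overheads) is an adequate model; both are simplifications made here, not claims from the study, and the function names are illustrative.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / workers


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: S = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def amdahl_parallel_fraction(speedup, workers):
    """Invert Amdahl's law to get the implied parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


speedup, threads = 9.0, 15            # figures quoted in the abstract
efficiency = parallel_efficiency(speedup, threads)
fraction = amdahl_parallel_fraction(speedup, threads)

print(f"parallel efficiency       : {efficiency:.2f}")
print(f"implied parallel fraction : {fraction:.3f}")
```

Under these assumptions the run achieves about 60% parallel efficiency, which corresponds to roughly 95% of the serial run time being parallelizable at the HRU level.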
Design of a high-speed digital processing element for parallel simulation

NASA Technical Reports Server (NTRS)

Milner, E. J.; Cwynar, D. S.

1983-01-01

A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
GPU Particle Tracking and MHD Simulations with Greatly Enhanced Computational Speed

NASA Astrophysics Data System (ADS)

Ziemba, T.; O'Donnell, D.; Carscadden, J.; Cash, M.; Winglee, R.; Harnett, E.

2008-12-01

GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude greater computing speed than CPU-based systems, for less cost than a high-end workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU, and new software architectures have recently become available to ease the transition from graphics-based to scientific applications. This allows for a cheap alternative to standard supercomputing methods and should shorten the time to discovery. 3-D particle tracking and MHD codes have been developed using NVIDIA's CUDA and have demonstrated a speed-up of nearly a factor of 20 over equivalent CPU versions of the codes. Such a speed-up enables new applications to develop, including real-time running of radiation belt simulations and real-time running of global magnetospheric simulations, both of which could provide important space weather prediction tools.

Imaging photomultiplier array with integrated amplifiers and high-speed USB interface

NASA Astrophysics Data System (ADS)

Blacksell, M.; Wach, J.; Anderson, D.; Howard, J.; Collis, S. M.; Blackwell, B. D.; Andruczyk, D.; James, B. W.

2008-10-01

Multianode photomultiplier tube (PMT) arrays are finding application as convenient high-speed light-sensitive devices for plasma imaging. This paper describes the development of a USB-based "plug-n-play" 16-channel PMT camera with 16-bit simultaneous acquisition of 16 signal channels at rates up to 2 MS/s per channel. The preamplifiers and digital hardware are packaged in a compact housing which incorporates magnetic shielding, on-board generation of the high-voltage PMT bias, an optical filter mount and slits, and an F-mount lens adaptor. Triggering, timing, and acquisition are handled by four field-programmable gate arrays (FPGAs) under instruction from a master FPGA controlled by a computer with a LABVIEW interface. We present technical design details and specifications and illustrate performance with high-speed images obtained on the H-1 heliac at the ANU.
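The acquisition figures imply a peak raw data rate that is easy to work out. The sketch below multiplies channel count, sample width, and sample rate; it ignores protocol overhead and on-board buffering, and the abstract does not say whether the peak rate is sustained over the USB link. The function name is illustrative.

```python
def raw_data_rate(channels, bits_per_sample, samples_per_second):
    """Return the peak raw data rate as (bits per second, bytes per second)."""
    bits_per_second = channels * bits_per_sample * samples_per_second
    return bits_per_second, bits_per_second // 8


# Figures quoted in the abstract: 16 channels, 16 bits, up to 2 MS/s each.
bits_per_s, bytes_per_s = raw_data_rate(16, 16, 2_000_000)

print(f"{bits_per_s / 1e6:.0f} Mbit/s")   # 512 Mbit/s
print(f"{bytes_per_s / 1e6:.0f} MB/s")    # 64 MB/s
```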
A Comparison of Lifting-Line and CFD Methods with Flight Test Data from a Research Puma Helicopter

NASA Technical Reports Server (NTRS)

Bousman, William G.; Young, Colin; Toulmay, Francois; Gilbert, Neil E.; Strawn, Roger C.; Miller, Judith V.; Maier, Thomas H.; Costes, Michel; Beaumier, Philippe

1996-01-01

Four lifting-line methods were compared with flight test data from a research Puma helicopter and the accuracy assessed over a wide range of flight speeds. Hybrid Computational Fluid Dynamics (CFD) methods were also examined for two high-speed conditions. A parallel analytical effort was performed with the lifting-line methods to assess the effects of modeling assumptions and this provided insight into the adequacy of these methods for load predictions.
Adaptive efficient compression of genomes

PubMed Central

2012-01-01

Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times are often slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is on par with the best previous algorithms for human genomes in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. PMID:23146997
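The general idea of referential compression, storing only the differences from a known reference, can be illustrated with a toy encoder. The sketch below is a generic greedy matcher built on a k-mer index; it is not the adaptive, parallel method proposed in the paper, and all names, the value of `k`, and the example sequences are made up for illustration.

```python
def build_index(reference, k):
    """Map each k-mer of the reference to its first position."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], pos)
    return index


def compress(sequence, reference, k=4):
    """Encode `sequence` as ('ref', start, length) and ('lit', text) ops."""
    index = build_index(reference, k)
    ops, i = [], 0
    while i < len(sequence):
        ref_pos = index.get(sequence[i:i + k])
        if ref_pos is None:
            # No k-mer match: store the base literally, merging runs.
            if ops and ops[-1][0] == "lit":
                ops[-1] = ("lit", ops[-1][1] + sequence[i])
            else:
                ops.append(("lit", sequence[i]))
            i += 1
            continue
        # Extend the match greedily while both sequences agree.
        length = k
        while (i + length < len(sequence)
               and ref_pos + length < len(reference)
               and sequence[i + length] == reference[ref_pos + length]):
            length += 1
        ops.append(("ref", ref_pos, length))
        i += length
    return ops


def decompress(ops, reference):
    """Rebuild the original sequence from the ops and the reference."""
    parts = []
    for op in ops:
        if op[0] == "ref":
            parts.append(reference[op[1]:op[1] + op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTGCAACGT"   # toy reference (hypothetical)
sequence = "ACGTACGATGCAACGT"    # same sequence with one substitution
ops = compress(sequence, reference)
print(ops)  # [('ref', 0, 7), ('lit', 'A'), ('ref', 8, 8)]
```

The sixteen-base input reduces to two reference pointers and one literal base. The index is what costs memory in a scheme like this, which gives some intuition for the memory/speed trade-off the abstract describes.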
Effect of dual laser beam on dissimilar welding-brazing of aluminum to galvanized steel

NASA Astrophysics Data System (ADS)

Mohammadpour, Masoud; Yazdian, Nima; Yang, Guang; Wang, Hui-Ping; Carlson, Blair; Kovacevic, Radovan

2018-01-01

In this investigation, the joining of two types of galvanized steel and Al6022 aluminum alloy in a coach peel configuration was carried out using a laser welding-brazing process in dual-beam mode. The feasibility of this method to obtain a sound and uniform brazed bead with high surface quality at a high welding speed was investigated by employing AlSi12 as a consumable material. The effects of alloying elements on the thickness of intermetallic compound (IMC) produced at the interface of steel and aluminum, surface roughness, edge straightness and the tensile strength of the resultant joint were studied. A comprehensive study of the joint microstructure was conducted by means of scanning electron microscopy and EDS. Results showed that a dual-beam laser shape and high scanning speed could limit the IMC thickness to as little as 3 μm and alter the failure location from the steel-brazed interface toward the Al-brazed interface. The numerical simulation of thermal regime was conducted by the Finite Element Method (FEM), and simulation results were validated through comparative experimental data. FEM thermal modeling evidenced that the peak temperatures at the Al-steel interface were around the critical temperature range of 700-900 °C that is required for the highest growth rate of IMC. However, the time that the molten pool spent within this temperature range was less than 1 s, and this duration was too short for diffusion-controlled IMC growth.
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.

PubMed

Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi

2018-01-01

The Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor the SNN's activity. Our contribution intends to provide a tool that allows SNNs to be prototyped faster than on CPU/GPU architectures but significantly more cheaply than by fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
High-Speed 3D Printing of High-Performance Thermosetting Polymers via Two-Stage Curing.

PubMed

Kuang, Xiao; Zhao, Zeang; Chen, Kaijuan; Fang, Daining; Kang, Guozheng; Qi, Hang Jerry

2018-04-01

Design and direct fabrication of high-performance thermosets and composites via 3D printing are highly desirable in engineering applications. Most 3D printed thermosetting polymers to date suffer from poor mechanical properties and low printing speed. Here, a novel ink for high-speed 3D printing of high-performance epoxy thermosets via a two-stage curing approach is presented. The ink containing photocurable resin and thermally curable epoxy resin is used for the digital light processing (DLP) 3D printing. After printing, the part is thermally cured at elevated temperature to yield an interpenetrating polymer network epoxy composite, whose mechanical properties are comparable to engineering epoxy. The printing speed is accelerated by the continuous liquid interface production assisted DLP 3D printing method, achieving a printing speed as high as 216 mm h⁻¹. It is also demonstrated that 3D printing structural electronics can be achieved by combining the 3D printed epoxy composites with infilled silver ink in the hollow channels. The new 3D printing method via two-stage curing combines the attributes of outstanding printing speed, high resolution, low volume shrinkage, and excellent mechanical properties, and provides a new avenue to fabricate 3D thermosetting composites with excellent mechanical properties and high efficiency toward high-performance and functional applications. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Different Relative Orientation of Static and Alternative Magnetic Fields and Cress Roots Direction of Growth Changes Their Gravitropic Reaction

NASA Astrophysics Data System (ADS)

Sheykina, Nadiia; Bogatina, Nina

The following variants of root orientation relative to the static and alternating components of the magnetic field were studied. In the first variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was directed perpendicular to the static one, and the roots were directed perpendicular to both field components and to the gravitation vector. In this variant, negative gravitropism of the cress roots was observed. In the second variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was directed perpendicular to the static one, and the roots were directed parallel to the alternating magnetic field. In the third variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, and the roots were directed perpendicular to both field components and to the gravitation vector. In the fourth variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, and the roots were directed parallel to the static magnetic field. In all cases studied, the frequency of the alternating magnetic field was equal to the cyclotron frequency of Ca ions. In the second, third and fourth variants gravitropism was positive, but the gravitropic reaction speeds differed. In the second and fourth variants, the gravitropic reaction speed coincided, within error limits, with the gravitropic reaction speed under Earth's conditions. In the third variant, the gravitropic reaction was substantially slower.
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the opportunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers, hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
Comparison of cavity preparation quality using an electric motor handpiece and an air turbine dental handpiece.

PubMed

Kenyon, Brian J; Van Zyl, Ian; Louie, Kenneth G

2005-08-01

The high-speed high-torque (electric motor) handpiece is becoming more popular in dental offices and laboratories in the United States. It is reported to cut more precisely and to assist in the creation of finer margins that enhance cavity preparations. The authors conducted an in vitro study to compare the quality of cavity preparations fabricated with a high-speed high-torque (electric motor) handpiece and a high-speed low-torque (air turbine) handpiece. Eighty-six dental students each cut two Class I preparations, one with an air turbine handpiece and the other with an electric motor high-speed handpiece. The authors asked the students to cut each preparation accurately to a circular outline and to establish a flat pulpal floor with 1.5 millimeters' depth, 90-degree exit angles, parallel vertical walls and sharp internal line angles, as well as to refine the preparation to achieve flat, smooth walls with a well-defined cavosurface margin. A single faculty member scored the preparations for criteria and refinement using a nine-point scale (range, 1-9). The authors analyzed the data statistically using paired t tests. In preparation criteria, the electric motor high-speed handpiece had a higher average grade than did the air turbine handpiece (5.07 and 4.90, respectively). For refinement, the average grade for the air turbine high-speed handpiece was greater than that for the electric motor high-speed handpiece (5.72 and 5.52, respectively). The differences were not statistically significant. The electric motor high-speed handpiece performed as well as, but not better than, the air turbine handpiece in the fabrication of high-quality cavity preparations.
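The paired t test used in the study compares two scores from the same student, so the statistic is computed on the per-student differences. The sketch below shows the calculation on a tiny made-up data set; the scores are hypothetical and are not data from the study, and the function name is illustrative.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the pairwise
    differences and stdev is the sample standard deviation.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical scores on a nine-point scale for five students.
electric = [5, 6, 4, 7, 5]
turbine = [4, 6, 5, 6, 4]
t_value, dof = paired_t_statistic(electric, turbine)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The resulting t value is then compared with the t distribution for the given degrees of freedom to decide whether the mean difference is statistically significant.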
Considerations on the Use of Custom Accelerators for Big Data Analytics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castellana, Vito G.; Tumeo, Antonino; Minutoli, Marco

Accelerators, including Graphic Processing Units (GPUs) for general purpose computation and many-core designs with wide vector units (e.g., Intel Phi), have become a common component of many high performance clusters. The appearance of more stable and reliable tools that can automatically convert code written in high-level specifications with annotations (such as C or C++) to hardware description languages (High-Level Synthesis - HLS) is also setting the stage for a broader use of reconfigurable devices (e.g., Field Programmable Gate Arrays - FPGAs) in high performance systems for the implementation of custom accelerators, helped by the fact that new processors include advanced cache-coherent interconnects for these components. In this chapter, we briefly survey the status of the use of accelerators in high performance systems targeted at big data analytics applications. We argue that, although the progress in the use of accelerators for this class of applications has been significant, differently from scientific simulations there still are gaps to close. This is particularly true for the "irregular" behaviors exhibited by no-SQL, graph databases. We focus our attention on the limits of HLS tools for data analytics and graph methods, and discuss a new architectural template that better fits the requirement of this class of applications. We validate the new architectural templates by modifying the Graph Engine for Multithreaded System (GEMS) framework to support accelerators generated with such a methodology, and testing with queries coming from the Lehigh University Benchmark (LUBM). The architectural template enables better supporting the task and memory level parallelism present in graph methods by supporting a new control model and an enhanced memory interface. We show that our solution allows generating parallel accelerators, providing speed ups with respect to conventional HLS flows. We finally draw conclusions and present a perspective on the use of reconfigurable devices and Design Automation tools for data analytics.
GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening

NASA Astrophysics Data System (ADS)

Maureira-Fredes, Cristián; Amaro-Seoane, Pau

2018-01-01

A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and sources of gravitational radiation. The direct-summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. With different numerical techniques and special-purpose hardware, it can be used to speed up the calculations. But these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU), direct-summation N-body integrator written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY has high modularity, allowing users to readily introduce new physics, it exploits available computational resources and will be maintained by regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 as compared to the single-GPU version. The conception and design of this first release is aimed at users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.
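The core of any direct-summation code is the pairwise force sum, which costs O(N²) operations per evaluation and is what GPUs accelerate. The sketch below shows that sum in plain Python with Plummer-type softening. The softening form is an assumption made here for illustration (the abstract does not specify it), and this is neither the Hermite integrator nor GRAVIDY's actual code.

```python
def accelerations(positions, masses, softening, G=1.0):
    """Direct-summation gravitational accelerations with softening.

    a_i = sum over j != i of G * m_j * (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
    The softening length eps keeps close encounters from diverging.
    """
    n = len(positions)
    eps2 = softening ** 2
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc


# Two unit masses one length unit apart (units with G = 1).
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
mass = [1.0, 1.0]
unsoftened = accelerations(pos, mass, softening=0.0)
softened = accelerations(pos, mass, softening=0.1)
print(unsoftened[0], softened[0])
```

Each particle's sum is independent of the others, which is why the outer loop maps naturally onto GPU threads. The reported factor of about 3 on four GPUs corresponds to roughly 75% parallel efficiency relative to the single-GPU run.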
Sharp organic interface of molecular C60 chains and a pentacene derivative SAM on Au(788): A combined STM & DFT study

NASA Astrophysics Data System (ADS)

Wang, Jun; Tang, Jian-Ming; Larson, Amanda M.; Miller, Glen P.; Pohl, Karsten

2013-12-01

Controlling the molecular structure of the donor-acceptor interface is essential to overcoming the efficiency bottleneck in organic photovoltaics. We present a study of self-assembled fullerene (C60) molecular chains on perfectly ordered 6,13-dichloropentacene (DCP) monolayers forming on a vicinal Au(788) surface using scanning tunneling microscopy in conjunction with density functional theory calculations. DCP is a novel pentacene derivative optimized for photovoltaic applications. The molecules form a brick-wall patterned centered rectangular lattice with the long axis parallel to the monatomic steps that separate the 3.9 nm wide Au(111) terraces. The strong interaction between the C60 molecules and the gold substrate is well screened by the DCP monolayer. At submonolayer C60 coverage, the fullerene molecules form long parallel chains, 1.1 nm apart, with a rectangular arrangement instead of the expected close-packed configuration along the upper step edges. The perfectly ordered DCP structure is unaffected by the C60 chain formation. The controlled sharp highly-ordered organic interface has the potential to improve the conversion efficiency in organic photovoltaics.
P43-S Computational Biology Applications Suite for High-Performance Computing (BioHPC.net)

PubMed Central

Pillardy, J.

2007-01-01

One of the challenges of high-performance computing (HPC) is user accessibility. At the Cornell University Computational Biology Service Unit, which is also a Microsoft HPC institute, we have developed a computational biology application suite that allows researchers from biological laboratories to submit their jobs to the parallel cluster through an easy-to-use Web interface. Through this system, we are providing users with popular bioinformatics tools including BLAST, HMMER, InterproScan, and MrBayes. The system is flexible and can be easily customized to include other software. It is also scalable; the installation on our servers currently processes approximately 8500 job submissions per year, many of them requiring massively parallel computations. It also has a built-in user management system, which can limit software and/or database access to specified users. TAIR, the major database of the plant model organism Arabidopsis, and SGN, the international tomato genome database, are both using our system for storage and data analysis. The system consists of a Web server running the interface (ASP.NET C#), Microsoft SQL server (ADO.NET), compute cluster running Microsoft Windows, ftp server, and file server. Users can interact with their jobs and data via a Web browser, ftp, or e-mail. The interface is accessible at http://cbsuapps.tc.cornell.edu/.
Effect of Gold on the Microstructural Evolution and Integrity of a Sintered Silver Joint

NASA Astrophysics Data System (ADS)

Muralidharan, Govindarajan; Leonard, Donovan N.; Meyer, Harry M.

2017-07-01

There is a need for next-generation, high-performance power electronic packages and systems employing wide-bandgap devices to operate at high temperatures in automotive and electric grid applications. Sintered silver joints are currently being evaluated as an alternative to Pb-free solder joints. Of particular interest is the development of joints based on silver paste consisting of nano- or micron-scale particles that can be processed without application of external pressure. The microstructural evolution at the interface of a pressureless-sintered silver joint formed between a SiC die with Ti/Ni/Au metallization and an active metal brazed (AMB) substrate with Ag metallization at 250°C has been evaluated using scanning electron microscopy (SEM), x-ray microanalysis, and x-ray photoelectron spectroscopy (XPS). Results from focused ion beam (FIB) cross-sections show that, during sintering, pores in the sintered region near to the Au layer tend to be narrow and elongated with long axis oriented parallel to the interface. Further densification results in formation of many small, relatively equiaxed pores aligned parallel to the interface, creating a path for easy crack propagation. X-ray microanalysis results confirm interdiffusion between Au and Ag and that a region with poor mechanical strength is formed at the edge of this region of interdiffusion.
Rough Electrode Creates Excess Capacitance in Thin-Film Capacitors

PubMed Central

2017-01-01

The parallel-plate capacitor equation is widely used in contemporary material research for nanoscale applications and nanoelectronics. To apply this equation, flat and smooth electrodes are assumed for a capacitor. This essential assumption is often violated for thin-film capacitors because the formation of nanoscale roughness at the electrode interface is very probable for thin films grown via common deposition methods. In this work, we experimentally and theoretically show that the electrical capacitance of thin-film capacitors with realistic interface roughness is significantly larger than the value predicted by the parallel-plate capacitor equation. The degree of the deviation depends on the strength of the roughness, which is described by three roughness parameters for a self-affine fractal surface. By applying an extended parallel-plate capacitor equation that includes the roughness parameters of the electrode, we are able to calculate the excess capacitance of the electrode with weak roughness. Moreover, we introduce the roughness parameter limits for which the simple parallel-plate capacitor equation is sufficiently accurate for capacitors with one rough electrode. Our results imply that the interface roughness beyond the proposed limits cannot be dismissed unless the independence of the capacitance from the interface roughness is experimentally demonstrated. The practical protocols suggested in our work for the reliable use of the parallel-plate capacitor equation can be applied as general guidelines in various fields of interest. PMID:28745040
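For reference, the simple parallel-plate capacitor equation that the abstract takes as its baseline is C = ε0·εr·A/d. The short Python sketch below evaluates only that ideal, smooth-electrode formula. The function name and the example film dimensions are illustrative assumptions, and the roughness-extended equation from the paper is not reproduced because the abstract does not state it.

```python
# Vacuum permittivity in F/m (CODATA 2018 recommended value).
EPSILON_0 = 8.8541878128e-12

def parallel_plate_capacitance(relative_permittivity, area_m2, thickness_m):
    """Ideal parallel-plate capacitance in farads, assuming flat, smooth electrodes."""
    if thickness_m <= 0:
        raise ValueError("thickness must be positive")
    return EPSILON_0 * relative_permittivity * area_m2 / thickness_m

if __name__ == "__main__":
    # Hypothetical thin film: 100 nm thick, 1 mm^2 area, relative permittivity 3.
    c = parallel_plate_capacitance(3.0, 1e-6, 100e-9)
    print(f"ideal capacitance: {c * 1e9:.4f} nF")
```

According to the abstract, a rough electrode would give a measured capacitance above this ideal value.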
Rough Electrode Creates Excess Capacitance in Thin-Film Capacitors.

PubMed

Torabi, Solmaz; Cherry, Megan; Duijnstee, Elisabeth A; Le Corre, Vincent M; Qiu, Li; Hummelen, Jan C; Palasantzas, George; Koster, L Jan Anton

2017-08-16

The parallel-plate capacitor equation is widely used in contemporary material research for nanoscale applications and nanoelectronics. To apply this equation, flat and smooth electrodes are assumed for a capacitor. This essential assumption is often violated for thin-film capacitors because the formation of nanoscale roughness at the electrode interface is very probable for thin films grown via common deposition methods. In this work, we experimentally and theoretically show that the electrical capacitance of thin-film capacitors with realistic interface roughness is significantly larger than the value predicted by the parallel-plate capacitor equation. The degree of the deviation depends on the strength of the roughness, which is described by three roughness parameters for a self-affine fractal surface. By applying an extended parallel-plate capacitor equation that includes the roughness parameters of the electrode, we are able to calculate the excess capacitance of the electrode with weak roughness. Moreover, we introduce the roughness parameter limits for which the simple parallel-plate capacitor equation is sufficiently accurate for capacitors with one rough electrode. Our results imply that the interface roughness beyond the proposed limits cannot be dismissed unless the independence of the capacitance from the interface roughness is experimentally demonstrated. The practical protocols suggested in our work for the reliable use of the parallel-plate capacitor equation can be applied as general guidelines in various fields of interest.
Measuring Speed Using a Computer--Several Techniques.

ERIC Educational Resources Information Center

Pearce, Jon M.

1988-01-01

Introduces three different techniques to facilitate the measurement of speed and the associated kinematics and dynamics using a computer. Discusses sensing techniques using optical or ultrasonic sensors, interfacing with a computer, software routines for the interfaces, and other applications. Provides circuit diagrams, pictures, and a program to…
Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

PubMed Central

Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

2010-01-01

Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792
Interferometric imaging of acoustical phenomena using high-speed polarization camera and 4-step parallel phase-shifting technique

NASA Astrophysics Data System (ADS)

Ishikawa, K.; Yatabe, K.; Ikeda, Y.; Oikawa, Y.; Onuma, T.; Niwa, H.; Yoshii, M.

2017-02-01

Imaging of sound aids the understanding of acoustical phenomena such as propagation, reflection, and diffraction, which is strongly required for various acoustical applications. The imaging of sound is commonly done by using a microphone array, whereas optical methods have recently attracted interest due to their contactless nature. The optical measurement of sound utilizes the phase modulation of light caused by sound. Since light propagated through a sound field changes its phase in proportion to the sound pressure, optical phase measurement techniques can be used for sound measurement. Several methods, including laser Doppler vibrometry and the Schlieren method, have been proposed for that purpose. However, the sensitivities of these methods become lower as the frequency of sound decreases. In contrast, since the sensitivity of the phase-shifting technique does not depend on the frequency of sound, that technique is suitable for the imaging of sounds in the low-frequency range. The principle of imaging of sound using parallel phase-shifting interferometry was reported by the authors (K. Ishikawa et al., Optics Express, 2016). The measurement system consists of a high-speed polarization camera made by Photron Ltd. and a polarization interferometer. This paper reviews the principle briefly and demonstrates the high-speed imaging of acoustical phenomena. The results suggest that the proposed system can be applied to various industrial problems in acoustical engineering.
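As a minimal sketch of the four-step phase-shifting idea named in the abstract, the following Python snippet recovers the optical phase at a single pixel from four intensities recorded with reference phase shifts of 0, π/2, π and 3π/2. The intensity model I_k = a + b·cos(φ + δ_k), the function names and the simulated values are assumptions for illustration, not details taken from the paper.

```python
import math

def simulate_intensities(phi, offset=1.0, modulation=0.5):
    """Four intensities for phase shifts 0, pi/2, pi, 3*pi/2 (assumed model)."""
    shifts = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
    return [offset + modulation * math.cos(phi + d) for d in shifts]

def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase in (-pi, pi] from four phase-shifted intensities.

    With the assumed model, i3 - i1 = 2*b*sin(phi) and i0 - i2 = 2*b*cos(phi),
    so the offset and modulation depth cancel in the arctangent.
    """
    return math.atan2(i3 - i1, i0 - i2)

if __name__ == "__main__":
    true_phase = 0.7  # radians, hypothetical sound-induced phase
    recovered = four_step_phase(*simulate_intensities(true_phase))
    print(f"true phase {true_phase:.3f} rad, recovered {recovered:.3f} rad")
```

In a parallel phase-shifting setup the four intensities would presumably come from neighbouring polarization-filtered pixels of one frame rather than from four sequential exposures, which is why the method lends itself to high-speed imaging.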
The science of computing - The evolution of parallel processing

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology that currently impedes effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.

User's Guide for ENSAERO_FE Parallel Finite Element Solver

NASA Technical Reports Server (NTRS)

Eldred, Lloyd B.; Guruswamy, Guru P.

1999-01-01

A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRAN/COSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
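To illustrate the kind of iterative solver mentioned above, here is a small serial Python sketch of a preconditioned conjugate gradient iteration for a symmetric positive definite system. The Jacobi (diagonal) preconditioner, the dense list-of-lists matrix and all function names are assumptions chosen for brevity. The abstract does not say which preconditioner or data layout ENSAERO_FE uses, and this sketch is not parallel.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(a, v):
    return [dot(row, v) for row in a]

def pcg_jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite a (assumed) with
    conjugate gradients and a Jacobi (diagonal) preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - a x for x = 0
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

if __name__ == "__main__":
    # Tiny stand-in for a boundary-node stiffness system (hypothetical values).
    stiffness = [[4.0, 1.0], [1.0, 3.0]]
    load = [1.0, 2.0]
    print(pcg_jacobi(stiffness, load))
```

In a substructured solver the matrix-vector product is the step that would be distributed, with each substructure contributing its share of the product for the shared boundary nodes.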
Big data driven cycle time parallel prediction for production planning in wafer manufacturing

NASA Astrophysics Data System (ADS)

Wang, Junliang; Yang, Jungang; Zhang, Jie; Wang, Xiaoxi; Zhang, Wenjun Chris

2018-07-01

Cycle time forecasting (CTF) is one of the most crucial issues for production planning to keep high delivery reliability in semiconductor wafer fabrication systems (SWFS). This paper proposes a novel data-intensive cycle time (CT) prediction system with parallel computing to rapidly forecast the CT of wafer lots with large datasets. First, a density peak based radial basis function network (DP-RBFN) is designed to forecast the CT with the diverse and agglomerative CT data. Second, a network learning method based on a clustering technique is proposed to determine the density peak. Third, a parallel computing approach for network training is proposed in order to speed up the training process with large-scale CT data. Finally, an experiment with respect to SWFS is presented, which demonstrates that the proposed CTF system can not only speed up the training process of the model but also outperform CTF methods based on the radial basis function network, the back-propagation network and multivariate regression in terms of the mean absolute deviation and standard deviation.
Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

NASA Astrophysics Data System (ADS)

Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

2018-01-01

The degree of wear of the wheel set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. A line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of a CCD and a CUDA parallel processing unit. Image acquisition was performed in hardware interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares and then extracts the squares of the target by fusing the k-means and STING clustering image segmentation algorithms. The segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with CPU serial calculation was obtained, which greatly improved the real-time image processing capacity. When the wheel set is running at a limited speed, the system placed along the railway line can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.
Trajectory Tracking of a Planer Parallel Manipulator by Using Computed Force Control Method

NASA Astrophysics Data System (ADS)

Bayram, Atilla

2017-03-01

Despite their small workspace, parallel manipulators have some advantages over their serial counterparts in terms of higher speed, acceleration, rigidity, accuracy, manufacturing cost and payload. Accordingly, this type of manipulator can be used in many applications such as high-speed machine tools, tuning machine for feeding, sensitive cutting, assembly and packaging. This paper presents a special type of planar parallel manipulator with three degrees of freedom. It is constructed as a variable geometry truss, generally known as a planar Stewart platform. The reachable and orientation workspaces are obtained for this manipulator. The inverse kinematic analysis is solved for trajectory tracking according to the redundancy and joint limit avoidance. Then, the dynamics model of the manipulator is established by using the virtual work method. Simulations are performed to follow the given planar trajectories by using the dynamic equations of the variable geometry truss manipulator and the computed force control method. In the computed force control method, the feedback gain matrices for PD control are tuned with fixed matrices by trial and error and with variable ones by means of optimization with a genetic algorithm.
Evaporating Spray in Supersonic Streams Including Turbulence Effects

NASA Technical Reports Server (NTRS)

Balasubramanyam, M. S.; Chen, C. P.

2006-01-01

Evaporating spray plays an important role in spray combustion processes. This paper describes the development of a new finite-conductivity evaporation model, based on the two-temperature film theory, for two-phase numerical simulation using Eulerian-Lagrangian method. The model is a natural extension of the T-blob/T-TAB atomization/spray model which supplies the turbulence characteristics for estimating effective thermal diffusivity within the droplet phase. Both one-way and two-way coupled calculations were performed to investigate the performance of this model. Validation results indicate the superiority of the finite-conductivity model in low speed parallel flow evaporating sprays. High speed cross flow spray results indicate the effectiveness of the T-blob/T-TAB model and point to the needed improvements in high speed evaporating spray modeling.
An immersed boundary formulation for simulating high-speed compressible viscous flows with moving solids

NASA Astrophysics Data System (ADS)

Qu, Yegao; Shi, Ruchao; Batra, Romesh C.

2018-02-01

We present a robust sharp-interface immersed boundary method for numerically studying high speed flows of compressible and viscous fluids interacting with arbitrarily shaped either stationary or moving rigid solids. The Navier-Stokes equations are discretized on a rectangular Cartesian grid based on a low-diffusion flux splitting method for inviscid fluxes and conservative high-order central-difference schemes for the viscous components. Discontinuities such as those introduced by shock waves and contact surfaces are captured by using a high-resolution weighted essentially non-oscillatory (WENO) scheme. Ghost cells in the vicinity of the fluid-solid interface are introduced to satisfy boundary conditions on the interface. Values of variables in the ghost cells are found by using a constrained moving least squares method (CMLS) that eliminates numerical instabilities encountered in the conventional MLS formulation. The solution of the fluid flow and the solid motion equations is advanced in time by using the third-order Runge-Kutta and the implicit Newmark integration schemes, respectively. The performance of the proposed method has been assessed by computing results for the following four problems: shock-boundary layer interaction, supersonic viscous flows past a rigid cylinder, moving piston in a shock tube and lifting off from a flat surface of circular, rectangular and elliptic cylinders triggered by shock waves, and comparing computed results with those available in the literature.
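The abstract states that the flow solution is advanced with a third-order Runge-Kutta scheme but does not say which variant. As a hedged illustration, the Python sketch below implements one common choice, the three-stage strong-stability-preserving form (often attributed to Shu and Osher), for a scalar ODE du/dt = f(u). The function names and the decay test problem are assumptions and are not taken from the paper.

```python
def ssp_rk3_step(f, u, dt):
    """One step of the three-stage, third-order SSP Runge-Kutta scheme
    (an assumed variant; the paper may use a different third-order form)."""
    u1 = u + dt * f(u)                              # forward Euler stage
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))        # first convex combination
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * f(u2))

def integrate(f, u0, dt, n_steps):
    """Advance u from u0 over n_steps steps of size dt."""
    u = u0
    for _ in range(n_steps):
        u = ssp_rk3_step(f, u, dt)
    return u

if __name__ == "__main__":
    # Hypothetical test problem: exponential decay du/dt = -u, u(0) = 1.
    print(integrate(lambda u: -u, 1.0, 0.01, 100))  # approximates exp(-1)
```

In a flow solver, u would be the vector of conserved variables and f the discretized flux residual.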
SWMM5 Application Programming Interface and PySWMM: A Python Interfacing Wrapper

EPA Science Inventory

In support of the OpenWaterAnalytics open source initiative, the PySWMM project encompasses the development of a Python interfacing wrapper to SWMM5 with parallel ongoing development of the USEPA Stormwater Management Model (SWMM5) application programming interface (API). ...
Adsorption of 1- and 2-butylimidazoles at the copper/air and steel/air interfaces studied by sum frequency generation vibrational spectroscopy.

PubMed

Casford, Michael T L; Davies, Paul B

2012-07-24

The structure of thin films of 1- and 2-butylimidazoles adsorbed on copper and steel surfaces under air was examined using sum frequency generation (SFG) vibrational spectroscopy in the ppp and ssp polarizations. Additionally, the SFG spectra of both isomers were recorded at 55 °C at the liquid imidazole/air interface for reference. Complementary bulk infrared, reflection-absorption infrared spectroscopy (RAIRS), and Raman spectra of both imidazoles were recorded for assignment purposes. The SFG spectra in the C-H stretching region at the liquid/air interface are dominated by resonances from the methyl end group of the butyl side chain of the imidazoles, indicating that they are aligned parallel or closely parallel to the surface normal. These are also the most prominent features in the SFG spectra on copper and steel. In addition, both the ppp and ssp spectra on copper show resonances from the C-H stretching modes of the imidazole ring for both isomers. The ring C-H resonances are completely absent from the spectra on steel and at the liquid/air interface. The relative intensities of the SFG spectra can be interpreted as showing that, on copper, under air, both butylimidazoles are adsorbed with their butyl side chains perpendicular to the interface and with the ring significantly inclined away from the surface plane and toward the surface normal. The SFG spectra of both imidazoles on steel indicate an orientation where the imidazole rings are parallel or nearly parallel to the surface. The weak C-H resonances from the ring at the liquid/air interface suggest that the tilt angle of the ring from the surface normal at this interface is significantly greater than it is on copper.
Parallelization of the Flow Field Dependent Variation Scheme for Solving the Triple Shock/Boundary Layer Interaction Problem

NASA Technical Reports Server (NTRS)

Schunk, Richard Gregory; Chung, T. J.

2001-01-01

A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high speed air-breathing vehicles including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme, based upon multi-threading, as implemented on multiple processor supercomputers and workstations are presented.
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. This study concluded that, among the basic components of the QR algorithm, the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed method increases as the problem size increases.
Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.

PubMed Central

Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W

1996-01-01

Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers, which consist of nodes made up of shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
SODR Memory Control Buffer Control ASIC

NASA Technical Reports Server (NTRS)

Hodson, Robert F.

1994-01-01

The Spacecraft Optical Disk Recorder (SODR) is a state of the art mass storage system for future NASA missions requiring high transmission rates and a large capacity storage system. This report covers the design and development of an SODR memory buffer control application-specific integrated circuit (ASIC). The memory buffer control ASIC has two primary functions: (1) buffering data to prevent loss of data during disk access times, (2) converting data formats from a high performance parallel interface format to a small computer systems interface format. Ten 144-pin, 50 MHz CMOS ASICs were designed, fabricated and tested to implement the memory buffer control function.
A temperature controller board for the ARC controller

NASA Astrophysics Data System (ADS)

Tulloch, Simon

2016-07-01

A high-performance temperature controller board has been produced for the ARC Generation-3 CCD controller. It contains two 9W temperature servo loops and four temperature input channels and is fully programmable via the ARC API and OWL data acquisition program. PI-loop control is implemented in an on-board micro. Both diode and RTD sensors can be used. Control and telemetry data is sent via the ARC backplane although a USB-2 interface is also available. Further functionality includes hardware timers and high current drivers for external shutters and calibration LEDs, an LCD display, a parallel i/o port, a pressure sensor interface and an uncommitted analogue telemetry input.
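The PI-loop control mentioned above can be sketched in a few lines of Python. Only the 9 W output limit comes from the abstract. The controller gains, the anti-windup rule (integrating only while the output is unsaturated), the first-order thermal plant and every name below are illustrative assumptions, and none of this is the firmware running on the board's microcontroller.

```python
class PIController:
    """Discrete PI controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp, self.ki = kp, ki
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        candidate = self.integral + self.ki * error * dt
        output = self.kp * error + candidate
        if self.out_min <= output <= self.out_max:
            self.integral = candidate      # integrate only when unsaturated
            return output
        return min(max(output, self.out_min), self.out_max)

def simulate(setpoint_c, seconds, dt=0.1):
    """Closed loop with a hypothetical first-order thermal plant."""
    ambient_c, thermal_res_k_per_w, tau_s = 20.0, 10.0, 50.0   # assumed plant
    controller = PIController(kp=1.0, ki=0.05, out_min=0.0, out_max=9.0)
    temp_c = ambient_c
    for _ in range(int(round(seconds / dt))):
        power_w = controller.update(setpoint_c, temp_c, dt)
        # Explicit Euler update of tau * dT/dt = R * P - (T - T_ambient).
        temp_c += dt * (thermal_res_k_per_w * power_w - (temp_c - ambient_c)) / tau_s
    return temp_c

if __name__ == "__main__":
    print(f"temperature after 600 s: {simulate(60.0, 600.0):.3f} C")
```

Clamping the output to the 0 to 9 W range mirrors the heater power limit, and skipping integration while saturated keeps the integral term from winding up during the initial heating ramp.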
SPLASSH: Open source software for camera-based high-speed, multispectral in-vivo optical image acquisition

PubMed Central

Sun, Ryan; Bouchard, Matthew B.; Hillman, Elizabeth M. C.

2010-01-01

Camera-based in-vivo optical imaging can provide detailed images of living tissue that reveal structure, function, and disease. High-speed, high resolution imaging can reveal dynamic events such as changes in blood flow and responses to stimulation. Despite these benefits, commercially available scientific cameras rarely include software that is suitable for in-vivo imaging applications, making this highly versatile form of optical imaging challenging and time-consuming to implement. To address this issue, we have developed a novel, open-source software package to control high-speed, multispectral optical imaging systems. The software integrates a number of modular functions through a custom graphical user interface (GUI) and provides extensive control over a wide range of inexpensive IEEE 1394 Firewire cameras. Multispectral illumination can be incorporated through the use of off-the-shelf light emitting diodes which the software synchronizes to image acquisition via a programmed microcontroller, allowing arbitrary high-speed illumination sequences. The complete software suite is available for free download. Here we describe the software’s framework and provide details to guide users with development of this and similar software. PMID:21258475
Long waves in parallel flow in Hele-Shaw cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zeybek, M.; Yortsos, Y.C.

During the past several years the flow of immiscible fluids in Hele-Shaw cells and porous media has been investigated extensively. Of particular interest to most studies has been frontal displacement, specifically viscous fingering instabilities and finger growth. The practical ramifications regarding oil recovery, as well as many other industrial processes in porous media, have served as the primary driving force for most of these investigations. By contrast, little attention has been paid to the motion of lateral fluid interfaces, which are parallel to the main flow direction. Parallel flow is an often encountered, although much overlooked regime. The evolution of fluid interfaces in parallel flow in Hele-Shaw cells is studied both theoretically and experimentally in the large capillary number limit. It is shown that such interfaces support wave motion, the amplitude of which for long waves is governed by the KdV equation. Experiments are conducted in a long Hele-Shaw cell that validate the theory in the symmetric case. 35 refs., 16 figs.
Optical measurement of interface movements of liquid metal excited by a pneumatic shaker

NASA Astrophysics Data System (ADS)

Men, Shouqiang; Zhou, Jun; Xu, Jingwen

2015-02-01

A model experiment was designed, and Faraday instabilities were generated in a plexiglass cylinder excited by a pneumatic shaker. A contacting distance meter and a single-point fiber-optic vibrometer were applied to measure the displacement/velocity of the shaker, and the two sets of results are in good agreement with each other. In addition, the fiber-optic laser vibrometer was used to measure the velocity of the interface between potassium hydroxide aqueous solution and Galinstan. The results show that the fiber-optic vibrometer can be applied to measure the interface movements without Faraday instabilities, whereas with Faraday instabilities there is strong scatter and the interface displacement can only be obtained qualitatively. In this case, a scanning vibrometer or a high-speed CCD camera should be used to record the interface movements.
Snap-in of particles at curved liquid interfaces

NASA Astrophysics Data System (ADS)

Li, Chao; Moradiafrapoli, Momene; Marston, Jeremy

2016-11-01

The contact of particles with liquid interfaces constitutes the first stage in the formation of a particle-laden interface, the so-called "snap-in effect". Here, we report on an experimental study using high-speed video to directly visualize the snap-in process and the approach to the equilibrium state of a particle at a curved liquid interface (i.e. droplet surface). We image the evolution of the contact line, which is found to follow a power-law scaling in time, and the dynamic contact angle during the snap-in. Both hydrophilic and hydrophobic particles are explored and we match the lift-off stage of the particles with a simple force balance. We also explore some multi-particle experiments, alluding to the dynamics of particle-laden interface formation.
Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

NASA Astrophysics Data System (ADS)

Ford, Eric B.; Dindar, Saleh; Peters, Jorg

2015-08-01

The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than an order-of-magnitude speed-up and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills.
Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Method and apparatus for data sampling

DOEpatents

Odell, Daniel M. C.

1994-01-01

A method and apparatus for sampling radiation detector outputs and determining event data from the collected samples. The method uses high speed sampling of the detector output, the conversion of the samples to digital values, and the discrimination of the digital values so that digital values representing detected events are determined. The high speed sampling and digital conversion is performed by an A/D sampler that samples the detector output at a rate high enough to produce numerous digital samples for each detected event. The digital discrimination identifies those digital samples that are not representative of detected events. The sampling and discrimination also provides for temporary or permanent storage, either serially or in parallel, to a digital storage medium.
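The sample-then-discriminate flow described above can be illustrated with a short Python sketch. The abstract does not give the discrimination rule, so a plain amplitude threshold is swapped in here as an assumption. The function name, the event record fields and the sample values are likewise hypothetical.

```python
def find_events(samples, threshold):
    """Group consecutive above-threshold digital samples into events.

    Samples at or below the threshold are treated as not representative of a
    detected event and are discarded (assumed rule, not from the patent).
    """
    events = []
    current = []
    start = None
    for index, value in enumerate(samples):
        if value > threshold:
            if not current:
                start = index              # first sample of a new event
            current.append(value)
        elif current:
            events.append({"start": start, "length": len(current), "peak": max(current)})
            current = []
    if current:                            # event still open at end of record
        events.append({"start": start, "length": len(current), "peak": max(current)})
    return events

if __name__ == "__main__":
    # Hypothetical A/D output: two pulses, each spanning several samples.
    adc_samples = [0, 1, 0, 5, 9, 6, 1, 0, 0, 7, 8, 2]
    for event in find_events(adc_samples, threshold=3):
        print(event)
```

Because the sampler is fast enough to capture several samples per event, each event record can keep more than a single value, here the start index, duration and peak.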
