generation fast processor: Topics by Science.gov

Sample records for generation fast processor

Parallel and pipeline computation of fast unitary transforms

NASA Technical Reports Server (NTRS)

Fino, B. J.; Algazi, V. R.

1975-01-01

The letter discusses the parallel and pipeline organization of fast-unitary-transform algorithms such as the fast Fourier transform, and points out the efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform, in which (2 to the n-th power) -1 hardware 'butterflies' generate a transform of order 2 to the n-th power every computation cycle.
A note on parallel and pipeline computation of fast unitary transforms

NASA Technical Reports Server (NTRS)

Fino, B. J.; Algazi, V. R.

1974-01-01

The parallel and pipeline organization of fast unitary transform algorithms such as the Fast Fourier Transform are discussed. The efficiency is pointed out of a combined parallel-pipeline processor of a transform such as the Haar transform in which 2 to the n minus 1 power hardware butterflies generate a transform of order 2 to the n power every computation cycle.
Fast generation of computer-generated hologram by graphics processing unit

NASA Astrophysics Data System (ADS)

Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi

2009-02-01

A cylindrical hologram is well known to be viewable in 360 deg. This hologram depends high pixel resolution.Therefore, Computer-Generated Cylindrical Hologram (CGCH) requires huge calculation amount.In our previous research, we used look-up table method for fast calculation with Intel Pentium4 2.8 GHz.It took 480 hours to calculate high resolution CGCH (504,000 x 63,000 pixels and the average number of object points are 27,000).To improve quality of CGCH reconstructed image, fringe pattern requires higher spatial frequency and resolution.Therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of CGCH (912,000 x 108,000 pixels), we employ Graphics Processing Unit (GPU).It took 4,406 hours to calculate high resolution CGCH on Xeon 3.4 GHz.Since GPU has many streaming processors and a parallel processing structure, GPU works as the high performance parallel processor.In addition, GPU gives max performance to 2 dimensional data and streaming data.Recently, GPU can be utilized for the general purpose (GPGPU).For example, NVIDIA's GeForce7 series became a programmable processor with Cg programming language.Next GeForce8 series have CUDA as software development kit made by NVIDIA.Theoretically, calculation ability of GPU is announced as 500 GFLOPS. From the experimental result, we have achieved that 47 times faster calculation compared with our previous work which used CPU.Therefore, CGCH can be generated in 95 hours.So, total time is 110 hours to calculate and print the CGCH.
Software design and implementation of ship heave motion monitoring system based on MBD method

NASA Astrophysics Data System (ADS)

Yu, Yan; Li, Yuhan; Zhang, Chunwei; Kang, Won-Hee; Ou, Jinping

2015-03-01

Marine transportation plays a significant role in the modern transport sector due to its advantage of low cost, large capacity. It is being attached enormous importance to all over the world. Nowadays the related areas of product development have become an existing hot spot. DSP signal processors feature micro volume, low cost, high precision, fast processing speed, which has been widely used in all kinds of monitoring systems. But traditional DSP code development process is time-consuming, inefficiency, costly and difficult. MathWorks company proposed Model-based Design (MBD) to overcome these defects. By calling the target board modules in simulink library to compile and generate the corresponding code for the target processor. And then automatically call DSP integrated development environment CCS for algorithm validation on the target processor. This paper uses the MDB to design the algorithm for the ship heave motion monitoring system. It proves the effectiveness of the MBD run successfully on the processor.
27ps DFT Molecular Dynamics Simulation of a-maltose: A Reduced Basis Set Study.

USDA-ARS?s Scientific Manuscript database

DFT molecular dynamics simulations are time intensive when carried out on carbohydrates such as alpha-maltose, requiring up to three or more weeks on a fast 16-processor computer to obtain just 5ps of constant energy dynamics. In a recent publication [1] forces for dynamics were generated from B3LY...
Design and Test of a 65nm CMOS Front-End with Zero Dead Time for Next Generation Pixel Detectors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaioni, L.; Braga, D.; Christian, D.

This work is concerned with the experimental characterization of a synchronous analog processor with zero dead time developed in a 65 nm CMOS technology, conceived for pixel detectors at the HL-LHC experiment upgrades. It includes a low noise, fast charge sensitive amplifier with detector leakage compensation circuit, and a compact, single ended comparator able to correctly process hits belonging to two consecutive bunch crossing periods. A 2-bit Flash ADC is exploited for digital conversion immediately after the preamplifier. A description of the circuits integrated in the front-end processor and the initial characterization results are provided
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Plural-wavelength flame detector that discriminates between direct and reflected radiation

NASA Technical Reports Server (NTRS)

Hall, Gregory H. (Inventor); Barnes, Heidi L. (Inventor); Medelius, Pedro J. (Inventor); Simpson, Howard J. (Inventor); Smith, Harvey S. (Inventor)

1997-01-01

A flame detector employs a plurality of wavelength selective radiation detectors and a digital signal processor programmed to analyze each of the detector signals, and determine whether radiation is received directly from a small flame source that warrants generation of an alarm. The processor's algorithm employs a normalized cross-correlation analysis of the detector signals to discriminate between radiation received directly from a flame and radiation received from a reflection of a flame to insure that reflections will not trigger an alarm. In addition, the algorithm employs a Fast Fourier Transform (FFT) frequency spectrum analysis of one of the detector signals to discriminate between flames of different sizes. In a specific application, the detector incorporates two infrared (IR) detectors and one ultraviolet (UV) detector for discriminating between a directly sensed small hydrogen flame, and reflections from a large hydrogen flame. The signals generated by each of the detectors are sampled and digitized for analysis by the digital signal processor, preferably 250 times a second. A sliding time window of approximately 30 seconds of detector data is created using FIFO memories.
Fast Fourier Transform Co-Processor (FFTC)- Towards Embedded GFLOPs

NASA Astrophysics Data System (ADS)

Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Wite, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland

2012-08-01

Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co- Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment.In frame of the ESA activity “Fast Fourier Transform DSP Co-processor (FFTC)” (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following:Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP.The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance.The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT- based processing tasks.A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses.The presentation will give and overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Fast Fourier Transform Co-processor (FFTC), towards embedded GFLOPs

NASA Astrophysics Data System (ADS)

Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Witte, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland; Kopp, Nicholas

2012-10-01

Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co-Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment. In frame of the ESA activity "Fast Fourier Transform DSP Co-processor (FFTC)" (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following: • Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP. • The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance. The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT-based processing tasks. A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses. The paper will give an overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
FPGA wavelet processor design using language for instruction-set architectures (LISA)

NASA Astrophysics Data System (ADS)

Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

2007-04-01

The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
Accelerating Pseudo-Random Number Generator for MCNP on GPU

NASA Astrophysics Data System (ADS)

Gong, Chunye; Liu, Jie; Chi, Lihua; Hu, Qingfeng; Deng, Li; Gong, Zhenghu

2010-09-01

Pseudo-random number generators (PRNG) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computation. The PRNG in Monte Carlo N-Particle Transport Code (MCNP) requires long period, high quality, flexible jump and fast enough. In this paper, we implement such a PRNG for MCNP on NVIDIA's GTX200 Graphics Processor Units (GPU) using CUDA programming model. Results shows that 3.80 to 8.10 times speedup are achieved compared with 4 to 6 cores CPUs and more than 679.18 million double precision random numbers can be generated per second on GPU.
A programmable systolic array correlator as a trigger processor for electron pairs in rich (ring image Cherenkov) counters

NASA Astrophysics Data System (ADS)

Männer, R.

1989-12-01

This paper describes a systolic array processor for a ring image Cherenkov counter which is capable of identifying pairs of electron circles with a known radius and a certain minimum distance within 15 μs. The processor is a very flexible and fast device. It consists of 128 x 128 processing elements (PEs), where one PE is assigned to each pixel of the image. All PEs run synchronously at 40 MHz. The identification of electron circles is done by correlating the detector image with the proper circle circumference. Circle centers are found by peak detection in the correlation result. A second correlation with a circle disc allows circles of closed electron pairs to be rejected. The trigger decision is generated if a pseudo adder detects at least two remaining circles. The device is controlled by a freely programmable sequencer. A VLSI chip containing 8 x 8 PEs is being developed using a VENUS design system and will be produced in 2μ CMOS technology.
Airborne optical tracking control system design study

NASA Astrophysics Data System (ADS)

1992-09-01

The Kestrel LOS Tracking Program involves the development of a computer and algorithms for use in passive tracking of airborne targets from a high altitude balloon platform. The computer receivers track error signals from a video tracker connected to one of the imaging sensors. In addition, an on-board IRU (gyro), accelerometers, a magnetometer, and a two-axis inclinometer provide inputs which are used for initial acquisitions and course and fine tracking. Signals received by the control processor from the video tracker, IRU, accelerometers, magnetometer, and inclinometer are utilized by the control processor to generate drive signals for the payload azimuth drive, the Gimballed Mirror System (GMS), and the Fast Steering Mirror (FSM). The hardware which will be procured under the LOS tracking activity is the Controls Processor (CP), the IRU, and the FSM. The performance specifications for the GMS and the payload canister azimuth driver are established by the LOS tracking design team in an effort to achieve a tracking jitter of less than 3 micro-rad, 1 sigma for one axis.
An Efficient Functional Test Generation Method For Processors Using Genetic Algorithms

NASA Astrophysics Data System (ADS)

Hudec, Ján; Gramatová, Elena

2015-07-01

The paper presents a new functional test generation method for processors testing based on genetic algorithms and evolutionary strategies. The tests are generated over an instruction set architecture and a processor description. Such functional tests belong to the software-oriented testing. Quality of the tests is evaluated by code coverage of the processor description using simulation. The presented test generation method uses VHDL models of processors and the professional simulator ModelSim. The rules, parameters and fitness functions were defined for various genetic algorithms used in automatic test generation. Functionality and effectiveness were evaluated using the RISC type processor DP32.
Processing techniques for software based SAR processors

NASA Technical Reports Server (NTRS)

Leung, K.; Wu, C.

1983-01-01

Software SAR processing techniques defined to treat Shuttle Imaging Radar-B (SIR-B) data are reviewed. The algorithms are devised for the data processing procedure selection, SAR correlation function implementation, multiple array processors utilization, cornerturning, variable reference length azimuth processing, and range migration handling. The Interim Digital Processor (IDP) originally implemented for handling Seasat SAR data has been adapted for the SIR-B, and offers a resolution of 100 km using a processing procedure based on the Fast Fourier Transformation fast correlation approach. Peculiarities of the Seasat SAR data processing requirements are reviewed, along with modifications introduced for the SIR-B. An Advanced Digital SAR Processor (ADSP) is under development for use with the SIR-B in the 1986 time frame as an upgrade for the IDP, which will be in service in 1984-5.
FAST: framework for heterogeneous medical image computing and visualization.

PubMed

Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank

2015-11-01

Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
Electrochemical sensing using comparison of voltage-current time differential values during waveform generation and detection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay

2018-01-02

A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less
Processor register error correction management

DOEpatents

Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.

2016-12-27

Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.

Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

PubMed Central

Manolakos, Elias S.

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

PubMed

Sharma, Anuj; Manolakos, Elias S

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
A pipeline VLSI design of fast singular value decomposition processor for real-time EEG system based on on-line recursive independent component analysis.

PubMed

Huang, Kuan-Ju; Shih, Wei-Yeh; Chang, Jui Chung; Feng, Chih Wei; Fang, Wai-Chi

2013-01-01

This paper presents a pipeline VLSI design of fast singular value decomposition (SVD) processor for real-time electroencephalography (EEG) system based on on-line recursive independent component analysis (ORICA). Since SVD is used frequently in computations of the real-time EEG system, a low-latency and high-accuracy SVD processor is essential. During the EEG system process, the proposed SVD processor aims to solve the diagonal, inverse and inverse square root matrices of the target matrices in real time. Generally, SVD requires a huge amount of computation in hardware implementation. Therefore, this work proposes a novel design concept for data flow updating to assist the pipeline VLSI implementation. The SVD processor can greatly improve the feasibility of real-time EEG system applications such as brain computer interfaces (BCIs). The proposed architecture is implemented using TSMC 90 nm CMOS technology. The sample rate of EEG raw data adopts 128 Hz. The core size of the SVD processor is 580×580 um(2), and the speed of operation frequency is 20MHz. It consumes 0.774mW of power during the 8-channel EEG system per execution time.
Electrochemical sensing using voltage-current time differential

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay

2017-02-28

A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

NASA Astrophysics Data System (ADS)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Flight design system level C requirements. Solid rocket booster and external tank impact prediction processors. [space transportation system

NASA Technical Reports Server (NTRS)

Seale, R. H.

1979-01-01

The prediction of the SRB and ET impact areas requires six separate processors. The SRB impact prediction processor computes the impact areas and related trajectory data for each SRB element. Output from this processor is stored on a secure file accessible by the SRB impact plot processor which generates the required plots. Similarly the ET RTLS impact prediction processor and the ET RTLS impact plot processor generates the ET impact footprints for return-to-launch-site (RTLS) profiles. The ET nominal/AOA/ATO impact prediction processor and the ET nominal/AOA/ATO impact plot processor generate the ET impact footprints for non-RTLS profiles. The SRB and ET impact processors compute the size and shape of the impact footprints by tabular lookup in a stored footprint dispersion data base. The location of each footprint is determined by simulating a reference trajectory and computing the reference impact point location. To insure consistency among all flight design system (FDS) users, much input required by these processors will be obtained from the FDS master data base.
Implementation and simulations of the sphere solution in FAST

NASA Astrophysics Data System (ADS)

Murgolo, F. P.; Schirone, M. G.; Lattanzi, M.; Bernacca, P. L.

1989-06-01

The details of the implementation of the sphere solution software in the Fundamental Astronomy by Space Techniques (FAST) consortium, are described. The simulation results for realistic data sets, both with and without grid-step errors are given. Expected errors on the astrometric parameters of the primary stars and the precision of the reference great circle zero points, are provided as a function of mission duration. The design matrix, the diagrams of the context processor and the processors experimental results are given.
Real-time lens distortion correction: speed, accuracy and efficiency

NASA Astrophysics Data System (ADS)

Bax, Michael R.; Shahidi, Ramin

2014-11-01

Optical lens systems suffer from nonlinear geometrical distortion. Optical imaging applications such as image-enhanced endoscopy and image-based bronchoscope tracking require correction of this distortion for accurate localization, tracking, registration, and measurement of image features. Real-time capability is desirable for interactive systems and live video. The use of a texture-mapping graphics accelerator, which is standard hardware on current motherboard chipsets and add-in video graphics cards, to perform distortion correction is proposed. Mesh generation for image tessellation, an error analysis, and performance results are presented. It is shown that distortion correction using commodity graphics hardware is substantially faster than using the main processor and can be performed at video frame rates (faster than 30 frames per second), and that the polar-based method of mesh generation proposed here is more accurate than a conventional grid-based approach. Using graphics hardware to perform distortion correction is not only fast and accurate but also efficient as it frees the main processor for other tasks, which is an important issue in some real-time applications.
Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Parallel processing in a host plus multiple array processor system for radar

NASA Technical Reports Server (NTRS)

Barkan, B. Z.

1983-01-01

Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Demonstration of Qubit Operations Below a Rigorous Fault Tolerance Threshold With Gate Set Tomography (Open Access, Publisher’s Version)

DTIC Science & Technology

2017-02-15

Maunz2 Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone...information processors have been demonstrated experimentally using superconducting circuits1–3, electrons in semiconductors4–6, trapped atoms and...qubit quantum information processor has been realized14, and single- qubit gates have demonstrated randomized benchmarking (RB) infidelities as low as 10
Experimental testing of the noise-canceling processor.

PubMed

Collins, Michael D; Baer, Ralph N; Simpson, Harry J

2011-09-01

Signal-processing techniques for localizing an acoustic source buried in noise are tested in a tank experiment. Noise is generated using a discrete source, a bubble generator, and a sprinkler. The experiment has essential elements of a realistic scenario in matched-field processing, including complex source and noise time series in a waveguide with water, sediment, and multipath propagation. The noise-canceling processor is found to outperform the Bartlett processor and provide the correct source range for signal-to-noise ratios below -10 dB. The multivalued Bartlett processor is found to outperform the Bartlett processor but not the noise-canceling processor. © 2011 Acoustical Society of America
Real time processor for array speckle interferometry

NASA Astrophysics Data System (ADS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-02-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Real time processor for array speckle interferometry

NASA Technical Reports Server (NTRS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-01-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Integral Fast Reactor fuel pin processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levinskas, D.

1993-01-01

This report discusses the pin processor which receives metal alloy pins cast from recycled Integral Fast Reactor (IFR) fuel and prepares them for assembly into new IFR fuel elements. Either full length as-cast or precut pins are fed to the machine from a magazine, cut if necessary, and measured for length, weight, diameter and deviation from straightness. Accepted pins are loaded into cladding jackets located in a magazine, while rejects and cutting scraps are separated into trays. The magazines, trays, and the individual modules that perform the different machine functions are assembled and removed using remote manipulators and master-slaves.
Integral Fast Reactor fuel pin processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levinskas, D.

1993-03-01

This report discusses the pin processor which receives metal alloy pins cast from recycled Integral Fast Reactor (IFR) fuel and prepares them for assembly into new IFR fuel elements. Either full length as-cast or precut pins are fed to the machine from a magazine, cut if necessary, and measured for length, weight, diameter and deviation from straightness. Accepted pins are loaded into cladding jackets located in a magazine, while rejects and cutting scraps are separated into trays. The magazines, trays, and the individual modules that perform the different machine functions are assembled and removed using remote manipulators and master-slaves.
Evaluation and application of a fast module in a PLC based interlock and control system

NASA Astrophysics Data System (ADS)

Zaera-Sanz, M.

2009-08-01

The LHC Beam Interlock system requires a controller performing a simple matrix function to collect the different beam dump requests. To satisfy the expected safety level of the Interlock, the system should be robust and reliable. The PLC is a promising candidate to fulfil both aspects but too slow to meet the expected response time which is of the order of μseconds. Siemens has introduced a ``so called'' fast module (FM352-5 Boolean Processor). It provides independent and extremely fast control of a process within a larger control system using an onboard processor, a Field Programmable Gate Array (FPGA), to execute code in parallel which results in extremely fast scan times. It is interesting to investigate its features and to evaluate it as a possible candidate for the beam interlock system. This paper publishes the results of this study. As well, this paper could be useful for other applications requiring fast processing using a PLC.
High Speed and High Functional Inverter Power Supplies for Plasma Generation and Control, and their Performance

NASA Astrophysics Data System (ADS)

Uesugi, Yoshihiko; Razzak, Mohammad A.; Kondo, Kenji; Kikuchi, Yusuke; Takamura, Shuichi; Imai, Takahiro; Toyoda, Mitsuhiro

The Rapid development of high power and high speed semiconductor switching devices has led to their various applications in related plasma fields. Especially, a high speed inverter power supply can be used as an RF power source instead of conventional linear amplifiers and a power supply to control the magnetic field in a fusion plasma device. In this paper, RF thermal plasma production and plasma heating experiments are described emphasis placed on using a static induction transistor inverter at a frequency range between 200 kHz and 2.5 MHz as an RF power supply. Efficient thermal plasma production is achieved experimentally by using a flexible and easily operated high power semiconductor inverter power supply. Insulated gate bipolar transistor (IGBT) inverter power supplies driven by a high speed digital signal processor are applied as tokamak joule coil and vertical coil power supplies to control plasma current waveform and plasma equilibrium. Output characteristics, such as the arbitrary bipolar waveform generation of a pulse width modulation (PWM) inverter using digital signal processor (DSP) can be successfully applied to tokamak power supplies for flexible plasma current operation and fast position control of a small tokamak.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, D.B.

1996-12-31

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

Fast Neural Solution Of A Nonlinear Wave Equation

NASA Technical Reports Server (NTRS)

Barhen, Jacob; Toomarian, Nikzad

1996-01-01

Neural algorithm for simulation of class of nonlinear wave phenomena devised. Numerically solves special one-dimensional case of Korteweg-deVries equation. Intended to be executed rapidly by neural network implemented as charge-coupled-device/charge-injection device, very-large-scale integrated-circuit analog data processor of type described in "CCD/CID Processors Would Offer Greater Precision" (NPO-18972).
Ring-array processor distribution topology for optical interconnects

NASA Technical Reports Server (NTRS)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Automated system for analyzing the activity of individual neurons

NASA Technical Reports Server (NTRS)

Bankman, Isaac N.; Johnson, Kenneth O.; Menkes, Alex M.; Diamond, Steve D.; Oshaughnessy, David M.

1993-01-01

This paper presents a signal processing system that: (1) provides an efficient and reliable instrument for investigating the activity of neuronal assemblies in the brain; and (2) demonstrates the feasibility of generating the command signals of prostheses using the activity of relevant neurons in disabled subjects. The system operates online, in a fully automated manner and can recognize the transient waveforms of several neurons in extracellular neurophysiological recordings. Optimal algorithms for detection, classification, and resolution of overlapping waveforms are developed and evaluated. Full automation is made possible by an algorithm that can set appropriate decision thresholds and an algorithm that can generate templates on-line. The system is implemented with a fast IBM PC compatible processor board that allows on-line operation.
Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers

NASA Astrophysics Data System (ADS)

Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka

2017-02-01

Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 ones by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of Genbank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7 and 36.2%, which were merely 2% lower than that of H-DROP. The reduced computational time of Fast H-DROP, without affecting prediction performances, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
Application of Fast Multipole Methods to the NASA Fast Scattering Code

NASA Technical Reports Server (NTRS)

Dunn, Mark H.; Tinetti, Ana F.

2008-01-01

The NASA Fast Scattering Code (FSC) is a versatile noise prediction program designed to conduct aeroacoustic noise reduction studies. The equivalent source method is used to solve an exterior Helmholtz boundary value problem with an impedance type boundary condition. The solution process in FSC v2.0 requires direct manipulation of a large, dense system of linear equations, limiting the applicability of the code to small scales and/or moderate excitation frequencies. Recent advances in the use of Fast Multipole Methods (FMM) for solving scattering problems, coupled with sparse linear algebra techniques, suggest that a substantial reduction in computer resource utilization over conventional solution approaches can be obtained. Implementation of the single level FMM (SLFMM) and a variant of the Conjugate Gradient Method (CGM) into the FSC is discussed in this paper. The culmination of this effort, FSC v3.0, was used to generate solutions for three configurations of interest. Benchmarking against previously obtained simulations indicate that a twenty-fold reduction in computational memory and up to a four-fold reduction in computer time have been achieved on a single processor.
New Modular Ultrasonic Signal Processing Building Blocks for Real-Time Data Acquisition and Post Processing

NASA Astrophysics Data System (ADS)

Weber, Walter H.; Mair, H. Douglas; Jansen, Dion

2003-03-01

A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

DOE PAGES

O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...

1995-01-01

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Tumeo, Antonino; Secchi, Simone

Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
78 FR 41116 - Agency Information Collection Activities: Proposed Collection; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-09

... Agreement State regulations. All generators, collectors, and processors of low-level waste intended for... which facilitates tracking the identity of the waste generator. That tracking becomes more complicated... waste shipped from a waste processor may contain waste from several different generators. The...
Method for fast start of a fuel processor

DOEpatents

Ahluwalia, Rajesh K [Burr Ridge, IL; Ahmed, Shabbir [Naperville, IL; Lee, Sheldon H. D. [Willowbrook, IL

2008-01-29

An improved fuel processor for fuel cells is provided whereby the startup time of the processor is less than sixty seconds and can be as low as 30 seconds, if not less. A rapid startup time is achieved by either igniting or allowing a small mixture of air and fuel to react over and warm up the catalyst of an autothermal reformer (ATR). The ATR then produces combustible gases to be subsequently oxidized on and simultaneously warm up water-gas shift zone catalysts. After normal operating temperature has been achieved, the proportion of air included with the fuel is greatly diminished.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, Dario B.

1996-01-01

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Hardware multiplier processor

DOEpatents

Pierce, Paul E.

1986-01-01

A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Hardware multiplier processor

DOEpatents

Pierce, P.E.

A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Fast physical-random number generation using laser diode's frequency noise: influence of frequency discriminator

NASA Astrophysics Data System (ADS)

Matsumoto, Kouhei; Kasuya, Yuki; Yumoto, Mitsuki; Arai, Hideaki; Sato, Takashi; Sakamoto, Shuichi; Ohkawa, Masashi; Ohdaira, Yasuo

2018-02-01

Not so long ago, pseudo random numbers generated by numerical formulae were considered to be adequate for encrypting important data-files, because of the time needed to decode them. With today's ultra high-speed processors, however, this is no longer true. So, in order to thwart ever-more advanced attempts to breach our system's protections, cryptologists have devised a method that is considered to be virtually impossible to decode, and uses what is a limitless number of physical random numbers. This research describes a method, whereby laser diode's frequency noise generate a large quantities of physical random numbers. Using two types of photo detectors (APD and PIN-PD), we tested the abilities of two types of lasers (FP-LD and VCSEL) to generate random numbers. In all instances, an etalon served as frequency discriminator, the examination pass rates were determined using NIST FIPS140-2 test at each bit, and the Random Number Generation (RNG) speed was noted.
Development of hardware accelerator for molecular dynamics simulations: a computation board that calculates nonbonded interactions in cooperation with fast multipole method.

PubMed

Amisaki, Takashi; Toyoda, Shinjiro; Miyagawa, Hiroh; Kitamura, Kunihiro

2003-04-15

Evaluation of long-range Coulombic interactions still represents a bottleneck in the molecular dynamics (MD) simulations of biological macromolecules. Despite the advent of sophisticated fast algorithms, such as the fast multipole method (FMM), accurate simulations still demand a great amount of computation time due to the accuracy/speed trade-off inherently involved in these algorithms. Unless higher order multipole expansions, which are extremely expensive to evaluate, are employed, a large amount of the execution time is still spent in directly calculating particle-particle interactions within the nearby region of each particle. To reduce this execution time for pair interactions, we developed a computation unit (board), called MD-Engine II, that calculates nonbonded pairwise interactions using a specially designed hardware. Four custom arithmetic-processors and a processor for memory manipulation ("particle processor") are mounted on the computation board. The arithmetic processors are responsible for calculation of the pair interactions. The particle processor plays a central role in realizing efficient cooperation with the FMM. The results of a series of 50-ps MD simulations of a protein-water system (50,764 atoms) indicated that a more stringent setting of accuracy in FMM computation, compared with those previously reported, was required for accurate simulations over long time periods. Such a level of accuracy was efficiently achieved using the cooperative calculations of the FMM and MD-Engine II. On an Alpha 21264 PC, the FMM computation at a moderate but tolerable level of accuracy was accelerated by a factor of 16.0 using three boards. At a high level of accuracy, the cooperative calculation achieved a 22.7-fold acceleration over the corresponding conventional FMM calculation. In the cooperative calculations of the FMM and MD-Engine II, it was possible to achieve more accurate computation at a comparable execution time by incorporating larger nearby regions. Copyright 2003 Wiley Periodicals, Inc. J Comput Chem 24: 582-592, 2003
78 FR 67402 - Agency Information Collection Activities: Submission for the Office of Management and Budget (OMB...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-11-12

... Regulations (10 CFR) or equivalent Agreement State regulations. All generators, collectors, and processors of... which facilitates tracking the identity of the waste generator. That tracking becomes more complicated... waste shipped from a waste processor may contain waste from several different generators. The...
Yes! An object-oriented compiler compiler (YOOCC)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avotins, J.; Mingins, C.; Schmidt, H.

1995-12-31

Grammar-based processor generation is one of the most widely studied areas in language processor construction. However, there have been very few approaches to date that reconcile object-oriented principles, processor generation, and an object-oriented language. Pertinent here also. is that currently to develop a processor using the Eiffel Parse libraries requires far too much time to be expended on tasks that can be automated. For these reasons, we have developed YOOCC (Yes! an Object-Oriented Compiler Compiler), which produces a processor framework from a grammar using an enhanced version of the Eiffel Parse libraries, incorporating the ideas hypothesized by Meyer, and Grapemore » and Walden, as well as many others. Various essential changes have been made to the Eiffel Parse libraries. Examples are presented to illustrate the development of a processor using YOOCC, and it is concluded that the Eiffel Parse libraries are now not only an intelligent, but also a productive option for processor construction.« less
1981 Image II Conference Proceedings.

DTIC Science & Technology

1981-11-01

rapid motion of terrain detail across the display requires fast display processors. Other difficulties are perceptual: the visual displays must convey...has been a continuing effort by Vought in the last decade. Early systems were restricted by the unavailability of video bulk storage with fast random...each photograph. The calculations aided in the proper sequencing of the scanned scenes on the tape recorder and eventually facilitated fast random
Rectangular Array Of Digital Processors For Planning Paths

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

1993-01-01

Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Development of Improved Modeling and Analysis Techniques for Dynamics of Shell Structures

DTIC Science & Technology

1991-07-24

Engineering Sciences and Center for Space Structures and Control University of Colorado,Campus Box 429 Boulder, Colorado 80309 Accesion :or -.... ... i...system architecture ; third, to implement a decomposi- tion/mapping procedure that matches as far as possible the layout of the processors to the...element computations. In particular. we address issues that are related to the processor memory size. to the SIMD architecture and to the fast

Parallel processing approach to transform-based image coding

NASA Astrophysics Data System (ADS)

Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

1991-06-01

This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links, each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, results ar presented with image statistics and data rates. Finally techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.
Fast particles identification in programmable form at level-0 trigger by means of the 3D-Flow system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Crosetto, Dario B.

1998-10-30

The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on either an FPGA or an ASIC implementation, it can address, in a fully programmable manner, applications where commercially available processors would fail because of throughput requirements. Possible applications include filtering-algorithms (pattern recognition) from the input of multiple sensors, as well as moving any input validated by these filtering-algorithms to a single output channel. Both operations can easily be implemented on a 3D-Flow system to achieve a real-time processing system with a very short lag time. This system can be built either with off-the-shelfmore » FPGAs or, for higher data rates, with CMOS chips containing 4 to 16 processors each. The basic building block of the system, a 3D-Flow processor, has been successfully designed in VHDL code written in ''Generic HDL'' (mostly made of reusable blocks that are synthesizable in different technologies, or FPGAs), to produce a netlist for a four-processor ASIC featuring 0.35 micron CBA (Ceil Base Array) technology at 3.3 Volts, 884 mW power dissipation at 60 MHz and 63.75 mm sq. die size. The same VHDL code has been targeted to three FPGA manufacturers (Altera EPF10K250A, ORCA-Lucent Technologies 0R3T165 and Xilinx XCV1000). A complete set of software tools, the 3D-Flow System Manager, equally applicable to ASIC or FPGA implementations, has been produced to provide full system simulation, application development, real-time monitoring, and run-time fault recovery. Today's technology can accommodate 16 processors per chip in a medium size die, at a cost per processor of less than $5 based on the current silicon die/size technology cost.« less
The MIDAS processor. [Multivariate Interactive Digital Analysis System for multispectral scanner data

NASA Technical Reports Server (NTRS)

Kriegler, F. J.; Gordon, M. F.; Mclaughlin, R. H.; Marshall, R. E.

1975-01-01

The MIDAS (Multivariate Interactive Digital Analysis System) processor is a high-speed processor designed to process multispectral scanner data (from Landsat, EOS, aircraft, etc.) quickly and cost-effectively to meet the requirements of users of remote sensor data, especially from very large areas. MIDAS consists of a fast multipipeline preprocessor and classifier, an interactive color display and color printer, and a medium scale computer system for analysis and control. The system is designed to process data having as many as 16 spectral bands per picture element at rates of 200,000 picture elements per second into as many as 17 classes using a maximum likelihood decision rule.
Fault detection and bypass in a sequence information signal processor

NASA Technical Reports Server (NTRS)

Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

1992-01-01

The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
Emergency product generation for disaster management using RISAT and DMSAR quick look SAR processors

NASA Astrophysics Data System (ADS)

Desai, Nilesh; Sharma, Ritesh; Kumar, Saravana; Misra, Tapan; Gujraty, Virendra; Rana, SurinderSingh

2006-12-01

Since last few years, ISRO has embarked upon the development of two complex Synthetic Aperture Radar (SAR) missions, viz. Spaceborne Radar Imaging Satellite (RISAT) and Airborne SAR for Disaster Mangement (DMSAR), as a capacity building measure under country's Disaster Management Support (DMS) Program, for estimating the extent of damage over large areas (~75 Km) and also assess the effectiveness of the relief measures undertaken during natural disasters such as cyclones, epidemics, earthquakes, floods and landslides, forest fires, crop diseases etc. Synthetic Aperture Radar (SAR) has an unique role to play in mapping and monitoring of large areas affected by natural disasters especially floods, owing to its unique capability to see through clouds as well as all-weather imaging capability. The generation of SAR images with quick turn around time is very essential to meet the above DMS objectives. Thus the development of SAR Processors, for these two SAR systems poses considerable challenges and design efforts. Considering the growing user demand and inevitable necessity for a full-fledged high throughput processor, to process SAR data and generate image in real or near-real time, the design and development of a generic SAR Processor has been taken up and evolved, which will meet the SAR processing requirements for both Airborne and Spaceborne SAR systems. This hardware SAR processor is being built, to the extent possible, using only Commercial-Off-The-Shelf (COTS) DSP and other hardware plug-in modules on a Compact PCI (cPCI) platform. Thus, the major thrust has been on working out Multi-processor Digital Signal Processor (DSP) architecture and algorithm development and optimization rather than hardware design and fabrication. For DMSAR, this generic SAR Processor operates as a Quick Look SAR Processor (QLP) on-board the aircraft to produce real time full swath DMSAR images and as a ground based Near-Real Time high precision full swath Processor (NRTP). It will generate full-swath (6 to 75 Kms) DMSAR images in 1m / 3m / 5m / 10m / 30m resolution SAR operating modes. For RISAT mission, this generic Quick Look SAR Processor will be mainly used for browse product generation at NRSA-Shadnagar (SAN) ground receive station. RISAT QLP/NRTP is also proposed to provide an alternative emergency SAR product generation chain. For this, the S/C aux data appended in Onboard SAR Frame Format (x, y, z, x', y', z', roll, pitch, yaw) and predicted orbit from previous days Orbit Determination data will be used. The QLP / NRTP will produce ground range images in real / near real time. For emergency data product generation, additional Off-line tasks like geo-tagging, masking, QC etc needs to be performed on the processed image. The QLP / NRTP would generate geo-tagged images from the annotation data available from the SAR P/L data itself. Since the orbit & attitude information are taken as it is, the location accuracy will be poorer compared to the product generated using ADIF, where smoothened attitude and orbit are made available. Additional tasks like masking, output formatting and Quality checking of the data product will be carried out at Balanagar, NRSA after the image annotated data from QLP / NRTP is sent to Balanagar. The necessary interfaces to the QLP/NRTP for Emergency product generation are also being worked out. As is widely acknowledged, QLP/NRTP for RISAT and DMSAR is an ambitious effort and the technology of future. It is expected that by the middle of next decade, the next generation SAR missions worldwide will have onboard SAR Processors of varying capabilities and generate SAR Data products and Information products onboard instead of SAR raw data. Thus, it is also envisaged that these activities related to QLP/NRTP implementation for RISAT ground segment and DMSAR will be a significant step which will directly feed into the development of onboard real time processing systems for ISRO's future space borne SAR missions. This paper describes the design requirements, configuration details and salient features, apart from highlighting the utility of these Quick Look SAR processors for RISAT and DMSAR, for generation of emergency products for Disaster management.
A fast adaptive convex hull algorithm on two-dimensional processor arrays with a reconfigurable BUS system

NASA Technical Reports Server (NTRS)

Olariu, S.; Schwing, J.; Zhang, J.

1991-01-01

A bus system that can change dynamically to suit computational needs is referred to as reconfigurable. We present a fast adaptive convex hull algorithm on a two-dimensional processor array with a reconfigurable bus system (2-D PARBS, for short). Specifically, we show that computing the convex hull of a planar set of n points taken O(log n/log m) time on a 2-D PARBS of size mn x n with 3 less than or equal to m less than or equal to n. Our result implies that the convex hull of n points in the plane can be computed in O(1) time in a 2-D PARBS of size n(exp 1.5) x n.
Scan line graphics generation on the massively parallel processor

NASA Technical Reports Server (NTRS)

Dorband, John E.

1988-01-01

Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
Input data requirements for special processors in the computation system containing the VENTURE neutronics code. [LMFBR

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

1979-07-01

User input data requirements are presented for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated.
Feasibility of an ultra-low power digital signal processor platform as a basis for a fully implantable brain-computer interface system.

PubMed

Wang, Po T; Gandasetiawan, Keulanna; McCrimmon, Colin M; Karimi-Bidhendi, Alireza; Liu, Charles Y; Heydari, Payam; Nenadic, Zoran; Do, An H

2016-08-01

A fully implantable brain-computer interface (BCI) can be a practical tool to restore independence to those affected by spinal cord injury. We envision that such a BCI system will invasively acquire brain signals (e.g. electrocorticogram) and translate them into control commands for external prostheses. The feasibility of such a system was tested by implementing its benchtop analogue, centered around a commercial, ultra-low power (ULP) digital signal processor (DSP, TMS320C5517, Texas Instruments). A suite of signal processing and BCI algorithms, including (de)multiplexing, Fast Fourier Transform, power spectral density, principal component analysis, linear discriminant analysis, Bayes rule, and finite state machine was implemented and tested in the DSP. The system's signal acquisition fidelity was tested and characterized by acquiring harmonic signals from a function generator. In addition, the BCI decoding performance was tested, first with signals from a function generator, and subsequently using human electroencephalogram (EEG) during eyes opening and closing task. On average, the system spent 322 ms to process and analyze 2 s of data. Crosstalk (<;-65 dB) and harmonic distortion (~1%) were minimal. Timing jitter averaged 49 μs per 1000 ms. The online BCI decoding accuracies were 100% for both function generator and EEG data. These results show that a complex BCI algorithm can be executed on an ULP DSP without compromising performance. This suggests that the proposed hardware platform may be used as a basis for future, fully implantable BCI systems.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G; Salapura, Valentina

2014-12-02

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A distributed control system for the lower-hybrid current drive system on the Tokamak de Varennes

NASA Astrophysics Data System (ADS)

Bagdoo, J.; Guay, J. M.; Chaudron, G.-A.; Decoste, R.; Demers, Y.; Hubbard, A.

1990-08-01

An rf current drive system with an output power of 1 MW at 3.7 GHz is under development for the Tokamak de Varennes. The control system is based on an Ethernet local-area network of programmable logic controllers as front end, personal computers as consoles, and CAMAC-based DSP processors. The DSP processors ensure the PID control of the phase and rf power of each klystron, and the fast protection of high-power rf hardware, all within a 40 μs loop. Slower control and protection, event sequencing and the run-time database are provided by the programmable logic controllers, which communicate, via the LAN, with the consoles. The latter run a commercial process-control console software. The LAN protocol respects the first four layers of the ISO/OSI 802.3 standard. Synchronization with the tokamak control system is provided by commercially available CAMAC timing modules which trigger shot-related events and reference waveform generators. A detailed description of each subsystem and a performance evaluation of the system will be presented.
Optoelectronic analogs of self-programming neural nets - Architecture and methodologies for implementing fast stochastic learning by simulated annealing

NASA Technical Reports Server (NTRS)

Farhat, Nabil H.

1987-01-01

Self-organization and learning is a distinctive feature of neural nets and processors that sets them apart from conventional approaches to signal processing. It leads to self-programmability which alleviates the problem of programming complexity in artificial neural nets. In this paper architectures for partitioning an optoelectronic analog of a neural net into distinct layers with prescribed interconnectivity pattern to enable stochastic learning by simulated annealing in the context of a Boltzmann machine are presented. Stochastic learning is of interest because of its relevance to the role of noise in biological neural nets. Practical considerations and methodologies for appreciably accelerating stochastic learning in such a multilayered net are described. These include the use of parallel optical computing of the global energy of the net, the use of fast nonvolatile programmable spatial light modulators to realize fast plasticity, optical generation of random number arrays, and an adaptive noisy thresholding scheme that also makes stochastic learning more biologically plausible. The findings reported predict optoelectronic chips that can be used in the realization of optical learning machines.
A microcomputer based frequency-domain processor for laser Doppler anemometry

NASA Technical Reports Server (NTRS)

Horne, W. Clifton; Adair, Desmond

1988-01-01

A prototype multi-channel laser Doppler anemometry (LDA) processor was assembled using a wideband transient recorder and a microcomputer with an array processor for fast Fourier transform (FFT) computations. The prototype instrument was used to acquire, process, and record signals from a three-component wind tunnel LDA system subject to various conditions of noise and flow turbulence. The recorded data was used to evaluate the effectiveness of burst acceptance criteria, processing algorithms, and selection of processing parameters such as record length. The recorded signals were also used to obtain comparative estimates of signal-to-noise ratio between time-domain and frequency-domain signal detection schemes. These comparisons show that the FFT processing scheme allows accurate processing of signals for which the signal-to-noise ratio is 10 to 15 dB less than is practical using counter processors.
NCC Simulation Model: Simulating the operations of the network control center, phase 2

NASA Technical Reports Server (NTRS)

Benjamin, Norman M.; Paul, Arthur S.; Gill, Tepper L.

1992-01-01

The simulation of the network control center (NCC) is in the second phase of development. This phase seeks to further develop the work performed in phase one. Phase one concentrated on the computer systems and interconnecting network. The focus of phase two will be the implementation of the network message dialogues and the resources controlled by the NCC. These resources are requested, initiated, monitored and analyzed via network messages. In the NCC network messages are presented in the form of packets that are routed across the network. These packets are generated, encoded, decoded and processed by the network host processors that generate and service the message traffic on the network that connects these hosts. As a result, the message traffic is used to characterize the work done by the NCC and the connected network. Phase one of the model development represented the NCC as a network of bi-directional single server queues and message generating sources. The generators represented the external segment processors. The served based queues represented the host processors. The NCC model consists of the internal and external processors which generate message traffic on the network that links these hosts. To fully realize the objective of phase two it is necessary to identify and model the processes in each internal processor. These processes live in the operating system of the internal host computers and handle tasks such as high speed message exchanging, ISN and NFE interface, event monitoring, network monitoring, and message logging. Inter process communication is achieved through the operating system facilities. The overall performance of the host is determined by its ability to service messages generated by both internal and external processors.
Buffered coscheduling for parallel programming and enhanced fault tolerance

DOEpatents

Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM

2006-01-31

A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors
Computer program documentation for the pasture/range condition assessment processor

NASA Technical Reports Server (NTRS)

Mcintyre, K. S.; Miller, T. G. (Principal Investigator)

1982-01-01

The processor which drives for the RANGE software allows the user to analyze LANDSAT data containing pasture and rangeland. Analysis includes mapping, generating statistics, calculating vegetative indexes, and plotting vegetative indexes. Routines for using the processor are given. A flow diagram is included.
Signal generation and mixing electronics for frequency-domain lifetime and spectral fluorometry

NASA Technical Reports Server (NTRS)

Cruce, Tommy Clay (Inventor); Hallidy, William H. (Inventor); Chin, Robert C. (Inventor)

2007-01-01

The present invention additionally comprises a method and apparatus for generating and mixing signals for frequency-domain lifetime and spectral fluorometry. The present invention comprises a plurality of signal generators that generate a plurality of signals where the signal generators modulate the amplitude and/or the frequency of the signals. The present invention uses one of these signals to drive an excitation signal that the present invention then directs and transmits at a target mixture, which absorbs the energy from the excitation signal. The property of fluorescence causes the target mixture to emit an emitted signal that the present invention detects with a signal detector. The present invention uses a plurality of mixers to produce a processor reference signal and a data signal. The present invention then uses a processor to compare the processor reference signal with the data signal by analyzing the differences in the phase and the differences in the amplitude between the two signals. The processor then extracts the fluorescence lifetime and fluorescence spectrum of the emitted signal from the phase and amplitude information using a chemometric analysis.
Next Generation Space Telescope Integrated Science Module Data System

NASA Technical Reports Server (NTRS)

Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.

1999-01-01

The Data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument compliment, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz power PC processors. All processors execute a commercially available real time multi-tasking operating system supporting, preemptive multi-tasking, file management and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network, which links the six processors. The final selection for Processor busses, processor chips, network interfaces, and high-speed data interfaces will be made during mid 2002.
Thermal Hotspots in CPU Die and It's Future Architecture

NASA Astrophysics Data System (ADS)

Wang, Jian; Hu, Fu-Yuan

Owing to the increasing core frequency and chip integration and the limited die dimension, the power densities in CPU chip have been increasing fastly. The high temperature on chip resulted by power densities threats the processor's performance and chip's reliability. This paper analyzed the thermal hotspots in die and their properties. A new architecture of function units in die - - hot units distributed architecture is suggested to cope with the problems of high power densities for future processor chip.
HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.

PubMed

Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D

2002-08-07

Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for four dedicated cluster (a shared memory fine-grain level on each node utilizing all four processors and a coarse-grain level allowing for 15 nodes) reducing the time for one core iteration from over 7 h to about 35 min.

A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi

1997-01-01

A low-power high-speed smart sensor system based on a large format active pixel sensor (APS) integrated with a programmable neural processor for space exploration missions is presented. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design that is composed with an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing in all levels such as image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation-per-second computing power which is a two order-of-magnitude increase over that of state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
Advanced satellite communication system

NASA Technical Reports Server (NTRS)

Staples, Edward J.; Lie, Sen

1992-01-01

The objective of this research program was to develop an innovative advanced satellite receiver/demodulator utilizing surface acoustic wave (SAW) chirp transform processor and coherent BPSK demodulation. The algorithm of this SAW chirp Fourier transformer is of the Convolve - Multiply - Convolve (CMC) type, utilizing off-the-shelf reflective array compressor (RAC) chirp filters. This satellite receiver, if fully developed, was intended to be used as an on-board multichannel communications repeater. The Advanced Communications Receiver consists of four units: (1) CMC processor, (2) single sideband modulator, (3) demodulator, and (4) chirp waveform generator and individual channel processors. The input signal is composed of multiple user transmission frequencies operating independently from remotely located ground terminals. This signal is Fourier transformed by the CMC Processor into a unique time slot for each user frequency. The CMC processor is driven by a waveform generator through a single sideband (SSB) modulator. The output of the coherent demodulator is composed of positive and negative pulses, which are the envelopes of the chirp transform processor output. These pulses correspond to the data symbols. Following the demodulator, a logic circuit reconstructs the pulses into data, which are subsequently differentially decoded to form the transmitted data. The coherent demodulation and detection of BPSK signals derived from a CMC chirp transform processor were experimentally demonstrated and bit error rate (BER) testing was performed. To assess the feasibility of such advanced receiver, the results were compared with the theoretical analysis and plotted for an average BER as a function of signal-to-noise ratio. Another goal of this SBIR program was the development of a commercial product. The commercial product developed was an arbitrary waveform generator. The successful sales have begun with the delivery of the first arbitrary waveform generator.
REGIONAL-SCALE (1000 KM) MODEL OF PHOTOCHEMICAL AIR POLLUTION. PART 2. INPUT PROCESSOR NETWORK DESIGN

EPA Science Inventory

Detailed specifications are given for a network of data processors and submodels that can generate the parameter fields required by the regional oxidant model formulated in Part 1 of this report. Operations performed by the processor network include simulation of the motion and d...
Unaligned instruction relocation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bertolli, Carlo; O'Brien, John K.; Sallenave, Olivier H.

In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unalignedmore » ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.« less
Unaligned instruction relocation

DOEpatents

Bertolli, Carlo; O'Brien, John K.; Sallenave, Olivier H.; Sura, Zehra N.

2018-01-23

In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unaligned ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.
CPU architecture for a fast and energy-saving calculation of convolution neural networks

NASA Astrophysics Data System (ADS)

Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

2017-06-01

One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.
A GaAs vector processor based on parallel RISC microprocessors

NASA Astrophysics Data System (ADS)

Misko, Tim A.; Rasset, Terry L.

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Generating unstructured nuclear reactor core meshes in parallel

DOE PAGES

Jain, Rajeev; Tautges, Timothy J.

2014-10-24

Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulations problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate in a standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor coremore » examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.« less
Global synchronization of parallel processors using clock pulse width modulation

DOEpatents

Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

2013-04-02

A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.
Method for operating a combustor in a fuel cell system

DOEpatents

Clingerman, Bruce J.; Mowery, Kenneth D.

2002-01-01

In one aspect, the invention provides a method of operating a combustor to heat a fuel processor to a desired temperature in a fuel cell system, wherein the fuel processor generates hydrogen (H.sub.2) from a hydrocarbon for reaction within a fuel cell to generate electricity. More particularly, the invention provides a method and select system design features which cooperate to provide a start up mode of operation and a smooth transition from start-up of the combustor and fuel processor to a running mode.
System on chip module configured for event-driven architecture

DOEpatents

Robbins, Kevin; Brady, Charles E.; Ashlock, Tad A.

2017-10-17

A system on chip (SoC) module is described herein, wherein the SoC modules comprise a processor subsystem and a hardware logic subsystem. The processor subsystem and hardware logic subsystem are in communication with one another, and transmit event messages between one another. The processor subsystem executes software actors, while the hardware logic subsystem includes hardware actors, the software actors and hardware actors conform to an event-driven architecture, such that the software actors receive and generate event messages and the hardware actors receive and generate event messages.
Evaluating Food Safety Knowledge and Practices of Food Processors and Sellers Working in Food Facilities in Hanoi, Vietnam.

PubMed

Tran, Bach Xuan; DO, Hoa Thi; Nguyen, Luong Thanh; Boggiano, Victoria; LE, Huong Thi; LE, Xuan Thanh Thi; Trinh, Ngoc Bao; DO, Khanh Nam; Nguyen, Cuong Tat; Nguyen, Thanh Trung; Dang, Anh Kim; Mai, Hue Thi; Nguyen, Long Hoang; Than, Selena; Latkin, Carl A

2018-04-01

Consumption of fast food and street food is increasingly common among Vietnamese, particularly in large cities. The high daily demand for these convenient food services, together with a poor management system, has raised concerns about food hygiene and safety (FHS). This study aimed to examine the FHS knowledge and practices of food processors and sellers in food facilities in Hanoi, Vietnam, and to identify their associated factors. A cross-sectional study was conducted with 1,760 food processors and sellers in restaurants, fast food stores, food stalls, and street vendors in Hanoi in 2015. We assessed each participant's FHS knowledge using a self-report questionnaire and their FHS practices using a checklist. Tobit regression was used to determine potential factors associated with FHS knowledge and practices, including demographics, training experience, and frequency of health examination. Overall, we observed a lack of FHS knowledge among respondents across three domains, including standard requirements for food facilities (18%), food processing procedures (29%), and food poisoning prevention (11%). Only 25.9 and 38.1% of participants used caps and masks, respectively, and 12.8% of food processors reported direct hand contact with food. After adjusting for socioeconomic characteristics, these factors significantly predicted increased FHS knowledge and practice scores: (i) working at restaurants and food stalls, (ii) having FHS training, (iii) having had a physical examination, and (iv) having taken a stool test within the last year. These findings highlight the need of continuous training to improve FHS knowledge and practices among food processors and food sellers. Moreover, regular monitoring of food facilities, combined with medical examination of their staff, should be performed to ensure food safety.
Toshiba TDF-500 High Resolution Viewing And Analysis System

NASA Astrophysics Data System (ADS)

Roberts, Barry; Kakegawa, M.; Nishikawa, M.; Oikawa, D.

1988-06-01

A high resolution, operator interactive, medical viewing and analysis system has been developed by Toshiba and Bio-Imaging Research. This system provides many advanced features including high resolution displays, a very large image memory and advanced image processing capability. In particular, the system provides CRT frame buffers capable of update in one frame period, an array processor capable of image processing at operator interactive speeds, and a memory system capable of updating multiple frame buffers at frame rates whilst supporting multiple array processors. The display system provides 1024 x 1536 display resolution at 40Hz frame and 80Hz field rates. In particular, the ability to provide whole or partial update of the screen at the scanning rate is a key feature. This allows multiple viewports or windows in the display buffer with both fixed and cine capability. To support image processing features such as windowing, pan, zoom, minification, filtering, ROI analysis, multiplanar and 3D reconstruction, a high performance CPU is integrated into the system. This CPU is an array processor capable of up to 400 million instructions per second. To support the multiple viewer and array processors' instantaneous high memory bandwidth requirement, an ultra fast memory system is used. This memory system has a bandwidth capability of 400MB/sec and a total capacity of 256MB. This bandwidth is more than adequate to support several high resolution CRT's and also the fast processing unit. This fully integrated approach allows effective real time image processing. The integrated design of viewing system, memory system and array processor are key to the imaging system. It is the intention to describe the architecture of the image system in this paper.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, Michael S.; Strip, David R.

1996-01-01

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

NASA Astrophysics Data System (ADS)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
Thriving on Chaos: The Development of a Surgical Information System

PubMed Central

Olund, Steven R.

1988-01-01

Hospitals present unique challenges to the computer industry, generating a greater quantity and variety of data than nearly any other enterprise. This is complicated by the fact that a hospital is not one homogenous organization, but a bundle of semi-independent groups with unique data requirements. Therefore hospital information systems must be fast, flexible, reliable, easy to use and maintain, and cost-effective. The Surgical Information System at Rush Presbyterian-St. Luke's Medical Center, Chicago is such system. It uses a Sequent Balance 21000 multi-processor superminicomputer, running industry standard tools such as the Unix operating system, a 4th generation programming language (4GL), and Structured Query Language (SQL) relational database management software. This treatise illustrates a comprehensive yet generic approach which can be applied to almost any clinical situation where access to patient data is required by a variety of medical professionals.
A novel piezo vibration platform for probe dynamic performance calibration

NASA Astrophysics Data System (ADS)

Liang, Rong; Jusko, Otto; Lüdicke, Frank; Neugebauer, Michael

2001-09-01

A novel piezo vibration platform of compact size (120×120×120 mm3) for probe dynamic performance calibration has been developed. A piezo tube is employed to generate movement which is measured in real time by a miniature fibre interferometer and close-loop controlled by a fast digital signal processor, thus the calibration can be made traceable to the national length standard. 20 kHz control-loop frequency with 1.71 nm uncertainty has been achieved. The maximum calibration range is 20 µm with 0.3 nm resolution. The piezo vibration platform can generate up to 300 Hz sinusoidal signal and various other waveforms, such as square, triangle and saw tooth. It can also work in sweep mode to shift the frequency up to 100 Hz continuously, which is a very useful function when the amplitude-frequency response of the probe is to be investigated.
Fast neural net simulation with a DSP processor array.

PubMed

Muller, U A; Gunzinger, A; Guggenbuhl, W

1995-01-01

This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.
Parallel processor-based raster graphics system architecture

DOEpatents

Littlefield, Richard J.

1990-01-01

An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.
Multimode power processor

DOEpatents

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.

Multimode power processor

DOEpatents

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
Reconfigurable signal processor designs for advanced digital array radar systems

NASA Astrophysics Data System (ADS)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
Control apparatus and method for efficiently heating a fuel processor in a fuel cell system

DOEpatents

Doan, Tien M.; Clingerman, Bruce J.

2003-08-05

A control apparatus and method for efficiently controlling the amount of heat generated by a fuel cell processor in a fuel cell system by determining a temperature error between actual and desired fuel processor temperatures. The temperature error is converted to a combustor fuel injector command signal or a heat dump valve position command signal depending upon the type of temperature error. Logic controls are responsive to the combustor fuel injector command signals and the heat dump valve position command signal to prevent the combustor fuel injector command signal from being generated if the heat dump valve is opened or, alternately, from preventing the heat dump valve position command signal from being generated if the combustor fuel injector is opened.
Automatic Generation of Cycle-Approximate TLMs with Timed RTOS Model Support

NASA Astrophysics Data System (ADS)

Hwang, Yonghyun; Schirner, Gunar; Abdi, Samar

This paper presents a technique for automatically generating cycle-approximate transaction level models (TLMs) for multi-process applications mapped to embedded platforms. It incorporates three key features: (a) basic block level timing annotation, (b) RTOS model integration, and (c) RTOS overhead delay modeling. The inputs to TLM generation are application C processes and their mapping to processors in the platform. A processor data model, including pipelined datapath, memory hierarchy and branch delay model is used to estimate basic block execution delays. The delays are annotated to the C code, which is then integrated with a generated SystemC RTOS model. Our abstract RTOS provides dynamic scheduling and inter-process communication (IPC) with processor- and RTOS-specific pre-characterized timing. Our experiments using a MP3 decoder and a JPEG encoder show that timed TLMs, with integrated RTOS models, can be automatically generated in less than a minute. Our generated TLMs simulated three times faster than real-time and showed less than 10% timing error compared to board measurements.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, M.S.; Strip, D.R.

1996-01-30

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G.; Salapura, Valentina

2012-07-24

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Computational algorithms for simulations in atmospheric optics.

PubMed

Konyaev, P A; Lukin, V P

2016-04-20

A computer simulation technique for atmospheric and adaptive optics based on parallel programing is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
Neurovision processor for designing intelligent sensors

NASA Astrophysics Data System (ADS)

Gupta, Madan M.; Knopf, George K.

1992-03-01

A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Using the automata processor for fast pattern recognition in high energy physics experiments. A proof of concept

DOE PAGES

Michael H. L. S. Wang; Cancelo, Gustavo; Green, Christopher; ...

2016-06-25

Here, we explore the Micron Automata Processor (AP) as a suitable commodity technology that can address the growing computational needs of pattern recognition in High Energy Physics (HEP) experiments. A toy detector model is developed for which an electron track confirmation trigger based on the Micron AP serves as a test case. Although primarily meant for high speed text-based searches, we demonstrate a proof of concept for the use of the Micron AP in a HEP trigger application.
SIGPROC: Pulsar Signal Processing Programs

NASA Astrophysics Data System (ADS)

Lorimer, D. R.

2011-07-01

SIGPROC is a package designed to standardize the initial analysis of the many types of fast-sampled pulsar data. Currently recognized machines are the Wide Band Arecibo Pulsar Processor (WAPP), the Penn State Pulsar Machine (PSPM), the Arecibo Observatory Fourier Transform Machine (AOFTM), the Berkeley Pulsar Processors (BPP), the Parkes/Jodrell 1-bit filterbanks (SCAMP) and the filterbank at the Ooty radio telescope (OOTY). The SIGPROC tools should help users look at their data quickly, without the need to write (yet) another routine to read data or worry about big/little endian compatibility (byte swapping is handled automatically).
Using the automata processor for fast pattern recognition in high energy physics experiments. A proof of concept

DOE Office of Scientific and Technical Information (OSTI.GOV)

Michael H. L. S. Wang; Cancelo, Gustavo; Green, Christopher

Here, we explore the Micron Automata Processor (AP) as a suitable commodity technology that can address the growing computational needs of pattern recognition in High Energy Physics (HEP) experiments. A toy detector model is developed for which an electron track confirmation trigger based on the Micron AP serves as a test case. Although primarily meant for high speed text-based searches, we demonstrate a proof of concept for the use of the Micron AP in a HEP trigger application.
Resource and Performance Evaluations of Fixed Point QRD-RLS Systolic Array through FPGA Implementation

NASA Astrophysics Data System (ADS)

Yokoyama, Yoshiaki; Kim, Minseok; Arai, Hiroyuki

At present, when using space-time processing techniques with multiple antennas for mobile radio communication, real-time weight adaptation is necessary. Due to the progress of integrated circuit technology, dedicated processor implementation with ASIC or FPGA can be employed to implement various wireless applications. This paper presents a resource and performance evaluation of the QRD-RLS systolic array processor based on fixed-point CORDIC algorithm with FPGA. In this paper, to save hardware resources, we propose the shared architecture of a complex CORDIC processor. The required precision of internal calculation, the circuit area for the number of antenna elements and wordlength, and the processing speed will be evaluated. The resource estimation provides a possible processor configuration with a current FPGA on the market. Computer simulations assuming a fading channel will show a fast convergence property with a finite number of training symbols. The proposed architecture has also been implemented and its operation was verified by beamforming evaluation through a radio propagation experiment.
Advances in optical information processing IV; Proceedings of the Meeting, Orlando, FL, Apr. 18-20, 1990

NASA Astrophysics Data System (ADS)

Pape, Dennis R.

1990-09-01

The present conference discusses topics in optical image processing, optical signal processing, acoustooptic spectrum analyzer systems and components, and optical computing. Attention is given to tradeoffs in nonlinearly recorded matched filters, miniature spatial light modulators, detection and classification using higher-order statistics of optical matched filters, rapid traversal of an image data base using binary synthetic discriminant filters, wideband signal processing for emitter location, an acoustooptic processor for autonomous SAR guidance, and sampling of Fresnel transforms. Also discussed are an acoustooptic RF signal-acquisition system, scanning acoustooptic spectrum analyzers, the effects of aberrations on acoustooptic systems, fast optical digital arithmetic processors, information utilization in analog and digital processing, optical processors for smart structures, and a self-organizing neural network for unsupervised learning.
Optical Potential Field Mapping System

NASA Technical Reports Server (NTRS)

Reid, Max B. (Inventor)

1996-01-01

The present invention relates to an optical system for creating a potential field map of a bounded two dimensional region containing a goal location and an arbitrary number of obstacles. The potential field mapping system has an imaging device and a processor. Two image writing modes are used by the imaging device, electron deposition and electron depletion. Patterns written in electron deposition mode appear black and expand. Patterns written in electron depletion mode are sharp and appear white. The generated image represents a robot's workspace. The imaging device under processor control then writes a goal location in the work-space using the electron deposition mode. The black image of the goal expands in the workspace. The processor stores the generated images, and uses them to generate a feedback pattern. The feedback pattern is written in the workspace by the imaging device in the electron deposition mode to enhance the expansion of the original goal pattern. After the feedback pattern is written, an obstacle pattern is written by the imaging device in the electron depletion mode to represent the obstacles in the robot's workspace. The processor compares a stored image to a previously stored image to determine a change therebetween. When no change occurs, the processor averages the stored images to produce the potential field map.
A fast parallel 3D Poisson solver with longitudinal periodic and transverse open boundary conditions for space-charge simulations

NASA Astrophysics Data System (ADS)

Qiang, Ji

2017-10-01

A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
Computing an operating parameter of a unified power flow controller

DOEpatents

Wilson, David G.; Robinett, III, Rush D.

2017-12-26

A Unified Power Flow Controller described herein comprises a sensor that outputs at least one sensed condition, a processor that receives the at least one sensed condition, a memory that comprises control logic that is executable by the processor; and power electronics that comprise power storage, wherein the processor causes the power electronics to selectively cause the power storage to act as one of a power generator or a load based at least in part upon the at least one sensed condition output by the sensor and the control logic, and wherein at least one operating parameter of the power electronics is designed to facilitate maximal transmittal of electrical power generated at a variable power generation system to a grid system while meeting power constraints set forth by the electrical power grid.
Adaptive control for accelerators

DOEpatents

Eaton, Lawrie E.; Jachim, Stephen P.; Natter, Eckard F.

1991-01-01

An adaptive feedforward control loop is provided to stabilize accelerator beam loading of the radio frequency field in an accelerator cavity during successive pulses of the beam into the cavity. A digital signal processor enables an adaptive algorithm to generate a feedforward error correcting signal functionally determined by the feedback error obtained by a beam pulse loading the cavity after the previous correcting signal was applied to the cavity. Each cavity feedforward correcting signal is successively stored in the digital processor and modified by the feedback error resulting from its application to generate the next feedforward error correcting signal. A feedforward error correcting signal is generated by the digital processor in advance of the beam pulse to enable a composite correcting signal and the beam pulse to arrive concurrently at the cavity.
Computing an operating parameter of a unified power flow controller

DOEpatents

Wilson, David G; Robinett, III, Rush D

2015-01-06

A Unified Power Flow Controller described herein comprises a sensor that outputs at least one sensed condition, a processor that receives the at least one sensed condition, a memory that comprises control logic that is executable by the processor; and power electronics that comprise power storage, wherein the processor causes the power electronics to selectively cause the power storage to act as one of a power generator or a load based at least in part upon the at least one sensed condition output by the sensor and the control logic, and wherein at least one operating parameter of the power electronics is designed to facilitate maximal transmittal of electrical power generated at a variable power generation system to a grid system while meeting power constraints set forth by the electrical power grid.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less

2005 6th Annual Science and Engineering Technology Conference

DTIC Science & Technology

2005-04-21

BioFAC VBAIDS Hybrid: PCR/Immuno Fast PCR Fast Immunoassay Mass Spec (Pyrolysis) SIBS UV -LIF IR Fluorochrome Charge Detect. BioCADS Trigger Advanced...Weights Beam forming Signal Processing mapped to GPU architecture Vector Processor STAP (STAP-BOY) GaN High Frequency Transistor (WBG-RF) UV Laser...Service anti- counterfeiting • Embedded security strips Technology Limitations and Barriers • Training and cost (training intensive) Land Borders North Land
AN OPTIMIZED 64X64 POINT TWO-DIMENSIONAL FAST FOURIER TRANSFORM

NASA Technical Reports Server (NTRS)

Miko, J.

1994-01-01

Scientists at Goddard have developed an efficient and powerful program-- An Optimized 64x64 Point Two-Dimensional Fast Fourier Transform-- which combines the performance of real and complex valued one-dimensional Fast Fourier Transforms (FFT's) to execute a two-dimensional FFT and its power spectrum coefficients. These coefficients can be used in many applications, including spectrum analysis, convolution, digital filtering, image processing, and data compression. The program's efficiency results from its technique of expanding all arithmetic operations within one 64-point FFT; its high processing rate results from its operation on a high-speed digital signal processor. For non-real-time analysis, the program requires as input an ASCII data file of 64x64 (4096) real valued data points. As output, this analysis produces an ASCII data file of 64x64 power spectrum coefficients. To generate these coefficients, the program employs a row-column decomposition technique. First, it performs a radix-4 one-dimensional FFT on each row of input, producing complex valued results. Then, it performs a one-dimensional FFT on each column of these results to produce complex valued two-dimensional FFT results. Finally, the program sums the squares of the real and imaginary values to generate the power spectrum coefficients. The program requires a Banshee accelerator board with 128K bytes of memory from Atlanta Signal Processors (404/892-7265) installed on an IBM PC/AT compatible computer (DOS ver. 3.0 or higher) with at least one 16-bit expansion slot. For real-time operation, an ASPI daughter board is also needed. The real-time configuration reads 16-bit integer input data directly into the accelerator board, operating on 64x64 point frames of data. The program's memory management also allows accumulation of the coefficient results. The real-time processing rate to calculate and accumulate the 64x64 power spectrum output coefficients is less than 17.0 mSec. Documentation is included in the price of the program. Source code is written in C, 8086 Assembly, and Texas Instruments TMS320C30 Assembly Languages. This program is available on a 5.25 inch 360K MS-DOS format diskette. IBM and IBM PC are registered trademarks of International Business Machines. MS-DOS is a registered trademark of Microsoft Corporation.
Proceedings of the Interservice/Industry Training Systems Conference (9th), Held at Washington, DC, on 30 November - 2 December 1987

DTIC Science & Technology

1987-12-01

requires much more data, but holds fast to the idea that the FV approach, or some other model, is critical if the job analysis process is to have its...Ada compiled code executes twice as fast as Microsoft’s Fortran compiled code. This conclusion is at variance with the results obtained from...finish is not so important. Hence, if a design methodology produces coda that will not execute fast enough on processors suitable for flight
Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Bolosky, William Joseph

1993-01-01

Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

DTIC Science & Technology

2013-09-30

STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the
Software for embedded processors: Problems and solutions

NASA Astrophysics Data System (ADS)

Bogaerts, J. A. C.

1990-08-01

Data Acquistion systems in HEP experiments use a wide spectrum of computers to cope with two major problems: high event rates and a large data volume. They do this by using special fast trigger processors at the source to reduce the event rate by several orders of magnitude. The next stage of a data acquisition system consists of a network of fast but conventional microprocessors which are embedded in high speed bus systems where data is still further reduced, filtered and merged. In the final stage complete events are farmed out to a another collection of processors, which reconstruct the events and perhaps achieve a further event rejection by a small factor, prior to recording onto magnetic tape. Detectors are monitored by analyzing a fraction of the data. This may be done for individual detectors at an early state of the data acquisition or it may be delayed till the complete events are available. A network of workstations is used for monitoring, displays and run control. Software for trigger processors must have a simple structure. Rejection algorithms are carefully optimized, and overheads introduced by system software cannot be tolerated. The embedded microprocessors have to co-operate, and need to be synchronized with the preceding and following stages. Real time kernels are typically used to solve synchronization and communication problems. Applications are usually coded in C, which is reasonably efficient and allows direct control over low level hardware functions. Event reconstruction software is very similar or even identical to offline software, predominantly written in FORTRAN. With the advent of powerful RISC processors, and with manufacturers tending to adopt open bus architectures, there is a move towards commercial processors and hence the introduction of the UNIX operating system. Building and controlling such a heterogeneous data acquisition system puts a heavy strain on the software. Communications is now as important as CPU capacity and I/O bandwidth, the traditional key parameters of a HEP data acquisition system. Software engineering and real time system simulation tools are becoming indispensible for the design of future data acquisition systems.
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency

NASA Astrophysics Data System (ADS)

Soderquist, Peter; Leeser, Miriam E.

1999-01-01

Digital video decoding, enabled by the MPEG-2 Video standard, is an important future application for embedded systems, particularly PDAs and other information appliances. Many such system require portability and wireless communication capabilities, and thus face severe limitations in size and power consumption. This places a premium on integration and efficiency, and favors software solutions for video functionality over specialized hardware. The processors in most embedded system currently lack the computational power needed to perform video decoding, but a related and equally important problem is the required data bandwidth, and the need to cost-effectively insure adequate data supply. MPEG data sets are very large, and generate significant amounts of excess memory traffic for standard data caches, up to 100 times the amount required for decoding. Meanwhile, cost and power limitations restrict cache sizes in embedded systems. Some systems, including many media processors, eliminate caches in favor of memories under direct, painstaking software control in the manner of digital signal processors. Yet MPEG data has locality which caches can exploit if properly optimized, providing fast, flexible, and automatic data supply. We propose a set of enhancements which target the specific needs of the heterogeneous types within the MPEG decoder working set. These optimizations significantly improve the efficiency of small caches, reducing cache-memory traffic by almost 70 percent, and can make an enhanced 4 KB cache perform better than a standard 1 MB cache. This performance improvement can enable high-resolution, full frame rate video playback in cheaper, smaller system than woudl otherwise be possible.
System and method for programmable bank selection for banked memory subsystems

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

2010-09-07

A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications.

PubMed

Hach, Faraz; Sarrafi, Iman; Hormozdiari, Farhad; Alkan, Can; Eichler, Evan E; Sahinalp, S Cenk

2014-07-01

High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Onboard processor technology review

NASA Technical Reports Server (NTRS)

Benz, Harry F.

1990-01-01

The general need and requirements for the onboard embedded processors necessary to control and manipulate data in spacecraft systems are discussed. The current known requirements are reviewed from a user perspective, based on current practices in the spacecraft development process. The current capabilities of available processor technologies are then discussed, and these are projected to the generation of spacecraft computers currently under identified, funded development. An appraisal is provided for the current national developmental effort.
Job-mix modeling and system analysis of an aerospace multiprocessor.

NASA Technical Reports Server (NTRS)

Mallach, E. G.

1972-01-01

An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

PubMed

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Implementation of Adaptive Digital Controllers on Programmable Logic Devices

NASA Technical Reports Server (NTRS)

Gwaltney, David A.; King, Kenneth D.; Smith, Keary J.; Monenegro, Justino (Technical Monitor)

2002-01-01

Much has been made of the capabilities of FPGA's (Field Programmable Gate Arrays) in the hardware implementation of fast digital signal processing. Such capability also makes an FPGA a suitable platform for the digital implementation of closed loop controllers. Other researchers have implemented a variety of closed-loop digital controllers on FPGA's. Some of these controllers include the widely used proportional-integral-derivative (PID) controller, state space controllers, neural network and fuzzy logic based controllers. There are myriad advantages to utilizing an FPGA for discrete-time control functions which include the capability for reconfiguration when SRAM-based FPGA's are employed, fast parallel implementation of multiple control loops and implementations that can meet space level radiation tolerance requirements in a compact form-factor. Generally, a software implementation on a DSP (Digital Signal Processor) or microcontroller is used to implement digital controllers. At Marshall Space Flight Center, the Control Electronics Group has been studying adaptive discrete-time control of motor driven actuator systems using digital signal processor (DSP) devices. While small form factor, commercial DSP devices are now available with event capture, data conversion, pulse width modulated (PWM) outputs and communication peripherals, these devices are not currently available in designs and packages which meet space level radiation requirements. In general, very few DSP devices are produced that are designed to meet any level of radiation tolerance or hardness. The goal of this effort is to create a fully digital, flight ready controller design that utilizes an FPGA for implementation of signal conditioning for control feedback signals, generation of commands to the controlled system, and hardware insertion of adaptive control algorithm approaches. An alternative is required for compact implementation of such functionality to withstand the harsh environment encountered on spacecraft. Radiation tolerant FPGA's are a feasible option for reaching this goal.
Evaluating local indirect addressing in SIMD proc essors

NASA Technical Reports Server (NTRS)

Middleton, David; Tomboulian, Sherryl

1989-01-01

In the design of parallel computers, there exists a tradeoff between the number and power of individual processors. The single instruction stream, multiple data stream (SIMD) model of parallel computers lies at one extreme of the resulting spectrum. The available hardware resources are devoted to creating the largest possible number of processors, and consequently each individual processor must use the fewest possible resources. Disagreement exists as to whether SIMD processors should be able to generate addresses individually into their local data memory, or all processors should access the same address. The tradeoff is examined between the increased capability and the reduced number of processors that occurs in this single instruction stream, multiple, locally addressed, data (SIMLAD) model. Factors are assembled that affect this design choice, and the SIMLAD model is compared with the bare SIMD and the MIMD models.
Detailed description of the HP-9825A HFRMP trajectory processor (TRAJ)

NASA Technical Reports Server (NTRS)

Kindall, S. M.; Wilson, S. W.

1979-01-01

The computer code for the trajectory processor of the HP-9825A High Fidelity Relative Motion Program is described in detail. The processor is a 12-degrees-of-freedom trajectory integrator which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. Coding standards and flow charts are given and the computational logic is discussed.
A Trade Study of Two Membrane-Aerated Biological Water Processors

NASA Technical Reports Server (NTRS)

Allada, Ram; Lange, Kevin; Vega. Leticia; Roberts, Michael S.; Jackson, Andrew; Anderson, Molly; Pickering, Karen

2011-01-01

Biologically based systems are under evaluation as primary water processors for next generation life support systems due to their low power requirements and their inherent regenerative nature. This paper will summarize the results of two recent studies involving membrane aerated biological water processors and present results of a trade study comparing the two systems with regards to waste stream composition, nutrient loading and system design. Results of optimal configurations will be presented.
Progress in fast, accurate multi-scale climate simulations

DOE PAGES

Collins, W. D.; Johansen, H.; Evans, K. J.; ...

2015-06-01

We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth with these computational improvements include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enablingmore » improved accuracy and fidelity in simulation of dynamics and allowing more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures such as many-core processors and GPUs. As a result, approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.« less
Fast Context Switching in Real-Time Propositional Reasoning

NASA Technical Reports Server (NTRS)

Nayak, P. Pandurang; Williams, Brian C.

1997-01-01

The trend to increasingly capable and affordable control processors has generated an explosion of embedded real-time gadgets that serve almost every function imaginable. The daunting task of programming these gadgets is greatly alleviated with real-time deductive engines that perform all execution and monitoring functions from a single core model, Fast response times are achieved using an incremental propositional deductive database (an LTMS). Ideally the cost of an LTMS's incremental update should be linear in the number of labels that change between successive contexts. Unfortunately an LTMS can expend a significant percentage of its time working on labels that remain constant between contexts. This is caused by the LTMS's conservative approach: a context switch first removes all consequences of deleted clauses, whether or not those consequences hold in the new context. This paper presents a more aggressive incremental TMS, called the ITMS, that avoids processing a significant number of these consequences that are unchanged. Our empirical evaluation for spacecraft control shows that the overhead of processing unchanged consequences can be reduced by a factor of seven.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.

An FPGA computing demo core for space charge simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Jinyuan; Huang, Yifei; /Fermilab

2009-01-01

In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less
A low power biomedical signal processor ASIC based on hardware software codesign.

PubMed

Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

2009-01-01

A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.

1991-01-01

A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
Design, characterization and control of the Unique Mobility Corporation robot

NASA Technical Reports Server (NTRS)

Velasco, Virgilio B., Jr.; Newman, Wyatt S.; Steinetz, Bruce; Kopf, Carlo; Malik, John

1994-01-01

Space and mass are at a premium on any space mission, and thus any machinery designed for space use should be lightweight and compact, without sacrificing strength. It is for this reason that NASA/LeRC contracted Unique Mobility Corporation to exploit their novel actuator designs to build a robot that would advance the present state of technology with respect to these requirements. Custom-designed motors are the key feature of this robot. They are compact, high-performance dc brushless servo motors with a high pole count and low inductance, thus permitting high torque generation and rapid phase commutation. Using a custom-designed digital signal processor-based controller board, the pulse width modulation power amplifiers regulate the fast dynamics of the motor currents. In addition, the programmable digital signal processor (DSP) controller permits implementation of nonlinear compensation algorithms to account for motoring vs. regeneration, torque ripple, and back-EMF. As a result, the motors produce a high torque relative to their size and weight, and can do so with good torque regulation and acceptably high velocity saturation limits. This paper presents the Unique Mobility Corporation robot prototype: its actuators, its kinematic design, its control system, and its experimental characterization. Performance results, including saturation torques, saturation velocities and tracking accuracy tests are included.
Simulation of a Real-Time Local Data Integration System over East-Central Florida

NASA Technical Reports Server (NTRS)

Case, Jonathan

1999-01-01

The Applied Meteorology Unit (AMU) simulated a real-time configuration of a Local Data Integration System (LDIS) using data from 15-28 February 1999. The objectives were to assess the utility of a simulated real-time LDIS, evaluate and extrapolate system performance to identify the hardware necessary to run a real-time LDIS, and determine the sensitivities of LDIS. The ultimate goal for running LDIS is to generate analysis products that enhance short-range (less than 6 h) weather forecasts issued in support of the 45th Weather Squadron, Spaceflight Meteorology Group, and Melbourne National Weather Service operational requirements. The simulation used the Advanced Regional Prediction System (ARPS) Data Analysis System (ADAS) software on an IBM RS/6000 workstation with a 67-MHz processor. This configuration ran in real-time, but not sufficiently fast for operational requirements. Thus, the AMU recommends a workstation with a 200-MHz processor and 512 megabytes of memory to run the AMU's configuration of LDIS in real-time. This report presents results from two case studies and several data sensitivity experiments. ADAS demonstrates utility through its ability to depict high-resolution cloud and wind features in a variety of weather situations. The sensitivity experiments illustrate the influence of disparate data on the resulting ADAS analyses.
Fast coincidence counting with active inspection systems

NASA Astrophysics Data System (ADS)

Mullens, J. A.; Neal, J. S.; Hausladen, P. A.; Pozzi, S. A.; Mihalczo, J. T.

2005-12-01

This paper describes 2nd and 3rd order time coincidence distributions measurements with a GHz processor that synchronously samples 5 or 10 channels of data from radiation detectors near fissile material. On-line, time coincidence distributions are measured between detectors or between detectors and an external stimulating source. Detector-to-detector correlations are useful for passive measurements also. The processor also measures the number of times n pulses occur in a selectable time window and compares this multiplet distribution to a Poisson distribution as a method of determining the occurrence of fission. The detectors respond to radiation emitted in the fission process induced internally by inherent sources or by external sources such as LINACS, DT generators either pulsed or steady state with alpha detectors, etc. Data can be acquired from prompt emission during the source pulse, prompt emissions immediately after the source pulse, or delayed emissions between source pulses. These types of time coincidence measurements (occurring on the time scale of the fission chain multiplication processes for nuclear weapons grade U and Pu) are useful for determining the presence of these fissile materials and quantifying the amount, and are useful for counter terrorism and nuclear material control and accountability. This paper presents the results for a variety of measurements.
A High Performance VLSI Computer Architecture For Computer Graphics

NASA Astrophysics Data System (ADS)

Chin, Chi-Yuan; Lin, Wen-Tai

1988-10-01

A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the modern computer graphics demands, e.g. high resolution, realistic animation, real-time display etc.. All processors share a global memory which are partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcasted to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigurated to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection, can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Atmospheric Correction Inter-comparison Exercise (ACIX)

NASA Astrophysics Data System (ADS)

Vermote, E.; Doxani, G.; Gascon, F.; Roger, J. C.; Skakun, S.

2017-12-01

The free and open data access policy to Landsat-8 (L-8) and Sentinel-2 (S-2) satellite imagery has encouraged the development of atmospheric correction (AC) approaches for generating Bottom-of-Atmosphere (BOA) products. Several entities have started to generate (or plan to generate in the short term) BOA reflectance products at global scale for L-8 and S-2 missions. To this end, the European Space Agency (ESA) and National Aeronautics and Space Administration (NASA) have initiated an exercise on the inter-comparison of the available AC processors. The results of the exercise are expected to point out the strengths and weaknesses, as well as communalities and discrepancies of various AC processors, in order to suggest and define ways for their further improvement. In particular, 11 atmospheric processors from five different countries participate in ACIX with the aim to inter-compare their performance when applied to L-8 and S-2 data. All the processors should be operational without requiring parametrization when applied on different areas. A protocol describing in details the inter-comparison metrics and the test dataset based on the AERONET sites has been agreed unanimously during the 1st ACIX workshop in June 2016. In particular, a basic and an advanced run of each of the processor were requested in the frame of ACIX, with the aim to draw robust and reliable conclusions on the processors' performance. The protocol also describes the comparison metrics of the aerosol optical thickness and water vapour products of the processors with the corresponding AERONET measurements. Moreover, concerning the surface reflectances, the inter-comparison among the processors is defined, as well as the comparison with the MODIS surface reflectance and with a reference surface reflectance product. Such a reference product will be obtained using the AERONET characterization of the aerosol (size distribution and refractive indices) and an accurate radiative transfer code. The inter-comparison outcomes were presented and discussed among the ACIX participants in the 2nd ACIX workshop, which was held on 11-12 April 2017 (ESRIN/ESA) and a detailed report was compiled. The proposed presentation is an opportunity for the user community to be informed about the ACIX results and conclusions.
Real-time trajectory optimization on parallel processors

NASA Technical Reports Server (NTRS)

Psiaki, Mark L.

1993-01-01

A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
Methods and Apparatus for Aggregation of Multiple Pulse Code Modulation Channels into a Signal Time Division Multiplexing Stream

NASA Technical Reports Server (NTRS)

Chang, Chen J. (Inventor); Liaghati, Jr., Amir L. (Inventor); Liaghati, Mahsa L. (Inventor)

2018-01-01

Methods and apparatus are provided for telemetry processing using a telemetry processor. The telemetry processor can include a plurality of communications interfaces, a computer processor, and data storage. The telemetry processor can buffer sensor data by: receiving a frame of sensor data using a first communications interface and clock data using a second communications interface, receiving an end of frame signal using a third communications interface, and storing the received frame of sensor data in the data storage. After buffering the sensor data, the telemetry processor can generate an encapsulated data packet including a single encapsulated data packet header, the buffered sensor data, and identifiers identifying telemetry devices that provided the sensor data. A format of the encapsulated data packet can comply with a Consultative Committee for Space Data Systems (CCSDS) standard. The telemetry processor can send the encapsulated data packet using a fourth and a fifth communications interfaces.
Advanced Power Electronic Interfaces for Distributed Energy Systems, Part 2: Modeling, Development, and Experimental Evaluation of Advanced Control Functions for Single-Phase Utility-Connected Inverter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chakraborty, S.; Kroposki, B.; Kramer, W.

Integrating renewable energy and distributed generations into the Smart Grid architecture requires power electronic (PE) for energy conversion. The key to reaching successful Smart Grid implementation is to develop interoperable, intelligent, and advanced PE technology that improves and accelerates the use of distributed energy resource systems. This report describes the simulation, design, and testing of a single-phase DC-to-AC inverter developed to operate in both islanded and utility-connected mode. It provides results on both the simulations and the experiments conducted, demonstrating the ability of the inverter to provide advanced control functions such as power flow and VAR/voltage regulation. This report alsomore » analyzes two different techniques used for digital signal processor (DSP) code generation. Initially, the DSP code was written in C programming language using Texas Instrument's Code Composer Studio. In a later stage of the research, the Simulink DSP toolbox was used to self-generate code for the DSP. The successful tests using Simulink self-generated DSP codes show promise for fast prototyping of PE controls.« less
Converted and upgraded maps programmed in the newer speech processor for the first generation of multichannel cochlear implant.

PubMed

Magalhães, Ana Tereza de Matos; Goffi-Gomez, M Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

2013-09-01

To identify the technological contributions of the newer version of speech processor to the first generation of multichannel cochlear implant and the satisfaction of users of the new technology. Among the new features available, we focused on the effect of the frequency allocation table, the T-SPL and C-SPL, and the preprocessing gain adjustments (adaptive dynamic range optimization). Prospective exploratory study. Cochlear implant center at hospital. Cochlear implant users of the Spectra processor with speech recognition in closed set. Seventeen patients were selected between the ages of 15 and 82 and deployed for more than 8 years. The technology update of the speech processor for the Nucleus 22. To determine Freedom's contribution, thresholds and speech perception tests were performed with the last map used with the Spectra and the maps created for Freedom. To identify the effect of the frequency allocation table, both upgraded and converted maps were programmed. One map was programmed with 25 dB T-SPL and 65 dB C-SPL and the other map with adaptive dynamic range optimization. To assess satisfaction, SADL and APHAB were used. All speech perception tests and all sound field thresholds were statistically better with the new speech processor; 64.7% of patients preferred maintaining the same frequency table that was suggested for the older processor. The sound field threshold was statistically significant at 500, 1,000, 1,500, and 2,000 Hz with 25 dB T-SPL/65 dB C-SPL. Regarding patient's satisfaction, there was a statistically significant improvement, only in the subscale of speech in noise abilities and phone use. The new technology improved the performance of patients with the first generation of multichannel cochlear implant.
Efficient Sorting on the Tilera Manycore Architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morari, Alessandro; Tumeo, Antonino; Villa, Oreste

e present an efficient implementation of the radix sort algo- rithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is com- posed of 64 tiles interconnected through multiple fast Networks- on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor’s sustained performance. We discuss the overall throughput reached by ourmore » radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the art implemen- tations on x86 processors and graphic processing units.« less
Parallel processor for real-time structural control

NASA Astrophysics Data System (ADS)

Tise, Bert L.

1993-07-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Feasibility study, software design, layout and simulation of a two-dimensional Fast Fourier Transform machine for use in optical array interferometry

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
Distributed multitasking ITS with PVM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fan, W.C.; Halbleib, J.A. Sr.

1995-02-01

Advances of computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources, and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated inmore » a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of MCNP on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load balancing schemes for homogeneous and heterogeneous networks.« less
The GF-3 SAR Data Processor

PubMed Central

Han, Bing; Ding, Chibiao; Zhong, Lihua; Liu, Jiayin; Qiu, Xiaolan; Hu, Yuxin; Lei, Bin

2018-01-01

The Gaofen-3 (GF-3) data processor was developed as a workstation-based GF-3 synthetic aperture radar (SAR) data processing system. The processor consists of two vital subsystems of the GF-3 ground segment, which are referred to as data ingesting subsystem (DIS) and product generation subsystem (PGS). The primary purpose of DIS is to record and catalogue GF-3 raw data with a transferring format, and PGS is to produce slant range or geocoded imagery from the signal data. This paper presents a brief introduction of the GF-3 data processor, including descriptions of the system architecture, the processing algorithms and its output format. PMID:29534464
Multiphase complete exchange: A theoretical analysis

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.

1993-01-01

Complete Exchange requires each of N processors to send a unique message to each of the remaining N-1 processors. For a circuit switched hypercube with N = 2(sub d) processors, the Direct and Standard algorithms for Complete Exchange are optimal for very large and very small message sizes, respectively. For intermediate sizes, a hybrid Multiphase algorithm is better. This carries out Direct exchanges on a set of subcubes whose dimensions are a partition of the integer d. The best such algorithm for a given message size m could hitherto only be found by enumerating all partitions of d. The Multiphase algorithm is analyzed assuming a high performance communication network. It is proved that only algorithms corresponding to equipartitions of d (partitions in which the maximum and minimum elements differ by at most 1) can possibly be optimal. The run times of these algorithms plotted against m form a hull of optimality. It is proved that, although there is an exponential number of partitions, (1) the number of faces on this hull is Theta(square root of d), (2) the hull can be found in theta(square root of d) time, and (3) once it has been found, the optimal algorithm for any given m can be found in Theta(log d) time. These results provide a very fast technique for minimizing communication overhead in many important applications, such as matrix transpose, Fast Fourier transform, and ADI.
Compact propane fuel processor for auxiliary power unit application

NASA Astrophysics Data System (ADS)

Dokupil, M.; Spitta, C.; Mathiak, J.; Beckhaus, P.; Heinzel, A.

With focus on mobile applications a fuel cell auxiliary power unit (APU) using liquefied petroleum gas (LPG) is currently being developed at the Centre for Fuel Cell Technology (Zentrum für BrennstoffzellenTechnik, ZBT gGmbH). The system is consisting of an integrated compact and lightweight fuel processor and a low temperature PEM fuel cell for an electric power output of 300 W. This article is presenting the current status of development of the fuel processor which is designed for a nominal hydrogen output of 1 k Wth,H2 within a load range from 50 to 120%. A modular setup was chosen defining a reformer/burner module and a CO-purification module. Based on the performance specifications, thermodynamic simulations, benchmarking and selection of catalysts the modules have been developed and characterised simultaneously and then assembled to the complete fuel processor. Automated operation results in a cold startup time of about 25 min for nominal load and carbon monoxide output concentrations below 50 ppm for steady state and dynamic operation. Also fast transient response of the fuel processor at load changes with low fluctuations of the reformate gas composition have been achieved. Beside the development of the main reactors the transfer of the fuel processor to an autonomous system is of major concern. Hence, concepts for packaging have been developed resulting in a volume of 7 l and a weight of 3 kg. Further a selection of peripheral components has been tested and evaluated regarding to the substitution of the laboratory equipment.
Hardware based redundant multi-threading inside a GPU for improved reliability

DOEpatents

Sridharan, Vilas; Gurumurthi, Sudhanva

2015-05-05

A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.

Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

NASA Astrophysics Data System (ADS)

Schultz, A.

2010-12-01

3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
Status report of the end-to-end ASKAP software system: towards early science operations

NASA Astrophysics Data System (ADS)

Guzman, Juan Carlos; Chapman, Jessica; Marquarding, Malte; Whiting, Matthew

2016-08-01

The Australian SKA Pathfinder (ASKAP) is a novel centimetre radio synthesis telescope currently in the commissioning phase and located in the midwest region of Western Australia. It comprises of 36 x 12 m diameter reflector antennas each equipped with state-of-the-art and award winning Phased Array Feeds (PAF) technology. The PAFs provide a wide, 30 square degree field-of-view by forming up to 36 separate dual-polarisation beams at once. This results in a high data rate: 70 TB of correlated visibilities in an 8-hour observation, requiring custom-written, high-performance software running in dedicated High Performance Computing (HPC) facilities. The first six antennas equipped with first-generation PAF technology (Mark I), named the Boolardy Engineering Test Array (BETA) have been in use since 2014 as a platform to test PAF calibration and imaging techniques, and along the way it has been producing some great science results. Commissioning of the ASKAP Array Release 1, that is the first six antennas with second-generation PAFs (Mark II) is currently under way. An integral part of the instrument is the Central Processor platform hosted at the Pawsey Supercomputing Centre in Perth, which executes custom-written software pipelines, designed specifically to meet the ASKAP imaging requirements of wide field of view and high dynamic range. There are three key hardware components of the Central Processor: The ingest nodes (16 x node cluster), the fast temporary storage (1 PB Lustre file system) and the processing supercomputer (200 TFlop system). This High-Performance Computing (HPC) platform is managed and supported by the Pawsey support team. Due to the limited amount of data generated by BETA and the first ASKAP Array Release, the Central Processor platform has been running in a more "traditional" or user-interactive mode. But this is about to change: integration and verification of the online ingest pipeline starts in early 2016, which is required to support the full 300 MHz bandwidth for Array Release 1; followed by the deployment of the real-time data processing components. In addition to the Central Processor, the first production release of the CSIRO ASKAP Science Data Archive (CASDA) has also been deployed in one of the Pawsey Supercomputing Centre facilities and it is integrated to the end-to-end ASKAP data flow system. This paper describes the current status of the "end-to-end" data flow software system from preparing observations to data acquisition, processing and archiving; and the challenges of integrating an HPC facility as a key part of the instrument. It also shares some lessons learned since the start of integration activities and the challenges ahead in preparation for the start of the Early Science program.
Managing Power Heterogeneity

NASA Astrophysics Data System (ADS)

Pruhs, Kirk

A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Fast, accurate photon beam accelerator modeling using BEAMnrc: A systematic investigation of efficiency enhancing methods and cross-section data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fragoso, Margarida; Kawrakow, Iwan; Faddegon, Bruce A.

In this work, an investigation of efficiency enhancing methods and cross-section data in the BEAMnrc Monte Carlo (MC) code system is presented. Additionally, BEAMnrc was compared with VMC++, another special-purpose MC code system that has recently been enhanced for the simulation of the entire treatment head. BEAMnrc and VMC++ were used to simulate a 6 MV photon beam from a Siemens Primus linear accelerator (linac) and phase space (PHSP) files were generated at 100 cm source-to-surface distance for the 10x10 and 40x40 cm{sup 2} field sizes. The BEAMnrc parameters/techniques under investigation were grouped by (i) photon and bremsstrahlung cross sections,more » (ii) approximate efficiency improving techniques (AEITs), (iii) variance reduction techniques (VRTs), and (iv) a VRT (bremsstrahlung photon splitting) in combination with an AEIT (charged particle range rejection). The BEAMnrc PHSP file obtained without the efficiency enhancing techniques under study or, when not possible, with their default values (e.g., EXACT algorithm for the boundary crossing algorithm) and with the default cross-section data (PEGS4 and Bethe-Heitler) was used as the ''base line'' for accuracy verification of the PHSP files generated from the different groups described previously. Subsequently, a selection of the PHSP files was used as input for DOSXYZnrc-based water phantom dose calculations, which were verified against measurements. The performance of the different VRTs and AEITs available in BEAMnrc and of VMC++ was specified by the relative efficiency, i.e., by the efficiency of the MC simulation relative to that of the BEAMnrc base-line calculation. The highest relative efficiencies were {approx}935 ({approx}111 min on a single 2.6 GHz processor) and {approx}200 ({approx}45 min on a single processor) for the 10x10 field size with 50 million histories and 40x40 cm{sup 2} field size with 100 million histories, respectively, using the VRT directional bremsstrahlung splitting (DBS) with no electron splitting. When DBS was used with electron splitting and combined with augmented charged particle range rejection, a technique recently introduced in BEAMnrc, relative efficiencies were {approx}420 ({approx}253 min on a single processor) and {approx}175 ({approx}58 min on a single processor) for the 10x10 and 40x40 cm{sup 2} field sizes, respectively. Calculations of the Siemens Primus treatment head with VMC++ produced relative efficiencies of {approx}1400 ({approx}6 min on a single processor) and {approx}60 ({approx}4 min on a single processor) for the 10x10 and 40x40 cm{sup 2} field sizes, respectively. BEAMnrc PHSP calculations with DBS alone or DBS in combination with charged particle range rejection were more efficient than the other efficiency enhancing techniques used. Using VMC++, accurate simulations of the entire linac treatment head were performed within minutes on a single processor. Noteworthy differences ({+-}1%-3%) in the mean energy, planar fluence, and angular and spectral distributions were observed with the NIST bremsstrahlung cross sections compared with those of Bethe-Heitler (BEAMnrc default bremsstrahlung cross section). However, MC calculated dose distributions in water phantoms (using combinations of VRTs/AEITs and cross-section data) agreed within 2% of measurements. Furthermore, MC calculated dose distributions in a simulated water/air/water phantom, using NIST cross sections, were within 2% agreement with the BEAMnrc Bethe-Heitler default case.« less
Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

DTIC Science & Technology

2015-06-01

5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Extremely Fast Numerical Integration of Ocean Surface Wave Dynamics

DTIC Science & Technology

2007-09-30

sub-processor must be added as shown in the blue box of Fig. 1. We first consider the Kadomtsev - Petviashvili (KP) equation ηt + coηx +αηηx + βη ...analytic integration of the so-called “soliton equations ,” I have discovered how the GFT can be used to solved higher order equations for which study...analytical study and extremely fast numerical integration of the extended nonlinear Schroedinger equation for fully three dimensional wave motion
Single bus star connected reluctance drive and method

DOEpatents

Fahimi, Babak; Shamsi, Pourya

2016-05-10

A system and methods for operating a switched reluctance machine includes a controller, an inverter connected to the controller and to the switched reluctance machine, a hysteresis control connected to the controller and to the inverter, a set of sensors connected to the switched reluctance machine and to the controller, the switched reluctance machine further including a set of phases the controller further comprising a processor and a memory connected to the processor, wherein the processor programmed to execute a control process and a generation process.
Parallel processor for real-time structural control

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tise, B.L.

1992-01-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
A digital retina-like low-level vision processor.

PubMed

Mertoguno, S; Bourbakis, N G

2003-01-01

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
Loran-C digital word generator for use with a KIM-1 microprocessor system

NASA Technical Reports Server (NTRS)

Nickum, J. D.

1977-01-01

The problem of translating the time of occurrence of received Loran-C pulses into a time, referenced to a particular period of occurrence is addressed and applied to the design of a digital word generator for a Loran-C sensor processor package. The digital information from this word generator is processed in a KIM-1 microprocessor system which is based on the MOS 6502 CPU. This final system will consist of a complete time difference sensor processor for determining position information using Loran-C charts. The system consists of the KIM-1 microprocessor module, a 4K RAM memory board, a user interface, and the Loran-C word generator.
A digital video tracking system

NASA Astrophysics Data System (ADS)

Giles, M. K.

1980-01-01

The Real-Time Videotheodolite (RTV) was developed in connection with the requirement to replace film as a recording medium to obtain the real-time location of an object in the field-of-view (FOV) of a long focal length theodolite. Design philosophy called for a system capable of discriminatory judgment in identifying the object to be tracked with 60 independent observations per second, capable of locating the center of mass of the object projection on the image plane within about 2% of the FOV in rapidly changing background/foreground situations, and able to generate a predicted observation angle for the next observation. A description is given of a number of subsystems of the RTV, taking into account the processor configuration, the video processor, the projection processor, the tracker processor, the control processor, and the optics interface and imaging subsystem.
Multitask neurovision processor with extensive feedback and feedforward connections

NASA Astrophysics Data System (ADS)

Gupta, Madan M.; Knopf, George K.

1991-11-01

A multi-task neuro-vision parameter which performs a variety of information processing operations associated with the early stages of biological vision is presented. The network architecture of this neuro-vision processor, called the positive-negative (PN) neural processor, is loosely based on the neural activity fields exhibited by thalamic and cortical nervous tissue layers. The computational operation performed by the processor arises from the strength of the recurrent feedback among the numerous positive and negative neural computing units. By adjusting the feedback connections it is possible to generate diverse dynamic behavior that may be used for short-term visual memory (STVM), spatio-temporal filtering (STF), and pulse frequency modulation (PFM). The information attributes that are to be processes may be regulated by modifying the feedforward connections from the signal space to the neural processor.
IECON '87: Industrial applications of control and simulation; Proceedings of the 1987 International Conference on Industrial Electronics, Control, and Instrumentation, Cambridge, MA, Nov. 3, 4, 1987

NASA Technical Reports Server (NTRS)

Hartley, Tom T. (Editor)

1987-01-01

Recent advances in control-system design and simulation are discussed in reviews and reports. Among the topics considered are fast algorithms for generating near-optimal binary decision programs, trajectory control of robot manipulators with compensation of load effects via a six-axis force sensor, matrix integrators for real-time simulation, a high-level control language for an autonomous land vehicle, and a practical engineering design method for stable model-reference adaptive systems. Also addressed are the identification and control of flexible-limb robots with unknown loads, adaptive control and robust adaptive control for manipulators with feedforward compensation, adaptive pole-placement controllers with predictive action, variable-structure strategies for motion control, and digital signal-processor-based variable-structure controls.
Parallel, stochastic measurement of molecular surface area.

PubMed

Juba, Derek; Varshney, Amitabh

2008-08-01

Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
A wideband software reconfigurable modem

NASA Astrophysics Data System (ADS)

Turner, J. H., Jr.; Vickers, H.

A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JITDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
The Mercury System: Embedding Computation into Disk Drives

DTIC Science & Technology

2004-08-20

enabling technologies to build extremely fast data search engines . We do this by moving the search closer to the data, and performing it in hardware...engine searches in parallel across a disk or disk surface 2. System Parallelism: Searching is off-loaded to search engines and main processor can
Searching for New Double Stars with a Computer

NASA Astrophysics Data System (ADS)

Bryant, T. V.

2015-04-01

The advent of computers with large amounts of RAM memory and fast processors, as well as easy internet access to large online astronomical databases, has made computer searches based on astrometric data practicable for most researchers. This paper describes one such search that has uncovered hitherto unrecognized double stars.
S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less
Design of RISC Processor Using VHDL and Cadence

NASA Astrophysics Data System (ADS)

Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.

Digital system for structural dynamics simulation

NASA Technical Reports Server (NTRS)

Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.

1982-01-01

State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Interactive Digital Signal Processor

NASA Technical Reports Server (NTRS)

Mish, W. H.

1985-01-01

Interactive Digital Signal Processor, IDSP, consists of set of time series analysis "operators" based on various algorithms commonly used for digital signal analysis. Processing of digital signal time series to extract information usually achieved by applications of number of fairly standard operations. IDSP excellent teaching tool for demonstrating application for time series operators to artificially generated signals.
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
Scheduling time-critical graphics on multiple processors

NASA Technical Reports Server (NTRS)

Meyer, Tom W.; Hughes, John F.

1995-01-01

This paper describes an algorithm for the scheduling of time-critical rendering and computation tasks on single- and multiple-processor architectures, with minimal pipelining. It was developed to manage scientific visualization scenes consisting of hundreds of objects, each of which can be computed and displayed at thousands of possible resolution levels. The algorithm generates the time-critical schedule using progressive-refinement techniques; it always returns a feasible schedule and, when allowed to run to completion, produces a near-optimal schedule which takes advantage of almost the entire multiple-processor system.
HP-9825A HFRMP trajectory processor (#TRAJ), detailed description. [relative motion of the space shuttle orbiter and a free-flying payload

NASA Technical Reports Server (NTRS)

Kindall, S. M.

1980-01-01

The computer code for the trajectory processor (#TRAJ) of the high fidelity relative motion program is described. The #TRAJ processor is a 12-degrees-of-freedom trajectory integrator (6 degrees of freedom for each of two vehicles) which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. A listing of the code, coding standards and conventions, detailed flow charts, and discussions of the computational logic are included.
Language and Program for Documenting Software Design

NASA Technical Reports Server (NTRS)

Kleine, H.; Zepko, T. M.

1986-01-01

Software Design and Documentation Language (SDDL) provides effective communication medium to support design and documentation of complex software applications. SDDL supports communication among all members of software design team and provides for production of informative documentation on design effort. Use of SDDL-generated document to analyze design makes it possible to eliminate many errors not detected until coding and testing attempted. SDDL processor program translates designer's creative thinking into effective document for communication. Processor performs as many automatic functions as possible, freeing designer's energy for creative effort. SDDL processor program written in PASCAL.
Systems and methods for performing wireless financial transactions

DOE Office of Scientific and Technical Information (OSTI.GOV)

McCown, Steven Harvey

2012-07-03

A secure computing module (SCM) is configured for connection with a host device. The SCM includes a processor for performing secure processing operations, a host interface for coupling the processor to the host device, and a memory connected to the processor wherein the processor logically isolates at least some of the memory from access by the host device. The SCM also includes a proximate-field wireless communicator connected to the processor to communicate with another SCM associated with another host device. The SCM generates a secure digital signature for a financial transaction package and communicates the package and the signature tomore » the other SCM using the proximate-field wireless communicator. Financial transactions are performed from person to person using the secure digital signature of each person's SCM and possibly message encryption. The digital signatures and transaction details are communicated to appropriate financial organizations to authenticate the transaction parties and complete the transaction.« less
ACIX: Atmospheric Correction Inter-comparison Exercise

NASA Astrophysics Data System (ADS)

Doxani, Georgia; Gascon, Ferran; Vermote, Éric; Roger, Jean-Claude

2017-04-01

The free and open data access policy to Sentinel-2 (S-2) and Landsat-8 (L-8) satellite imagery has stimulated the development of atmospheric correction (AC) processors for generating Bottom-of-Atmosphere (BOA) products. Several entities have started to generate (or plan to generate in the short term) BOA reflectance products at global scale for S-2 and L-8 missions. To this end, the European Space Agency (ESA) and NASA are organizing an exercise on AC processors inter-comparison. The results of the exercise are expected to point out the strengths and weaknesses, as well as communalities and discrepancies of various AC processors, in order to suggest and define ways for their further improvement. In particular, 13 atmospheric processors from five different countries participate in ACIX with the aim to inter-compare their performance when applied to L-8 and S-2 data. A protocol describing the inter-comparison process and the test dataset, which is based on the AERONET sites, will be presented. The protocol has been defined according to what was agreed among the participants during the 1st ACIX workshop held in June 2016. It includes the comparison of aerosol optical thickness and water vapour products of the processors with the AERONET measurements. Moreover, concerning the surface reflectances, the protocol describes the inter-comparison among the processors, as well as the comparison with the MODIS surface reflectance and with a reference surface reflectance product. Such a reference product will be obtained using the AERONET characterization of the aerosol (size distribution and refractive indices) and an accurate radiative transfer code. The inter-comparison outcomes will be presented and discussed among the participants in the 2nd ACIX workshop, which will be held on 11-12 April 2017 (ESRIN/ESA). The proposed presentation is an opportunity for the user community to be informed for the first time about the ACIX results and conclusions.
Editing wild points in isolation - Fast agreement for reliable systems (Preliminary version)

NASA Technical Reports Server (NTRS)

Kearns, Phil; Evans, Carol

1989-01-01

Consideration is given to the intuitively appealing notion of discarding sensor values which are strongly suspected of being erroneous in a modified approximate agreement protocol. Approximate agreement with editing imposes a time bound upon the convergence of the protocol - no such bound was possible for the original approximate agreement protocol. This new approach is potentially useful in the construction of asynchronous fault tolerant systems. The main result is that a wild-point replacement technique called t-worst editing can be shown to guarantee convergence of the approximate agreement protocol to a valid agreement value. Results are presented for a four-processor synchronous system in which a single processor may be faulty.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1991-01-01

The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor

PubMed Central

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-01-01

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation. PMID:27983714
Particle simulation of plasmas on the massively parallel processor

NASA Technical Reports Server (NTRS)

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.

PubMed

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-12-15

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
Novel Optical Processor for Phased Array Antenna.

DTIC Science & Technology

1992-10-20

parallel glass slide into the signal beam optical loop. The parallel glass acts like a variable phase shifter to the signal beam simulating phase drift...A list of possible designs are given as follows , _ _ Velocity fa (100dB/cm) Lumit Wavelength I M2I1 TeO2 Longi 4.2 /m/ns about 3 GHz 1.4 4m 34 Fast...subject to achievable acoustic frequency, the preferred materials are the slow shear wave in TeO2 , the fast shear wave in TeO2 or the shear waves in
Large-Constraint-Length, Fast Viterbi Decoder

NASA Technical Reports Server (NTRS)

Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.

1990-01-01

Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.
Application of Kevin-Voigt Model in Quantifying Whey Protein Adsorption on Polyethersulfone Using QCM-D

USDA-ARS?s Scientific Manuscript database

The study of protein adsorption on the membrane surface is of great importance to cheese-making processors that use polymeric membrane-based processes to recover whey protein from the process waste streams. Quartz crystal microbalance with dissipation (QCM-D) is a lab-scale, fast analytical techniq...
[Development of a video image system for wireless capsule endoscopes based on DSP].

PubMed

Yang, Li; Peng, Chenglin; Wu, Huafeng; Zhao, Dechun; Zhang, Jinhua

2008-02-01

A video image recorder to record video picture for wireless capsule endoscopes was designed. TMS320C6211 DSP of Texas Instruments Inc. is the core processor of this system. Images are periodically acquired from Composite Video Broadcast Signal (CVBS) source and scaled by video decoder (SAA7114H). Video data is transported from high speed buffer First-in First-out (FIFO) to Digital Signal Processor (DSP) under the control of Complex Programmable Logic Device (CPLD). This paper adopts JPEG algorithm for image coding, and the compressed data in DSP was stored to Compact Flash (CF) card. TMS320C6211 DSP is mainly used for image compression and data transporting. Fast Discrete Cosine Transform (DCT) algorithm and fast coefficient quantization algorithm are used to accelerate operation speed of DSP and decrease the executing code. At the same time, proper address is assigned for each memory, which has different speed;the memory structure is also optimized. In addition, this system uses plenty of Extended Direct Memory Access (EDMA) to transport and process image data, which results in stable and high performance.
General purpose pulse shape analysis for fast scintillators implemented in digital readout electronics

NASA Astrophysics Data System (ADS)

Asztalos, Stephen J.; Hennig, Wolfgang; Warburton, William K.

2016-01-01

Pulse shape discrimination applied to certain fast scintillators is usually performed offline. In sufficiently high-event rate environments data transfer and storage become problematic, which suggests a different analysis approach. In response, we have implemented a general purpose pulse shape analysis algorithm in the XIA Pixie-500 and Pixie-500 Express digital spectrometers. In this implementation waveforms are processed in real time, reducing the pulse characteristics to a few pulse shape analysis parameters and eliminating time-consuming waveform transfer and storage. We discuss implementation of these features, their advantages, necessary trade-offs and performance. Measurements from bench top and experimental setups using fast scintillators and XIA processors are presented.
Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP

NASA Technical Reports Server (NTRS)

Chan, Tony F.; Fatoohi, Rod A.

1990-01-01

The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.
A digitally implemented preambleless demodulator for maritime and mobile data communications

NASA Astrophysics Data System (ADS)

Chalmers, Harvey; Shenoy, Ajit; Verahrami, Farhad B.

The hardware design and software algorithms for a low-bit-rate, low-cost, all-digital preambleless demodulator are described. The demodulator operates under severe high-noise conditions, fast Doppler frequency shifts, large frequency offsets, and multipath fading. Sophisticated algorithms, including a fast Fourier transform (FFT)-based burst acquisition algorithm, a cycle-slip resistant carrier phase tracker, an innovative Doppler tracker, and a fast acquisition symbol synchronizer, were developed and extensively simulated for reliable burst reception. The compact digital signal processor (DSP)-based demodulator hardware uses a unique personal computer test interface for downloading test data files. The demodulator test results demonstrate a near-ideal performance within 0.2 dB of theory.

Development of a Next-Generation Membrane-Integrated Adsorption Processor for CO2 Removal and Compression for Closed-Loop Air Revitalization Systems

NASA Technical Reports Server (NTRS)

Mulloth, Lila; LeVan, Douglas

2002-01-01

The current CO2 removal technology of NASA is very energy intensive and contains many non-optimized subsystems. This paper discusses the concept of a next-generation, membrane integrated, adsorption processor for CO2 removal nd compression in closed-loop air revitalization systems. This processor will use many times less power than NASA's current CO2 removal technology and will be capable of maintaining a lower CO2 concentration in the cabin than that can be achieved by the existing CO2 removal systems. The compact, consolidated, configuration of gas dryer, CO2 separator, and CO2 compressor will allow continuous recycling of humid air in the cabin and supply of compressed CO2 to the reduction unit for oxygen recovery. The device has potential application to the International Space Station and future, long duration, transit, and planetary missions.
Digital signal processor and processing method for GPS receivers

NASA Technical Reports Server (NTRS)

Thomas, Jr., Jess B. (Inventor)

1989-01-01

A digital signal processor and processing method therefor for use in receivers of the NAVSTAR/GLOBAL POSITIONING SYSTEM (GPS) employs a digital carrier down-converter, digital code correlator and digital tracking processor. The digital carrier down-converter and code correlator consists of an all-digital, minimum bit implementation that utilizes digital chip and phase advancers, providing exceptional control and accuracy in feedback phase and in feedback delay. Roundoff and commensurability errors can be reduced to extremely small values (e.g., less than 100 nanochips and 100 nanocycles roundoff errors and 0.1 millichip and 1 millicycle commensurability errors). The digital tracking processor bases the fast feedback for phase and for group delay in the C/A, P.sub.1, and P.sub.2 channels on the L.sub.1 C/A carrier phase thereby maintaining lock at lower signal-to-noise ratios, reducing errors in feedback delays, reducing the frequency of cycle slips and in some cases obviating the need for quadrature processing in the P channels. Simple and reliable methods are employed for data bit synchronization, data bit removal and cycle counting. Improved precision in averaged output delay values is provided by carrier-aided data-compression techniques. The signal processor employs purely digital operations in the sense that exactly the same carrier phase and group delay measurements are obtained, to the last decimal place, every time the same sampled data (i.e., exactly the same bits) are processed.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Method for operating a combustor in a fuel cell system

DOEpatents

Chalfant, Robert W.; Clingerman, Bruce J.

2002-01-01

A method of operating a combustor to heat a fuel processor in a fuel cell system, in which the fuel processor generates a hydrogen-rich stream a portion of which is consumed in a fuel cell stack and a portion of which is discharged from the fuel cell stack and supplied to the combustor, and wherein first and second streams are supplied to the combustor, the first stream being a hydrocarbon fuel stream and the second stream consisting of said hydrogen-rich stream, the method comprising the steps of monitoring the temperature of the fuel processor; regulating the quantity of the first stream to the combustor according to the temperature of the fuel processor; and comparing said quantity of said first stream to a predetermined value or range of predetermined values.
Digital Platform for Wafer-Level MEMS Testing and Characterization Using Electrical Response

PubMed Central

Brito, Nuno; Ferreira, Carlos; Alves, Filipe; Cabral, Jorge; Gaspar, João; Monteiro, João; Rocha, Luís

2016-01-01

The uniqueness of microelectromechanical system (MEMS) devices, with their multiphysics characteristics, presents some limitations to the borrowed test methods from traditional integrated circuits (IC) manufacturing. Although some improvements have been performed, this specific area still lags behind when compared to the design and manufacturing competencies developed over the last decades by the IC industry. A complete digital solution for fast testing and characterization of inertial sensors with built-in actuation mechanisms is presented in this paper, with a fast, full-wafer test as a leading ambition. The full electrical approach and flexibility of modern hardware design technologies allow a fast adaptation for other physical domains with minimum effort. The digital system encloses a processor and the tailored signal acquisition, processing, control, and actuation hardware control modules, capable of the structure position and response analysis when subjected to controlled actuation signals in real time. The hardware performance, together with the simplicity of the sequential programming on a processor, results in a flexible and powerful tool to evaluate the newest and fastest control algorithms. The system enables measurement of resonant frequency (Fr), quality factor (Q), and pull-in voltage (Vpi) within 1.5 s with repeatability better than 5 ppt (parts per thousand). A full-wafer with 420 devices under test (DUTs) has been evaluated detecting the faulty devices and providing important design specification feedback to the designers. PMID:27657087
An 81.6 μW FastICA processor for epileptic seizure detection.

PubMed

Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming

2015-02-01

To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing proper memory element and reduced wordlength, the power and area of storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm(2). The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay of a frame of 256 samples for 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and 3.4 × computation speedup are achieved. The performance of the chip was verified by human dataset.
Digital Platform for Wafer-Level MEMS Testing and Characterization Using Electrical Response.

PubMed

Brito, Nuno; Ferreira, Carlos; Alves, Filipe; Cabral, Jorge; Gaspar, João; Monteiro, João; Rocha, Luís

2016-09-21

The uniqueness of microelectromechanical system (MEMS) devices, with their multiphysics characteristics, presents some limitations to the borrowed test methods from traditional integrated circuits (IC) manufacturing. Although some improvements have been performed, this specific area still lags behind when compared to the design and manufacturing competencies developed over the last decades by the IC industry. A complete digital solution for fast testing and characterization of inertial sensors with built-in actuation mechanisms is presented in this paper, with a fast, full-wafer test as a leading ambition. The full electrical approach and flexibility of modern hardware design technologies allow a fast adaptation for other physical domains with minimum effort. The digital system encloses a processor and the tailored signal acquisition, processing, control, and actuation hardware control modules, capable of the structure position and response analysis when subjected to controlled actuation signals in real time. The hardware performance, together with the simplicity of the sequential programming on a processor, results in a flexible and powerful tool to evaluate the newest and fastest control algorithms. The system enables measurement of resonant frequency (Fr), quality factor (Q), and pull-in voltage (Vpi) within 1.5 s with repeatability better than 5 ppt (parts per thousand). A full-wafer with 420 devices under test (DUTs) has been evaluated detecting the faulty devices and providing important design specification feedback to the designers.
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Mid-year report FY17 Q2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.; Pugmire, David; Rogers, David

The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.; Pugmire, David; Rogers, David

The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem. Mid-year report FY16 Q2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY15 Q4.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less
Processor Units Reduce Satellite Construction Costs

NASA Technical Reports Server (NTRS)

2014-01-01

As part of the effort to build the Fast Affordable Science and Technology Satellite (FASTSAT), Marshall Space Flight Center developed a low-cost telemetry unit which is used to facilitate communication between a satellite and its receiving station. Huntsville, Alabama-based Orbital Telemetry Inc. has licensed the NASA technology and is offering to install the cost-cutting units on commercial satellites.
Experimental system for computer network via satellite /CS/. III - Network control processor

NASA Astrophysics Data System (ADS)

Kakinuma, Y.; Ito, A.; Takahashi, H.; Uchida, K.; Matsumoto, K.; Mitsudome, H.

1982-03-01

A network control processor (NCP) has the functions of generating traffics, the control of links and the control of transmitting bursts. The NCP executes protocols, monitors of experiments, gathering and compiling data of measurements, of which programs are loaded on a minicomputer (MELCOM 70/40) with 512KB of memories. The NCP acts as traffic generators, instead of a host computer, in the experiment. For this purpose, 15 fake stations are realized by the software in each user station. This paper describes the configuration of the NCP and the implementation of the protocols for the experimental system.
Comparison of speech perception performance between Sprint/Esprit 3G and Freedom processors in children implanted with nucleus cochlear implants.

PubMed

Santarelli, Rosamaria; Magnavita, Vincenzo; De Filippi, Roberta; Ventura, Laura; Genovese, Elisabetta; Arslan, Edoardo

2009-04-01

To compare speech perception performance in children fitted with previous generation Nucleus sound processor, Sprint or Esprit 3G, and the Freedom, the most recently released system from the Cochlear Corporation that features a larger input dynamic range. Prospective intrasubject comparative study. University Medical Center. Seventeen prelingually deafened children who had received the Nucleus 24 cochlear implant and used the Sprint or Esprit 3G sound processor. Cochlear implantation with Cochlear device. Speech perception was evaluated at baseline (Sprint, n = 11; Esprit 3G, n = 6) and after 1 month's experience with the Freedom sound processor. Identification and recognition of disyllabic words and identification of vowels were performed via recorded voice in quiet (70 dB [A]), in the presence of background noise at various levels of signal-to-noise ratio (+10, +5, 0, -5) and at a soft presentation level (60 dB [A]). Consonant identification and recognition of disyllabic words, trisyllabic words, and sentences were evaluated in live voice. Frequency discrimination was measured in a subset of subjects (n = 5) by using an adaptive, 3-interval, 3-alternative, forced-choice procedure. Identification of disyllabic words administered at a soft presentation level showed a significant increase when switching to the Freedom compared with the previously worn processor in children using the Sprint or Esprit 3G. Identification and recognition of disyllabic words in the presence of background noise as well as consonant identification and sentence recognition increased significantly for the Freedom compared with the previously worn device only in children fitted with the Sprint. Frequency discrimination was significantly better when switching to the Freedom compared with the previously worn processor. Serial comparisons revealed that that speech perception performance evaluated in children aged 5 to 15 years was superior with the Freedom than previous generations of Nucleus sound processors. These differences are deemed to ensue from an increased input dynamic range, a feature that offers potentially enhanced phonemic discrimination.
Performance of a natural gas fuel processor for residential PEFC system using a novel CO preferential oxidation catalyst

NASA Astrophysics Data System (ADS)

Echigo, Mitsuaki; Shinke, Norihisa; Takami, Susumu; Tabata, Takeshi

Natural gas fuel processors have been developed for 500 W and 1 kW class residential polymer electrolyte fuel cell (PEFC) systems. These fuel processors contain all the elements—desulfurizers, steam reformers, CO shift converters, CO preferential oxidation (PROX) reactors, steam generators, burners and heat exchangers—in one package. For the PROX reactor, a single-stage PROX process using a novel PROX catalyst was adopted. In the 1 kW class fuel processor, thermal efficiency of 83% at HHV was achieved at nominal output assuming a H 2 utilization rate in the cell stack of 76%. CO concentration below 1 ppm in the product gas was achieved even under the condition of [O 2]/[CO]=1.5 at the PROX reactor. The long-term durability of the fuel processor was demonstrated with almost no deterioration in thermal efficiency and CO concentration for 10,000 h, 1000 times start and stop cycles, 25,000 cycles of load change.
VLITE-Fast: A Real-time, 350 MHz Commensal VLA Survey for Fast Transients

NASA Astrophysics Data System (ADS)

Kerr, Matthew; Ray, Paul S.; Kassim, Namir E.; Clarke, Tracy; Deneva, Julia; Polisensky, Emil

2018-01-01

The VLITE (VLA Low Band Ionosphere and Transient Experiment; http://vlite.nrao.edu) program operates commensally during all Very Large Array observations, collecting data from 320 to 384 MHz. Recently expanded to include 16 antennas, the large field of view and huge time on sky offer good coverage of the transient, low-frequency sky. We describe the VLITE-Fast system, a GPU-based signal processor capable of detecting short (<1s) transients in real time and triggering recording of baseband voltage for offline imaging. In the case of Fast Radio Bursts, this offers the opportunity for discovering host galaxies of non-repeating FRBs, and in the case of single pulses, the identification of pulsar positions for dedicated follow-up. We describe the observing system, techniques for mitigating interference, and initial results from searches for FRBs.
Holo-Chidi video concentrator card

NASA Astrophysics Data System (ADS)

Nwodoh, Thomas A.; Prabhakar, Aditya; Benton, Stephen A.

2001-12-01

The Holo-Chidi Video Concentrator Card is a frame buffer for the Holo-Chidi holographic video processing system. Holo- Chidi is designed at the MIT Media Laboratory for real-time computation of computer generated holograms and the subsequent display of the holograms at video frame rates. The Holo-Chidi system is made of two sets of cards - the set of Processor cards and the set of Video Concentrator Cards (VCCs). The Processor cards are used for hologram computation, data archival/retrieval from a host system, and for higher-level control of the VCCs. The VCC formats computed holographic data from multiple hologram computing Processor cards, converting the digital data to analog form to feed the acousto-optic-modulators of the Media lab's Mark-II holographic display system. The Video Concentrator card is made of: a High-Speed I/O (HSIO) interface whence data is transferred from the hologram computing Processor cards, a set of FIFOs and video RAM used as buffer for data for the hololines being displayed, a one-chip integrated microprocessor and peripheral combination that handles communication with other VCCs and furnishes the card with a USB port, a co-processor which controls display data formatting, and D-to-A converters that convert digital fringes to analog form. The co-processor is implemented with an SRAM-based FPGA with over 500,000 gates and controls all the signals needed to format the data from the multiple Processor cards into the format required by Mark-II. A VCC has three HSIO ports through which up to 500 Megabytes of computed holographic data can flow from the Processor Cards to the VCC per second. A Holo-Chidi system with three VCCs has enough frame buffering capacity to hold up to thirty two 36Megabyte hologram frames at a time. Pre-computed holograms may also be loaded into the VCC from a host computer through the low- speed USB port. Both the microprocessor and the co- processor in the VCC can access the main system memory used to store control programs and data for the VCC. The Card also generates the control signals used by the scanning mirrors of Mark-II. In this paper we discuss the design of the VCC and its implementation in the Holo-Chidi system.
Vector generator scan converter

DOEpatents

Moore, James M.; Leighton, James F.

1990-01-01

High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O (input/output) channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardward for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold.
Vector generator scan converter

DOEpatents

Moore, J.M.; Leighton, J.F.

1988-02-05

High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold. 7 figs.

A Study on Fast Gates for Large-Scale Quantum Simulation with Trapped Ions

PubMed Central

Taylor, Richard L.; Bentley, Christopher D. B.; Pedernales, Julen S.; Lamata, Lucas; Solano, Enrique; Carvalho, André R. R.; Hope, Joseph J.

2017-01-01

Large-scale digital quantum simulations require thousands of fundamental entangling gates to construct the simulated dynamics. Despite success in a variety of small-scale simulations, quantum information processing platforms have hitherto failed to demonstrate the combination of precise control and scalability required to systematically outmatch classical simulators. We analyse how fast gates could enable trapped-ion quantum processors to achieve the requisite scalability to outperform classical computers without error correction. We analyze the performance of a large-scale digital simulator, and find that fidelity of around 70% is realizable for π-pulse infidelities below 10−5 in traps subject to realistic rates of heating and dephasing. This scalability relies on fast gates: entangling gates faster than the trap period. PMID:28401945
A Study on Fast Gates for Large-Scale Quantum Simulation with Trapped Ions.

PubMed

Taylor, Richard L; Bentley, Christopher D B; Pedernales, Julen S; Lamata, Lucas; Solano, Enrique; Carvalho, André R R; Hope, Joseph J

2017-04-12

Large-scale digital quantum simulations require thousands of fundamental entangling gates to construct the simulated dynamics. Despite success in a variety of small-scale simulations, quantum information processing platforms have hitherto failed to demonstrate the combination of precise control and scalability required to systematically outmatch classical simulators. We analyse how fast gates could enable trapped-ion quantum processors to achieve the requisite scalability to outperform classical computers without error correction. We analyze the performance of a large-scale digital simulator, and find that fidelity of around 70% is realizable for π-pulse infidelities below 10 -5 in traps subject to realistic rates of heating and dephasing. This scalability relies on fast gates: entangling gates faster than the trap period.
A fast, programmable hardware architecture for spaceborne SAR processing

NASA Technical Reports Server (NTRS)

Bennett, J. R.; Cumming, I. G.; Lim, J.; Wedding, R. M.

1983-01-01

The launch of spaceborne SARs during the 1980's is discussed. The satellite SARs require high quality and high throughput ground processors. Compression ratios in range and azimuth of greater than 500 and 150 respectively lead to frequency domain processing and data computation rates in excess of 2000 million real operations per second for C-band SARs under consideration. Various hardware architectures are examined and two promising candidates and proceeds to recommend a fast, programmable hardware architecture for spaceborne SAR processing are selected. Modularity and programmability are introduced as desirable attributes for the purpose of HTSP hardware selection.
Active Acoustics using Bellhop-DRDC: Run Time Tests and Suggested Configurations for a Tracking Exercise in Shallow Scotian Waters

DTIC Science & Technology

2005-05-01

simulée d’essai pour obtenir les diagrammes de perte de transmission et de réverbération pour 18 éléments (une source, un réseau remorqué et 16 bouées...were recorded using a 1.5GHz Pentium 4 processor. The test results indicate that the Bellhop program runs fast enough to provide the required acoustic...was determined that the Bellhop program will be fast enough for these clients. Future Plans It is intended to integrate further enhancements that
NSWC-NADC interactive communication links for AN/UYS-1 loadtape creation and retrieval

NASA Astrophysics Data System (ADS)

Greathouse, D. M.

1984-09-01

This report contains an alternative method of communication (interactive vs. remote batch) with the Naval Air Development Center for the creation and retrieval of AN/UYS-1 Advanced Signal Processor (ASP) operational software loadtapes. Operational software for the Digital Acoustic Sensor Simulator (DASS) program is developed and maintained at the Naval Air Development Center (NADC). The Facility for Automated Software Production (FASP), an NADC-resident software generation facility, provides the support tools necessary for data base creation, software development and maintenance, and loadtape generation. Once a loadtape file is generated at NADC, it must be retrieved via telephone transmission and placed in a format suitable for loading into the AN/UYS-1 Advanced Signal Processor (ASP).
Spacecraft on-board SAR image generation for EOS-type missions

NASA Technical Reports Server (NTRS)

Liu, K. Y.; Arens, W. E.; Assal, H. M.; Vesecky, J. F.

1987-01-01

Spacecraft on-board synthetic aperture radar (SAR) image generation is an extremely difficult problem because of the requirements for high computational rates (usually on the order of Giga-operations per second), high reliability (some missions last up to 10 years), and low power dissipation and mass (typically less than 500 watts and 100 Kilograms). Recently, a JPL study was performed to assess the feasibility of on-board SAR image generation for EOS-type missions. This paper summarizes the results of that study. Specifically, it proposes a processor architecture using a VLSI time-domain parallel array for azimuth correlation. Using available space qualifiable technology to implement the proposed architecture, an on-board SAR processor having acceptable power and mass characteristics appears feasible for EOS-type applications.
Recent developments of NASTRAN pre- amd post-processors: Response spectrum analysis (RESPAN) and interactive graphics (GIFTS)

NASA Technical Reports Server (NTRS)

Hirt, E. F.; Fox, G. L.

1982-01-01

Two specific NASTRAN preprocessors and postprocessors are examined. A postprocessor for dynamic analysis and a graphical interactive package for model generation and review of resuls are presented. A computer program that provides response spectrum analysis capability based on data from NASTRAN finite element model is described and the GIFTS system, a graphic processor to augment NASTRAN is introduced.
Next Generation Security for the 10,240 Processor Columbia System

NASA Technical Reports Server (NTRS)

Hinke, Thomas; Kolano, Paul; Shaw, Derek; Keller, Chris; Tweton, Dave; Welch, Todd; Liu, Wen (Betty)

2005-01-01

This presentation includes a discussion of the Columbia 10,240-processor system located at the NASA Advanced Supercomputing (NAS) division at the NASA Ames Research Center which supports each of NASA's four missions: science, exploration systems, aeronautics, and space operations. It is comprised of 20 Silicon Graphics nodes, each consisting of 512 Itanium II processors. A 64 processor Columbia front-end system supports users as they prepare their jobs and then submits them to the PBS system. Columbia nodes and front-end systems use the Linux OS. Prior to SC04, the Columbia system was used to attain a processing speed of 51.87 TeraFlops, which made it number two on the Top 500 list of the world's supercomputers and the world's fastest "operational" supercomputer since it was fully engaged in supporting NASA users.
Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

NASA Astrophysics Data System (ADS)

Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

2017-07-01

Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.
Zonal methods for the parallel execution of range-limited N-body simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowers, Kevin J.; Dror, Ron O.; Shaw, David E.

2007-01-20

Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introducedmore » two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.« less
Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

NASA Technical Reports Server (NTRS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

1996-01-01

This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamics and aeroacoustics properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state of-the-art audio and video rendering of the propagating acoustic signals.
FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.

PubMed

Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora

2013-09-01

In this paper, we present a distributed computing system, called DCMARK, aimed at solving partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the differential equation system solving into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to study the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200 cells, Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it allows us to reduce the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC Host. An intuitively graphical user interface allows us to change the calculation parameters and plot the results.
Multivariate interactive digital analysis system /MIDAS/ - A new fast multispectral recognition system

NASA Technical Reports Server (NTRS)

Kriegler, F.; Marshall, R.; Lampert, S.; Gordon, M.; Cornell, C.; Kistler, R.

1973-01-01

The MIDAS system is a prototype, multiple-pipeline digital processor mechanizing the multivariate-Gaussian, maximum-likelihood decision algorithm operating at 200,000 pixels/second. It incorporates displays and film printer equipment under control of a general purpose midi-computer and possesses sufficient flexibility that operational versions of the equipment may be subsequently specified as subsets of the system.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS

PubMed Central

Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.

2014-01-01

Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
Enhancing data locality by using terminal propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Van Driessche, R.

1995-12-31

Terminal propagation is a method developed in the circuit placement community for adding constraints to graph partitioning problems. This paper adapts and expands this idea, and applies it to the problem of partitioning data structures among the processors of a parallel computer. We show how the constraints in terminal propagation can be used to encourage partitions in which messages are communicated only between architecturally near processors. We then show how these constraints can be handled in two important partitioning algorithms, spectral bisection and multilevel-KL. We compare the quality of partitions generated by these algorithms to each other and to Partitionsmore » generated by more familiar techniques.« less
Air-Lubricated Thermal Processor For Dry Silver Film

NASA Astrophysics Data System (ADS)

Siryj, B. W.

1980-09-01

Since dry silver film is processed by heat, it may be viewed on a light table only seconds after exposure. On the other hand, wet films require both bulky chemicals and substantial time before an image can be analyzed. Processing of dry silver film, although simple in concept, is not so simple when reduced to practice. The main concern is the effect of film temperature gradients on uniformity of optical film density. RCA has developed two thermal processors, different in implementation but based on the same philosophy. Pressurized air is directed to both sides of the film to support the film and to conduct the heat to the film. Porous graphite is used as the medium through which heat and air are introduced. The initial thermal processor was designed to process 9.5-inch-wide film moving at speeds ranging from 0.0034 to 0.008 inch per second. The processor configuration was curved to match the plane generated by the laser recording beam. The second thermal processor was configured to process 5-inch-wide film moving at a continuously variable rate ranging from 0.15 to 3.5 inches per second. Due to field flattening optics used in this laser recorder, the required film processing area was plane. In addition, this processor was sectioned in the direction of film motion, giving the processor the capability of varying both temperature and effective processing area.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Wallack, A.S.; Canny, J.

1996-08-13

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 27 figs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Canny, John; Wallack, Aaron S.

1999-01-01

Methods and apparatus are provided for developing a complete set of all admissible Type I and Type II fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Wallack, Aaron S.; Canny, John

1996-01-01

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Canny, J.; Wallack, A.S.

1999-01-05

Methods and apparatus are provided for developing a complete set of all admissible Type 1 and Type 2 fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 44 figs.

Fundamental physics issues of multilevel logic in developing a parallel processor.

NASA Astrophysics Data System (ADS)

Bandyopadhyay, Anirban; Miki, Kazushi

2007-06-01

In the last century, On and Off physical switches, were equated with two decisions 0 and 1 to express every information in terms of binary digits and physically realize it in terms of switches connected in a circuit. Apart from memory-density increase significantly, more possible choices in particular space enables pattern-logic a reality, and manipulation of pattern would allow controlling logic, generating a new kind of processor. Neumann's computer is based on sequential logic, processing bits one by one. But as pattern-logic is generated on a surface, viewing whole pattern at a time is a truly parallel processing. Following Neumann's and Shannons fundamental thermodynamical approaches we have built compatible model based on series of single molecule based multibit logic systems of 4-12 bits in an UHV-STM. On their monolayer multilevel communication and pattern formation is experimentally verified. Furthermore, the developed intelligent monolayer is trained by Artificial Neural Network. Therefore fundamental weak interactions for the building of truly parallel processor are explored here physically and theoretically.
Synthetic Aperture Radar (SAR) data processing

NASA Technical Reports Server (NTRS)

Beckner, F. L.; Ahr, H. A.; Ausherman, D. A.; Cutrona, L. J.; Francisco, S.; Harrison, R. E.; Heuser, J. S.; Jordan, R. L.; Justus, J.; Manning, B.

1978-01-01

The available and optimal methods for generating SAR imagery for NASA applications were identified. The SAR image quality and data processing requirements associated with these applications were studied. Mathematical operations and algorithms required to process sensor data into SAR imagery were defined. The architecture of SAR image formation processors was discussed, and technology necessary to implement the SAR data processors used in both general purpose and dedicated imaging systems was addressed.
Tactical Operations Analysis Support Facility.

DTIC Science & Technology

1983-07-01

are stored in nonvolatile RAM (NVR). Communication with a host processor via a UART (75-19.2K bps) in full duplex mode. An advanced video option...hardware/firmware "machines." Smart terminals, I/O con- * trollers, and unique peripheral processors are examples of this process. Briton Lee, Inc...the relational data base for symbol attributes and data retrievals. * Generates a grid system for precise cursor positioning for lines, charts, and
Method of developing all-optical trinary JK, D-type, and T-type flip-flops using semiconductor optical amplifiers.

PubMed

Garai, Sisir Kumar

2012-04-10

To meet the demand of very fast and agile optical networks, the optical processors in a network system should have a very fast execution rate, large information handling, and large information storage capacities. Multivalued logic operations and multistate optical flip-flops are the basic building blocks for such fast running optical computing and data processing systems. In the past two decades, many methods of implementing all-optical flip-flops have been proposed. Most of these suffer from speed limitations because of the low switching response of active devices. The frequency encoding technique has been used because of its many advantages. It can preserve its identity throughout data communication irrespective of loss of light energy due to reflection, refraction, attenuation, etc. The action of polarization-rotation-based very fast switching of semiconductor optical amplifiers increases processing speed. At the same time, tristate optical flip-flops increase information handling capacity.
Systems and Methods for Automated Vessel Navigation Using Sea State Prediction

NASA Technical Reports Server (NTRS)

Huntsberger, Terrance L. (Inventor); Howard, Andrew B. (Inventor); Reinhart, Rene Felix (Inventor); Aghazarian, Hrand (Inventor); Rankin, Arturo (Inventor)

2017-01-01

Systems and methods for sea state prediction and autonomous navigation in accordance with embodiments of the invention are disclosed. One embodiment of the invention includes a method of predicting a future sea state including generating a sequence of at least two 3D images of a sea surface using at least two image sensors, detecting peaks and troughs in the 3D images using a processor, identifying at least one wavefront in each 3D image based upon the detected peaks and troughs using the processor, characterizing at least one propagating wave based upon the propagation of wavefronts detected in the sequence of 3D images using the processor, and predicting a future sea state using at least one propagating wave characterizing the propagation of wavefronts in the sequence of 3D images using the processor. Another embodiment includes a method of autonomous vessel navigation based upon a predicted sea state and target location.
Systems and Methods for Automated Vessel Navigation Using Sea State Prediction

NASA Technical Reports Server (NTRS)

Aghazarian, Hrand (Inventor); Reinhart, Rene Felix (Inventor); Huntsberger, Terrance L. (Inventor); Rankin, Arturo (Inventor); Howard, Andrew B. (Inventor)

2015-01-01

Systems and methods for sea state prediction and autonomous navigation in accordance with embodiments of the invention are disclosed. One embodiment of the invention includes a method of predicting a future sea state including generating a sequence of at least two 3D images of a sea surface using at least two image sensors, detecting peaks and troughs in the 3D images using a processor, identifying at least one wavefront in each 3D image based upon the detected peaks and troughs using the processor, characterizing at least one propagating wave based upon the propagation of wavefronts detected in the sequence of 3D images using the processor, and predicting a future sea state using at least one propagating wave characterizing the propagation of wavefronts in the sequence of 3D images using the processor. Another embodiment includes a method of autonomous vessel navigation based upon a predicted sea state and target location.
Status of the International Space Station Regenerative ECLSS Water Recovery and Oxygen Generation Systems

NASA Technical Reports Server (NTRS)

Bagdigian, Robert M.; Cloud, Dale

2005-01-01

NASA is developing three racks containing regenerative water recovery and oxygen generation systems (WRS and OGS) for deployment on the International Space Station (ISS). The major assemblies included in these racks are the Water Processor Assembly (WPA), Urine Processor Assembly (UPA), Oxygen Generation Assembly (OGA), and the Power Supply Module (PSM) supporting the OGA. The WPA and OGA are provided by Hamilton Sundstrand Space Systems International (HSSSI), Inc., while the UPA and PSM are developed in- house by the Marshall Space Flight Center (MSFC). The assemblies have completed the manufacturing phase and are in various stages of testing and integration into the flight racks. This paper summarizes the status as of April 2005 and describes some of the technical challenges encountered and lessons learned over the past year.
Onboard Interferometric SAR Processor for the Ka-Band Radar Interferometer (KaRIn)

NASA Technical Reports Server (NTRS)

Esteban-Fernandez, Daniel; Rodriquez, Ernesto; Peral, Eva; Clark, Duane I.; Wu, Xiaoqing

2011-01-01

An interferometric synthetic aperture radar (SAR) onboard processor concept and algorithm has been developed for the Ka-band radar interferometer (KaRIn) instrument on the Surface and Ocean Topography (SWOT) mission. This is a mission- critical subsystem that will perform interferometric SAR processing and multi-look averaging over the oceans to decrease the data rate by three orders of magnitude, and therefore enable the downlink of the radar data to the ground. The onboard processor performs demodulation, range compression, coregistration, and re-sampling, and forms nine azimuth squinted beams. For each of them, an interferogram is generated, including common-band spectral filtering to improve correlation, followed by averaging to the final 1 1-km ground resolution pixel. The onboard processor has been prototyped on a custom FPGA-based cPCI board, which will be part of the radar s digital subsystem. The level of complexity of this technology, dictated by the implementation of interferometric SAR processing at high resolution, the extremely tight level of accuracy required, and its implementation on FPGAs are unprecedented at the time of this reporting for an onboard processor for flight applications.
Design of an integrated fuel processor for residential PEMFCs applications

NASA Astrophysics Data System (ADS)

Seo, Yu Taek; Seo, Dong Joo; Jeong, Jin Hyeok; Yoon, Wang Lai

KIER has been developing a novel fuel processing system to provide hydrogen rich gas to residential PEMFCs system. For the effective design of a compact hydrogen production system, each unit process for steam reforming and water gas shift, has a steam generator and internal heat exchangers which are thermally and physically integrated into a single packaged hardware system. The newly designed fuel processor (prototype II) showed a thermal efficiency of 78% as a HHV basis with methane conversion of 89%. The preferential oxidation unit with two staged cascade reactors, reduces, the CO concentration to below 10 ppm without complicated temperature control hardware, which is the prerequisite CO limit for the PEMFC stack. After we achieve the initial performance of the fuel processor, partial load operation was carried out to test the performance and reliability of the fuel processor at various loads. The stability of the fuel processor was also demonstrated for three successive days with a stable composition of product gas and thermal efficiency. The CO concentration remained below 10 ppm during the test period and confirmed the stable performance of the two-stage PrOx reactors.
Messiah College Biodiesel Fuel Generation Project Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zummo, Michael M; Munson, J; Derr, A

Many obvious and significant concerns arise when considering the concept of small-scale biodiesel production. Does the fuel produced meet the stringent requirements set by the commercial biodiesel industry? Is the process safe? How are small-scale producers collecting and transporting waste vegetable oil? How is waste from the biodiesel production process handled by small-scale producers? These concerns and many others were the focus of the research preformed in the Messiah College Biodiesel Fuel Generation project over the last three years. This project was a unique research program in which undergraduate engineering students at Messiah College set out to research the feasibilitymore » of small-biodiesel production for application on a campus of approximately 3000 students. This Department of Energy (DOE) funded research program developed out of almost a decade of small-scale biodiesel research and development work performed by students at Messiah College. Over the course of the last three years the research team focused on four key areas related to small-scale biodiesel production: Quality Testing and Assurance, Process and Processor Research, Process and Processor Development, and Community Education. The objectives for the Messiah College Biodiesel Fuel Generation Project included the following: 1. Preparing a laboratory facility for the development and optimization of processors and processes, ASTM quality assurance, and performance testing of biodiesel fuels. 2. Developing scalable processor and process designs suitable for ASTM certifiable small-scale biodiesel production, with the goals of cost reduction and increased quality. 3. Conduct research into biodiesel process improvement and cost optimization using various biodiesel feedstocks and production ingredients.« less
Autothermal and partial oxidation reformer-based fuel processor, method for improving catalyst function in autothermal and partial oxidation reformer-based processors

DOEpatents

Ahmed, Shabbir; Papadias, Dionissios D.; Lee, Sheldon H. D.; Ahluwalia, Rajesh K.

2013-01-08

The invention provides a fuel processor comprising a linear flow structure having an upstream portion and a downstream portion; a first catalyst supported at the upstream portion; and a second catalyst supported at the downstream portion, wherein the first catalyst is in fluid communication with the second catalyst. Also provided is a method for reforming fuel, the method comprising contacting the fuel to an oxidation catalyst so as to partially oxidize the fuel and generate heat; warming incoming fuel with the heat while simultaneously warming a reforming catalyst with the heat; and reacting the partially oxidized fuel with steam using the reforming catalyst.
C-MOS array design techniques: SUMC multiprocessor system study

NASA Technical Reports Server (NTRS)

Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

1972-01-01

The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
Systems and methods for reconfiguring input devices

NASA Technical Reports Server (NTRS)

Lancaster, Jeff (Inventor); De Mers, Robert E. (Inventor)

2012-01-01

A system includes an input device having first and second input members configured to be activated by a user. The input device is configured to generate activation signals associated with activation of the first and second input members, and each of the first and second input members are associated with an input function. A processor is coupled to the input device and configured to receive the activation signals. A memory coupled to the processor, and includes a reconfiguration module configured to store the input functions assigned to the first and second input members and, upon execution of the processor, to reconfigure the input functions assigned to the input members when the first input member is inoperable.
Radiation-Hardened Electronics for Advanced Communications Systems

NASA Technical Reports Server (NTRS)

Whitaker, Sterling

2015-01-01

Novel approach enables high-speed special-purpose processors Advanced reconfigurable and reprogrammable communication systems will require sub-130-nanometer electronics. Legacy single event upset (SEU) radiation-tolerant circuits are ineffective at speeds greater than 125 megahertz. In Phase I of this project, ICs, LLC, demonstrated new base-level logic circuits that provide SEU immunity for sub-130-nanometer high-speed circuits. In Phase II, the company developed an innovative self-restoring logic (SRL) circuit and a system approach that provides high-speed, SEU-tolerant solutions that are effective for sub-130-nanometer electronics scalable to at least 22-nanometer processes. The SRL system can be used in the design of NASA's next-generation special-purpose processors, especially reconfigurable communication processors.
Accuracy-energy configurable sensor processor and IoT device for long-term activity monitoring in rare-event sensing applications.

PubMed

Park, Daejin; Cho, Jeonghun

2014-01-01

A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error.
A general natural-language text processor for clinical radiology.

PubMed Central

Friedman, C; Alderson, P O; Austin, J H; Cimino, J J; Johnson, S B

1994-01-01

OBJECTIVE: Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. DESIGN: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. MEASUREMENTS: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. RESULTS: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision. PMID:7719797
Demonstration of two-qubit algorithms with a superconducting quantum processor.

PubMed

DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J

2009-07-09

Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact-such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
Right-Brain/Left-Brain Integrated Associative Processor Employing Convertible Multiple-Instruction-Stream Multiple-Data-Stream Elements

NASA Astrophysics Data System (ADS)

Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi

2005-04-01

A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm2, showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm× 5 mm.
Image Display and Manipulation System (IDAMS) program documentation, Appendixes A-D. [including routines, convolution filtering, image expansion, and fast Fourier transformation

NASA Technical Reports Server (NTRS)

Cecil, R. W.; White, R. A.; Szczur, M. R.

1972-01-01

The IDAMS Processor is a package of task routines and support software that performs convolution filtering, image expansion, fast Fourier transformation, and other operations on a digital image tape. A unique task control card for that program, together with any necessary parameter cards, selects each processing technique to be applied to the input image. A variable number of tasks can be selected for execution by including the proper task and parameter cards in the input deck. An executive maintains control of the run; it initiates execution of each task in turn and handles any necessary error processing.
75 FR 12575 - Agency Information Collection Activities: Proposed Collection; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2010-03-16

... licensed pursuant to 10 CFR Part 61 or equivalent Agreement State regulations. All generators, collectors... processors, contains information which facilitates tracking the identity of the waste generator. That... generators. The information provided on NRC Form 542 permits the States and Compacts to know the original...

Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Christopher H.; Long, Hai; Sides, Scott

2015-10-15

Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL" tm s Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement ofmore » future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.« less
Fast Pixel Buffer For Processing With Lookup Tables

NASA Technical Reports Server (NTRS)

Fisher, Timothy E.

1992-01-01

Proposed scheme for buffering data on intensities of picture elements (pixels) of image increases rate or processing beyond that attainable when data read, one pixel at time, from main image memory. Scheme applied in design of specialized image-processing circuitry. Intended to optimize performance of processor in which electronic equivalent of address-lookup table used to address those pixels in main image memory required for processing.
RASSP signal processing architectures

NASA Astrophysics Data System (ADS)

Shirley, Fred; Bassett, Bob; Letellier, J. P.

1995-06-01

The rapid prototyping of application specific signal processors (RASSP) program is an ARPA/tri-service effort to dramatically improve the process by which complex digital systems, particularly embedded signal processors, are specified, designed, documented, manufactured, and supported. The domain of embedded signal processing was chosen because it is important to a variety of military and commercial applications as well as for the challenge it presents in terms of complexity and performance demands. The principal effort is being performed by two major contractors, Lockheed Sanders (Nashua, NH) and Martin Marietta (Camden, NJ). For both, improvements in methodology are to be exercised and refined through the performance of individual 'Demonstration' efforts. The Lockheed Sanders' Demonstration effort is to develop an infrared search and track (IRST) processor. In addition, both contractors' results are being measured by a series of externally administered (by Lincoln Labs) six-month Benchmark programs that measure process improvement as a function of time. The first two Benchmark programs are designing and implementing a synthetic aperture radar (SAR) processor. Our demonstration team is using commercially available VME modules from Mercury Computer to assemble a multiprocessor system scalable from one to hundreds of Intel i860 microprocessors. Custom modules for the sensor interface and display driver are also being developed. This system implements either proprietary or Navy owned algorithms to perform the compute-intensive IRST function in real time in an avionics environment. Our Benchmark team is designing custom modules using commercially available processor ship sets, communication submodules, and reconfigurable logic devices. One of the modules contains multiple vector processors optimized for fast Fourier transform processing. Another module is a fiberoptic interface that accepts high-rate input data from the sensors and provides video-rate output data to a display. This paper discusses the impact of simulation on choosing signal processing algorithms and architectures, drawing from the experiences of the Demonstration and Benchmark inter-company teams at Lockhhed Sanders, Motorola, Hughes, and ISX.
Efficient storage, computation, and exposure of computer-generated holograms by electron-beam lithography.

PubMed

Newman, D M; Hawley, R W; Goeckel, D L; Crawford, R D; Abraham, S; Gallagher, N C

1993-05-10

An efficient storage format was developed for computer-generated holograms for use in electron-beam lithography. This method employs run-length encoding and Lempel-Ziv-Welch compression and succeeds in exposing holograms that were previously infeasible owing to the hologram's tremendous pattern-data file size. These holograms also require significant computation; thus the algorithm was implemented on a parallel computer, which improved performance by 2 orders of magnitude. The decompression algorithm was integrated into the Cambridge electron-beam machine's front-end processor.Although this provides much-needed ability, some hardware enhancements will be required in the future to overcome inadequacies in the current front-end processor that result in a lengthy exposure time.
Reconfigurable Drive Current System

NASA Technical Reports Server (NTRS)

Alhorn, Dean C. (Inventor); Dutton, Kenneth R. (Inventor); Howard, David E. (Inventor); Smith, Dennis A. (Inventor)

2017-01-01

A reconfigurable drive current system includes drive stages, each of which includes a high-side transistor and a low-side transistor in a totem pole configuration. A current monitor is coupled to an output of each drive stage. Input channels are provided to receive input signals. A processor is coupled to the input channels and to each current monitor for generating at least one drive signal using at least one of the input signals and current measured by at least one of the current monitors. A pulse width modulation generator is coupled to the processor and each drive stage for varying the drive signals as a function of time prior to being supplied to at least one of the drive stages.
Clinical Validation of a Sound Processor Upgrade in Direct Acoustic Cochlear Implant Subjects

PubMed Central

Kludt, Eugen; D’hondt, Christiane; Lenarz, Thomas; Maier, Hannes

2017-01-01

Objective: The objectives of the investigation were to evaluate the effect of a sound processor upgrade on the speech reception threshold in noise and to collect long-term safety and efficacy data after 2½ to 5 years of device use of direct acoustic cochlear implant (DACI) recipients. Study Design: The study was designed as a mono-centric, prospective clinical trial. Setting: Tertiary referral center. Patients: Fifteen patients implanted with a direct acoustic cochlear implant. Intervention: Upgrade with a newer generation of sound processor. Main Outcome Measures: Speech recognition test in quiet and in noise, pure tone thresholds, subject-reported outcome measures. Results: The speech recognition in quiet and in noise is superior after the sound processor upgrade and stable after long-term use of the direct acoustic cochlear implant. The bone conduction thresholds did not decrease significantly after long-term high level stimulation. Conclusions: The new sound processor for the DACI system provides significant benefits for DACI users for speech recognition in both quiet and noise. Especially the noise program with the use of directional microphones (Zoom) allows DACI patients to have much less difficulty when having conversations in noisy environments. Furthermore, the study confirms that the benefits of the sound processor upgrade are available to the DACI recipients even after several years of experience with a legacy sound processor. Finally, our study demonstrates that the DACI system is a safe and effective long-term therapy. PMID:28406848
Parallel architecture for rapid image generation and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nerheim, R.J.

1987-01-01

A multiprocessor architecture inspired by the Disney multiplane camera is proposed. For many applications, this approach produces a natural mapping of processors to objects in a scene. Such a mapping promotes parallelism and reduces the hidden-surface work with minimal interprocessor communication and low-overhead cost. Existing graphics architectures store the final picture as a monolithic entity. The architecture here stores each object's image separately. It assembles the final composite picture from component images only when the video display needs to be refreshed. This organization simplifies the work required to animate moving objects that occlude other objects. In addition, the architecture hasmore » multiple processors that generate the component images in parallel. This further shortens the time needed to create a composite picture. In addition to generating images for animation, the architecture has the ability to decompose images.« less
Status of the Node 3 Regenerative Environmental Cpntrol& Life Support System Water Recovery & Oxygen Generation Systems

NASA Technical Reports Server (NTRS)

Carrasquillo, Robyn L.

2003-01-01

NASA s Marshall Space Flight Center is providing three racks containing regenerative water recovery and oxygen generation systems (WRS and OGS) for flight on the lnternational Space Station s (ISS) Node 3 element. The major assemblies included in these racks are the Water Processor Assembly (WPA), Urine Processor Assembly (UPA), Oxygen Generation Assembly (OGA), and the Power Supply Module (PSM) supporting the OGA. The WPA and OGA are provided by Hamilton Sundstrand Space Systems lnternational (HSSSI), while the UPA and PSM are being designed and manufactured in-house by MSFC. The assemblies are currently in the manufacturing and test phase and are to be completed and integrated into flight racks this year. This paper gives an overview of the technologies and system designs, technical challenges encountered and solved, and the current status.
General-purpose interface bus for multiuser, multitasking computer system

NASA Technical Reports Server (NTRS)

Generazio, Edward R.; Roth, Don J.; Stang, David B.

1990-01-01

The architecture of a multiuser, multitasking, virtual-memory computer system intended for the use by a medium-size research group is described. There are three central processing units (CPU) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.
Combustor air flow control method for fuel cell apparatus

DOEpatents

Clingerman, Bruce J.; Mowery, Kenneth D.; Ripley, Eugene V.

2001-01-01

A method for controlling the heat output of a combustor in a fuel cell apparatus to a fuel processor where the combustor has dual air inlet streams including atmospheric air and fuel cell cathode effluent containing oxygen depleted air. In all operating modes, an enthalpy balance is provided by regulating the quantity of the air flow stream to the combustor to support fuel cell processor heat requirements. A control provides a quick fast forward change in an air valve orifice cross section in response to a calculated predetermined air flow, the molar constituents of the air stream to the combustor, the pressure drop across the air valve, and a look up table of the orifice cross sectional area and valve steps. A feedback loop fine tunes any error between the measured air flow to the combustor and the predetermined air flow.
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
IGA-ADS: Isogeometric analysis FEM using ADS solver

NASA Astrophysics Data System (ADS)

Łoś, Marcin M.; Woźniak, Maciej; Paszyński, Maciej; Lenharth, Andrew; Hassaan, Muhamm Amber; Pingali, Keshav

2017-08-01

In this paper we present a fast explicit solver for solution of non-stationary problems using L2 projections with isogeometric finite element method. The solver has been implemented within GALOIS framework. It enables parallel multi-core simulations of different time-dependent problems, in 1D, 2D, or 3D. We have prepared the solver framework in a way that enables direct implementation of the selected PDE and corresponding boundary conditions. In this paper we describe the installation, implementation of exemplary three PDEs, and execution of the simulations on multi-core Linux cluster nodes. We consider three case studies, including heat transfer, linear elasticity, as well as non-linear flow in heterogeneous media. The presented package generates output suitable for interfacing with Gnuplot and ParaView visualization software. The exemplary simulations show near perfect scalability on Gilbert shared-memory node with four Intel® Xeon® CPU E7-4860 processors, each possessing 10 physical cores (for a total of 40 cores).
Modern design of a fast front-end computer

NASA Astrophysics Data System (ADS)

Šoštarić, Z.; Anic̈ić, D.; Sekolec, L.; Su, J.

1994-12-01

Front-end computers (FEC) at Paul Scherrer Institut provide access to accelerator CAMAC-based sensors and actuators by way of a local area network. In the scope of the new generation FEC project, a front-end is regarded as a collection of services. The functionality of one such service is described in terms of Yourdon's environment, behaviour, processor and task models. The computational model (software representation of the environment) of the service is defined separately, using the information model of the Shlaer-Mellor method, and Sather OO language. In parallel with the analysis and later with the design, a suite of test programmes was developed to evaluate the feasibility of different computing platforms for the project and a set of rapid prototypes was produced to resolve different implementation issues. The past and future aspects of the project and its driving forces are presented. Justification of the choice of methodology, platform and requirement, is given. We conclude with a description of the present state, priorities and limitations of our project.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

DOE PAGES

Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

1995-01-01

In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storagemore » of size O(7 n ) for multiplying 2 n × 2 n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4 n ). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.« less
Application of a VLSI vector quantization processor to real-time speech coding

NASA Technical Reports Server (NTRS)

Davidson, G.; Gersho, A.

1986-01-01

Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
Magnetic Bubble Memories for Data Collection in Sounding Rockets,

DTIC Science & Technology

1982-01-29

generate interest in bubbles as a mass storage device for micro - processor based equipment, manufacturers have come up with a variety of diversified...absence of a bubble represents a Ŕ". With diameters on the order of I to 5 micro -meters, these bubbles are so small that extremely tiny chips can hold...methods of transfer: polled I/O, interrupt driven I/O, and direct memory access (DMA). The first two methods require tho host processor be involved
DOE Office of Scientific and Technical Information (OSTI.GOV)

Krityakierne, Tipaluck; Akhtar, Taimoor; Shoemaker, Christine A.

This paper presents a parallel surrogate-based global optimization method for computationally expensive objective functions that is more effective for larger numbers of processors. To reach this goal, we integrated concepts from multi-objective optimization and tabu search into, single objective, surrogate optimization. Our proposed derivative-free algorithm, called SOP, uses non-dominated sorting of points for which the expensive function has been previously evaluated. The two objectives are the expensive function value of the point and the minimum distance of the point to previously evaluated points. Based on the results of non-dominated sorting, P points from the sorted fronts are selected as centersmore » from which many candidate points are generated by random perturbations. Based on surrogate approximation, the best candidate point is subsequently selected for expensive evaluation for each of the P centers, with simultaneous computation on P processors. Centers that previously did not generate good solutions are tabu with a given tenure. We show almost sure convergence of this algorithm under some conditions. The performance of SOP is compared with two RBF based methods. The test results show that SOP is an efficient method that can reduce time required to find a good near optimal solution. In a number of cases the efficiency of SOP is so good that SOP with 8 processors found an accurate answer in less wall-clock time than the other algorithms did with 32 processors.« less
New developments for SAW channelization for mobile satellite payloads

NASA Technical Reports Server (NTRS)

Peach, R. C.; Mabson, P.

1995-01-01

The use of SAW technology in mobile communication payloads is becoming widely accepted by the industry since being pioneered by Inmarsat for its third generation of satellites. This paper presents new developments in this area, including broadband processors of the Inmarsat 3 type, and the use of SAW filters at L-band. It is demonstrated that SAW processors have considerable potential for increasing the capacity of future communications payloads, while allowing fully transparent operation without any restriction on traffic type or modulation format. In addition to the evolutionary development of Inmarsat type processors, new SAW applications have also emerged recently. Therefore, despite the rapid changes in the industry, it is predicted that SAW processing has a strong future in satellite communications.
Developing software to use parallel processing effectively. Final report, June-December 1987

DOE Office of Scientific and Technical Information (OSTI.GOV)

Center, J.

1988-10-01

This report describes the difficulties involved in writing efficient parallel programs and describes the hardware and software support currently available for generating software that utilizes processing effectively. Historically, the processing rate of single-processor computers has increased by one order of magnitude every five years. However, this pace is slowing since electronic circuitry is coming up against physical barriers. Unfortunately, the complexity of engineering and research problems continues to require ever more processing power (far in excess of the maximum estimated 3 Gflops achievable by single-processor computers). For this reason, parallel-processing architectures are receiving considerable interest, since they offer high performancemore » more cheaply than a single-processor supercomputer, such as the Cray.« less
The role of neuroimaging in the discovery of processing stages. A review.

PubMed

Mulder, G; Wijers, A A; Lange, J J; Buijink, B M; Mulder, L J; Willemsen, A T; Paans, A M

1995-11-01

In this contribution we show how neuroimaging methods can augment behavioural methods to discover processing stages. Event Related Brain Potentials (ERPs), Brain Electrical Source Analysis (BESA) and regional changes in cerebral blood flow (rCBF) do not necessarily require behavioural responses. With the aid of rCBF we are able to discover several cortical and subcortical brain systems (processors) active in selective attention and memory search tasks. BESA describes cortical activity with high temporal resolution in terms of a limited number of neural generators within these brain systems. The combination of behavioural methods and neuroimaging provides a picture of the functional architecture of the brain. The review is organized around three processors: the Visual, Cognitive and Manual Motor Processors.

FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Cognitive and neural foundations of discrete sequence skill: a TMS study.

PubMed

Ruitenberg, Marit F L; Verwey, Willem B; Schutter, Dennis J L G; Abrahamse, Elger L

2014-04-01

Executing discrete movement sequences typically involves a shift with practice from a relatively slow, stimulus-based mode to a fast mode in which performance is based on retrieving and executing entire motor chunks. The dual processor model explains the performance of (skilled) discrete key-press sequences in terms of an interplay between a cognitive processor and a motor system. In the present study, we tested and confirmed the core assumptions of this model at the behavioral level. In addition, we explored the involvement of the pre-supplementary motor area (pre-SMA) in discrete sequence skill by applying inhibitory 20 min 1-Hz off-line repetitive transcranial magnetic stimulation (rTMS). Based on previous work, we predicted pre-SMA involvement in the selection/initiation of motor chunks, and this was confirmed by our results. The pre-SMA was further observed to be more involved in more complex than in simpler sequences, while no evidence was found for pre-SMA involvement in direct stimulus-response translations or associative learning processes. In conclusion, support is provided for the dual processor model, and for pre-SMA involvement in the initiation of motor chunks. Copyright © 2014 Elsevier Ltd. All rights reserved.
Design and test of a regenerative satellite transmultiplexer

NASA Astrophysics Data System (ADS)

Hung, Kenny King-Ming

1993-05-01

In a multiple access scheme for regenerative satellite communications, the bulk frequency division multiple access (FDMA) uplink signal is demodulated on board the satellite and then remodulated for time division multiplexing (TDM) downlink transmission. Conversion from frequency to time division multiplex format requires that the uplink signal be frequency demultiplexed and each individual carrier be subsequently demodulated. For thin-route application which consists of a large number of channels with fixed data rate, multicarrier demodulation can be accomplished efficiently by a digital transmultiplexer (TMUX) using a fast Fourier transform processor followed by a bank of per-channel processors. A time domain description of the TMUX algorithm is derived which elucidates how the TMUX functions. The per-channel processor performs timing and carrier recovery for optimum and coherent data detection. Timing recovery is necessarily achieved asynchronously by a filter coefficient interpolation. Carrier recovery is performed using an all-digital phase-locked loop. The combination of both timing and carrier loops is investigated for a multi-user system. The performance of the overall system is assessed over a multi-user, additive white Gaussian noise channel for a bit energy to noise power spectral density ratio down to zero dB.
Early MIMD experience on the CRAY X-MP

NASA Astrophysics Data System (ADS)

Rhoades, Clifford E.; Stevens, K. G.

1985-07-01

This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high speed Solid-state Storage Device (SSD) in an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
Image matrix processor for fast multi-dimensional computations

DOEpatents

Roberson, George P.; Skeate, Michael F.

1996-01-01

An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
Real-Time Data Processing in the muon system of the D0 detector.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Neeti Parashar et al.

2001-07-03

This paper presents a real-time application of the 16-bit fixed point Digital Signal Processors (DSPs), in the Muon System of the D0 detector located at the Fermilab Tevatron, presently the world's highest-energy hadron collider. As part of the Upgrade for a run beginning in the year 2000, the system is required to process data at an input event rate of 10 KHz without incurring significant deadtime in readout. The ADSP21csp01 processor has high I/O bandwidth, single cycle instruction execution and fast task switching support to provide efficient multisignal processing. The processor's internal memory consists of 4K words of Program Memorymore » and 4K words of Data Memory. In addition there is an external memory of 32K words for general event buffering and 16K words of Dual port Memory for input data queuing. This DSP fulfills the requirement of the Muon subdetector systems for data readout. All error handling, buffering, formatting and transferring of the data to the various trigger levels of the data acquisition system is done in software. The algorithms developed for the system complete these tasks in about 20 {micro}s per event.« less
Tunable inter-qubit coupling as a resource for gate based quantum computing with superconducting circuits

NASA Astrophysics Data System (ADS)

Chiaro, B.; Neill, C.; Chen, Z.; Dunsworth, A.; Foxen, B.; Quintana, C.; Wenner, J.; Martinis, J. M.; Google Quantum Hardware Team

Fast, high fidelity two qubit gates are an essential requirement of a quantum processor. In this talk, we discuss how the tunable coupling of the gmon architecture provides a pathway for an improved two qubit controlled-Z gate. The maximum inter-qubit coupling strength gmax = 60 MHz is sufficient for fast adiabatic two qubit gates to be performed as quickly as single qubit gates, reducing dephasing errors. Additionally, the ability to turn the coupling off allows all qubits to idle at low magnetic flux sensitivity, further reducing susceptibility to noise. However, the flexibility that this platform offers comes at the expense of increased control complexity. We describe our strategy for addressing the control challenges of the gmon architecture and show experimental progress toward fast, high fidelity controlled-Z gates with gmon qubits.
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
Fuel processor and method for generating hydrogen for fuel cells

DOEpatents

Ahmed, Shabbir [Naperville, IL; Lee, Sheldon H. D. [Willowbrook, IL; Carter, John David [Bolingbrook, IL; Krumpelt, Michael [Naperville, IL; Myers, Deborah J [Lisle, IL

2009-07-21

A method of producing a H.sub.2 rich gas stream includes supplying an O.sub.2 rich gas, steam, and fuel to an inner reforming zone of a fuel processor that includes a partial oxidation catalyst and a steam reforming catalyst or a combined partial oxidation and stream reforming catalyst. The method also includes contacting the O.sub.2 rich gas, steam, and fuel with the partial oxidation catalyst and the steam reforming catalyst or the combined partial oxidation and stream reforming catalyst in the inner reforming zone to generate a hot reformate stream. The method still further includes cooling the hot reformate stream in a cooling zone to produce a cooled reformate stream. Additionally, the method includes removing sulfur-containing compounds from the cooled reformate stream by contacting the cooled reformate stream with a sulfur removal agent. The method still further includes contacting the cooled reformate stream with a catalyst that converts water and carbon monoxide to carbon dioxide and H.sub.2 in a water-gas-shift zone to produce a final reformate stream in the fuel processor.
Configuring a fuel cell based residential combined heat and power system

NASA Astrophysics Data System (ADS)

Ahmed, Shabbir; Papadias, Dionissios D.; Ahluwalia, Rajesh K.

2013-11-01

The design and performance of a fuel cell based residential combined heat and power (CHP) system operating on natural gas has been analyzed. The natural gas is first converted to a hydrogen-rich reformate in a steam reformer based fuel processor, and the hydrogen is then electrochemically oxidized in a low temperature polymer electrolyte fuel cell to generate electric power. The heat generated in the fuel cell and the available heat in the exhaust gas is recovered to meet residential needs for hot water and space heating. Two fuel processor configurations have been studied. One of the configurations was explored to quantify the effects of design and operating parameters, which include pressure, temperature, and steam-to-carbon ratio in the fuel processor, and fuel utilization in the fuel cell. The second configuration applied the lessons from the study of the first configuration to increase the CHP efficiency. Results from the two configurations allow a quantitative comparison of the design alternatives. The analyses showed that these systems can operate at electrical efficiencies of ∼46% and combined heat and power efficiencies of ∼90%.
Numerical aerodynamic simulation facility preliminary study, volume 2 and appendices

NASA Technical Reports Server (NTRS)

1977-01-01

Data to support results obtained in technology assessment studies are presented. Objectives, starting points, and future study tasks are outlined. Key design issues discussed in appendices include: data allocation, transposition network design, fault tolerance and trustworthiness, logic design, processing element of existing components, number of processors, the host system, alternate data base memory designs, number representation, fast div 521 instruction, architectures, and lockstep array versus synchronizable array machine comparison.
Combustor Simulation

NASA Technical Reports Server (NTRS)

Norris, Andrew

2003-01-01

The goal was to perform 3D simulation of GE90 combustor, as part of full turbofan engine simulation. Requirements of high fidelity as well as fast turn-around time require massively parallel code. National Combustion Code (NCC) was chosen for this task as supports up to 999 processors and includes state-of-the-art combustion models. Also required is ability to take inlet conditions from compressor code and give exit conditions to turbine code.
Scalable Multiprocessor for High-Speed Computing in Space

NASA Technical Reports Server (NTRS)

Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard

2004-01-01

A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard realtime applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at "hundreds" of pulses per second, each pulse "requiring" millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
Replication of Space-Shuttle Computers in FPGAs and ASICs

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.

2008-01-01

A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Development of compact fuel processor for 2 kW class residential PEMFCs

NASA Astrophysics Data System (ADS)

Seo, Yu Taek; Seo, Dong Joo; Jeong, Jin Hyeok; Yoon, Wang Lai

Korea Institute of Energy Research (KIER) has been developing a novel fuel processing system to provide hydrogen rich gas to residential polymer electrolyte membrane fuel cells (PEMFCs) cogeneration system. For the effective design of a compact hydrogen production system, the unit processes of steam reforming, high and low temperature water gas shift, steam generator and internal heat exchangers are thermally and physically integrated into a packaged hardware system. Several prototypes are under development and the prototype I fuel processor showed thermal efficiency of 73% as a HHV basis with methane conversion of 81%. Recently tested prototype II has been shown the improved performance of thermal efficiency of 76% with methane conversion of 83%. In both prototypes, two-stage PrOx reactors reduce CO concentration less than 10 ppm, which is the prerequisite CO limit condition of product gas for the PEMFCs stack. After confirming the initial performance of prototype I fuel processor, it is coupled with PEMFC single cell to test the durability and demonstrated that the fuel processor is operated for 3 days successfully without any failure of fuel cell voltage. Prototype II fuel processor also showed stable performance during the durability test.
Assessment of directionality performances: comparison between Freedom and CP810 sound processors.

PubMed

Razza, Sergio; Albanese, Greta; Ermoli, Lucilla; Zaccone, Monica; Cristofari, Eliana

2013-10-01

To compare speech recognition in noise for the Nucleus Freedom and CP810 sound processors using different directional settings among those available in the SmartSound portfolio. Single-subject, repeated measures study. Tertiary care referral center. Thirty-one monoaurally and binaurally implanted subjects (24 children and 7 adults) were enrolled. They were all experienced Nucleus Freedom sound processor users and achieved a 100% open set word recognition score in quiet listening conditions. Each patient was fitted with the Freedom and the CP810 processor. The program setting incorporated Adaptive Dynamic Range Optimization (ADRO) and adopted the directional algorithm BEAM (both devices) and ZOOM (only on CP810). Speech reception threshold (SRT) was assessed in a free-field layout, with disyllabic word list and interfering multilevel babble noise in the 3 different pre-processing configurations. On average, CP810 improved significantly patients' SRTs as compared to Freedom SP after 1 hour of use. Instead, no significant difference was observed in patients' SRT between the BEAM and the ZOOM algorithm fitted in the CP810 processor. The results suggest that hardware developments achieved in the design of CP810 allow an immediate and relevant directional advantage as compared to the previous-generation Freedom device.
A miniaturized glucose biosensor for in vitro and in vivo studies.

PubMed

Yang, Yang-Li; Huang, Jian-Feng; Tseng, Ta-Feng; Lin, Chia-Ching; Lou, Shyh-Liang

2008-01-01

A miniaturized wireless glucose biosensor has been developed to perform in vitro and in vivo studies. It consists of an external control subsystem and an implant sensing subsystem. The implant subsystem consists of a micro-processor, which coordinates circuitries of radio frequency, power regulator, command demodulator, glucose sensing trigger and signal read-out. Except for a set of sensing electrodes, the micro-processor, the circuitries and a receiving coil were hermetically sealed with polydimethylsiloxane. The electrode set is a substrate of silicon oxide coated with platinum, which includes a working electrode and a reference electrode. Glucose oxidase was immobilized on the surface of the working electrode. The implant subsystem bi-directionally communicates with the external subsystem via radio frequency technologies. The external subsystem wirelessly supplies electricity to power the implant, issues commands to the implant to perform tasks, receives the glucose responses detected by the electrode, and relays the response signals to a computer through a RS-232 connection. Studies of in vitro and in vivo were performed to evaluate the biosensor. The linear response of the biosensor is up to 15 mM of glucose in vitro. The results of in vivo study show significant glucose variations measured from the interstitial tissue fluid of a diabetes rat in fasting and non-fasting periods.
A web-based institutional DICOM distribution system with the integration of the Clinical Trial Processor (CTP).

PubMed

Aryanto, K Y E; Broekema, A; Langenhuysen, R G A; Oudkerk, M; van Ooijen, P M A

2015-05-01

To develop and test a fast and easy rule-based web-environment with optional de-identification of imaging data to facilitate data distribution within a hospital environment. A web interface was built using Hypertext Preprocessor (PHP), an open source scripting language for web development, and Java with SQL Server to handle the database. The system allows for the selection of patient data and for de-identifying these when necessary. Using the services provided by the RSNA Clinical Trial Processor (CTP), the selected images were pushed to the appropriate services using a protocol based on the module created for the associated task. Five pipelines, each performing a different task, were set up in the server. In a 75 month period, more than 2,000,000 images are transferred and de-identified in a proper manner while 20,000,000 images are moved from one node to another without de-identification. While maintaining a high level of security and stability, the proposed system is easy to setup, it integrate well with our clinical and research practice and it provides a fast and accurate vendor-neutral process of transferring, de-identifying, and storing DICOM images. Its ability to run different de-identification processes in parallel pipelines is a major advantage in both clinical and research setting.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1989-01-01

Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
Fast decision algorithms in low-power embedded processors for quality-of-service based connectivity of mobile sensors in heterogeneous wireless sensor networks.

PubMed

Jaraíz-Simón, María D; Gómez-Pulido, Juan A; Vega-Rodríguez, Miguel A; Sánchez-Pérez, Juan M

2012-01-01

When a mobile wireless sensor is moving along heterogeneous wireless sensor networks, it can be under the coverage of more than one network many times. In these situations, the Vertical Handoff process can happen, where the mobile sensor decides to change its connection from a network to the best network among the available ones according to their quality of service characteristics. A fitness function is used for the handoff decision, being desirable to minimize it. This is an optimization problem which consists of the adjustment of a set of weights for the quality of service. Solving this problem efficiently is relevant to heterogeneous wireless sensor networks in many advanced applications. Numerous works can be found in the literature dealing with the vertical handoff decision, although they all suffer from the same shortfall: a non-comparable efficiency. Therefore, the aim of this work is twofold: first, to develop a fast decision algorithm that explores the entire space of possible combinations of weights, searching that one that minimizes the fitness function; and second, to design and implement a system on chip architecture based on reconfigurable hardware and embedded processors to achieve several goals necessary for competitive mobile terminals: good performance, low power consumption, low economic cost, and small area integration.

A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*

PubMed Central

Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.

2012-01-01

This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200
Validity of the iPhone M7 motion co-processor as a pedometer for able-bodied ambulation.

PubMed

Major, Matthew J; Alford, Micah

2016-12-01

Physical activity benefits for disease prevention are well-established. Smartphones offer a convenient platform for community-based step count estimation to monitor and encourage physical activity. Accuracy is dependent on hardware-software platforms, creating a recurring challenge for validation, but the Apple iPhone® M7 motion co-processor provides a standardised method that helps address this issue. Validity of the M7 to record step count for level-ground, able-bodied walking at three self-selected speeds, and agreement with the StepWatch TM was assessed. Steps were measured concurrently with the iPhone® (custom application to extract step count), StepWatch TM and manual count. Agreement between iPhone® and manual/StepWatch TM count was estimated through Pearson correlation and Bland-Altman analyses. Data from 20 participants suggested that iPhone® step count correlations with manual and StepWatch TM were strong for customary (1.3 ± 0.1 m/s) and fast (1.8 ± 0.2 m/s) speeds, but weak for the slow (1.0 ± 0.1 m/s) speed. Mean absolute error (manual-iPhone®) was 21%, 8% and 4% for the slow, customary and fast speeds, respectively. The M7 accurately records step count during customary and fast walking speeds, but is prone to considerable inaccuracies at slow speeds which has important implications for certain patient groups. The iPhone® may be a suitable alternative to the StepWatch TM for only faster walking speeds.
Evaluation of reinitialization-free nonvolatile computer systems for energy-harvesting Internet of things applications

NASA Astrophysics Data System (ADS)

Onizawa, Naoya; Tamakoshi, Akira; Hanyu, Takahiro

2017-08-01

In this paper, reinitialization-free nonvolatile computer systems are designed and evaluated for energy-harvesting Internet of things (IoT) applications. In energy-harvesting applications, as power supplies generated from renewable power sources cause frequent power failures, data processed need to be backed up when power failures occur. Unless data are safely backed up before power supplies diminish, reinitialization processes are required when power supplies are recovered, which results in low energy efficiencies and slow operations. Using nonvolatile devices in processors and memories can realize a faster backup than a conventional volatile computer system, leading to a higher energy efficiency. To evaluate the energy efficiency upon frequent power failures, typical computer systems including processors and memories are designed using 90 nm CMOS or CMOS/magnetic tunnel junction (MTJ) technologies. Nonvolatile ARM Cortex-M0 processors with 4 kB MRAMs are evaluated using a typical computing benchmark program, Dhrystone, which shows a few order-of-magnitude reductions in energy in comparison with a volatile processor with SRAM.
Accuracy-Energy Configurable Sensor Processor and IoT Device for Long-Term Activity Monitoring in Rare-Event Sensing Applications

PubMed Central

2014-01-01

A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error. PMID:25580458
Edge smoothing for real-time simulation of a polygon face object system as viewed by a moving observer

NASA Technical Reports Server (NTRS)

Lotz, Robert W. (Inventor); Westerman, David J. (Inventor)

1980-01-01

The visual system within an aircraft flight simulation system receives flight data and terrain data which is formated into a buffer memory. The image data is forwarded to an image processor which translates the image data into face vertex vectors Vf, defining the position relationship between the vertices of each terrain object and the aircraft. The image processor then rotates, clips, and projects the image data into two-dimensional display vectors (Vd). A display generator receives the Vd faces, and other image data to provide analog inputs to CRT devices which provide the window displays for the simulated aircraft. The video signal to the CRT devices passes through an edge smoothing device which prolongs the rise time (and fall time) of the video data inversely as the slope of the edge being smoothed. An operational amplifier within the edge smoothing device has a plurality of independently selectable feedback capacitors each having a different value. The values of the capacitors form a series which doubles as a power of two. Each feedback capacitor has a fast switch responsive to the corresponding bit of a digital binary control word for selecting (1) or not selecting (0) that capacitor. The control word is determined by the slope of each edge. The resulting actual feedback capacitance for each edge is the sum of all the selected capacitors and is directly proportional to the value of the binary control word. The output rise time (or fall time) is a function of the feedback capacitance, and is controlled by the slope through the binary control word.
Distributed multitasking ITS with PVM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fan, W.C.; Halbleib, J.A. Sr.

1995-12-31

Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generatedmore » in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of the MCNP code on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electronphoton transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load-balancing schemes for homogeneous and heterogeneous networks.« less
Designing systems to satisfy their users - The coming changes in aviation weather and the development of a central weather processor

NASA Technical Reports Server (NTRS)

Bush, M. W.

1984-01-01

Attention is given to the development history of the Central Weather Processor (CWP) program of the Federal Aviation Administration. The CWP will interface with high speed digital communications links, accept data and information products from new sources, generate data processing products, and provide meteorologists with the capability to automate data retrieval and dissemination. The CWP's users are operational (air traffic controllers, meteorologists and pilots), institutional (logistics, maintenance, testing and evaluation personnel), and administrative.
DBPQL: A view-oriented query language for the Intel Data Base Processor

NASA Technical Reports Server (NTRS)

Fishwick, P. A.

1983-01-01

An interactive query language (BDPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.
Vigilante: Ultrafast Smart Sensor for Target Recognition and Precision Tracking in a Simulated CMD Scenario

NASA Technical Reports Server (NTRS)

Uldomkesmalee, Suraphol; Suddarth, Steven C.

1997-01-01

VIGILANTE is an ultrafast smart sensor testbed for generic Automatic Target Recognition (ATR) applications with a series of capability demonstration focussed on cruise missile defense (CMD). VIGILANTE's sensor/processor architecture is based on next-generation UV/visible/IR sensors and a tera-operations per second sugar-cube processor, as well as supporting airborne vehicle. Excellent results of efficient ATR methodologies that use an eigenvectors/neural network combination and feature-based precision tracking have been demonstrated in the laboratory environment.
Implementing Legacy-C Algorithms in FPGA Co-Processors for Performance Accelerated Smart Payloads

NASA Technical Reports Server (NTRS)

Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.; Hartzell, Christine

2008-01-01

Accurate, on-board classification of instrument data is used to increase science return by autonomously identifying regions of interest for priority transmission or generating summary products to conserve transmission bandwidth. Due to on-board processing constraints, such classification has been limited to using the simplest functions on a small subset of the full instrument data. FPGA co-processor designs for SVM1 classifiers will lead to significant improvement in on-board classification capability and accuracy.
Architectures for single-chip image computing

NASA Astrophysics Data System (ADS)

Gove, Robert J.

1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
Reconfigurable data path processor

NASA Technical Reports Server (NTRS)

Donohoe, Gregory (Inventor)

2005-01-01

A reconfigurable data path processor comprises a plurality of independent processing elements. Each of the processing elements advantageously comprising an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output. Each processor is also capable of through-putting an input as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer output. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command. The output control command is generated by processing a set of arithmetic status-bits through a logical mask. The conditional multiplexer output is coupled to a first processing element output. A first set of arithmetic bits are generated according to the processing of the first processable value. A second set of arithmetic bits may be generated from a second processing operation. The selection of the arithmetic status-bits is performed by an arithmetic-status bit multiplexer selects the desired set of arithmetic status bits from among the first and second set of arithmetic status bits. The conditional multiplexer evaluates the select arithmetic status bits according to logical mask defining an algorithm for evaluating the arithmetic status bits.
An integrated MEMS infrastructure for fuel processing: hydrogen generation and separation for portable power generation

NASA Astrophysics Data System (ADS)

Varady, M. J.; McLeod, L.; Meacham, J. M.; Degertekin, F. L.; Fedorov, A. G.

2007-09-01

Portable fuel cells are an enabling technology for high efficiency and ultra-high density distributed power generation, which is essential for many terrestrial and aerospace applications. A key element of fuel cell power sources is the fuel processor, which should have the capability to efficiently reform liquid fuels and produce high purity hydrogen that is consumed by the fuel cells. To this end, we are reporting on the development of two novel MEMS hydrogen generators with improved functionality achieved through an innovative process organization and system integration approach that exploits the advantages of transport and catalysis on the micro/nano scale. One fuel processor design utilizes transient, reverse-flow operation of an autothermal MEMS microreactor with an intimately integrated, micromachined ultrasonic fuel atomizer and a Pd/Ag membrane for in situ hydrogen separation from the product stream. The other design features a simpler, more compact planar structure with the atomized fuel ejected directly onto the catalyst layer, which is coupled to an integrated hydrogen selective membrane.
Benchmarking NWP Kernels on Multi- and Many-core Processors

NASA Astrophysics Data System (ADS)

Michalakes, J.; Vachharajani, M.

2008-12-01

Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Error detection method

DOEpatents

Olson, Eric J.

2013-06-11

An apparatus, program product, and method that run an algorithm on a hardware based processor, generate a hardware error as a result of running the algorithm, generate an algorithm output for the algorithm, compare the algorithm output to another output for the algorithm, and detect the hardware error from the comparison. The algorithm is designed to cause the hardware based processor to heat to a degree that increases the likelihood of hardware errors to manifest, and the hardware error is observable in the algorithm output. As such, electronic components may be sufficiently heated and/or sufficiently stressed to create better conditions for generating hardware errors, and the output of the algorithm may be compared at the end of the run to detect a hardware error that occurred anywhere during the run that may otherwise not be detected by traditional methodologies (e.g., due to cooling, insufficient heat and/or stress, etc.).
RISC Processors and High Performance Computing

NASA Technical Reports Server (NTRS)

Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

1995-01-01

This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.
Acousto-optic time- and space-integrating spotlight-mode SAR processor

NASA Astrophysics Data System (ADS)

Haney, Michael W.; Levy, James J.; Michael, Robert R., Jr.

1993-09-01

The technical approach and recent experimental results for the acousto-optic time- and space- integrating real-time SAR image formation processor program are reported. The concept overcomes the size and power consumption limitations of electronic approaches by using compact, rugged, and low-power analog optical signal processing techniques for the most computationally taxing portions of the SAR imaging problem. Flexibility and performance are maintained by the use of digital electronics for the critical low-complexity filter generation and output image processing functions. The results include a demonstration of the processor's ability to perform high-resolution spotlight-mode SAR imaging by simultaneously compensating for range migration and range/azimuth coupling in the analog optical domain, thereby avoiding a highly power-consuming digital interpolation or reformatting operation usually required in all-electronic approaches.
System and method for bearing fault detection using stator current noise cancellation

DOEpatents

Zhou, Wei; Lu, Bin; Habetler, Thomas G.; Harley, Ronald G.; Theisen, Peter J.

2010-08-17

A system and method for detecting incipient mechanical motor faults by way of current noise cancellation is disclosed. The system includes a controller configured to detect indicia of incipient mechanical motor faults. The controller further includes a processor programmed to receive a baseline set of current data from an operating motor and define a noise component in the baseline set of current data. The processor is also programmed to repeatedly receive real-time operating current data from the operating motor and remove the noise component from the operating current data in real-time to isolate any fault components present in the operating current data. The processor is then programmed to generate a fault index for the operating current data based on any isolated fault components.
GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-08-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
Description and Simulation of a Fast Packet Switch Architecture for Communication Satellites

NASA Technical Reports Server (NTRS)

Quintana, Jorge A.; Lizanich, Paul J.

1995-01-01

The NASA Lewis Research Center has been developing the architecture for a multichannel communications signal processing satellite (MCSPS) as part of a flexible, low-cost meshed-VSAT (very small aperture terminal) network. The MCSPS architecture is based on a multifrequency, time-division-multiple-access (MF-TDMA) uplink and a time-division multiplex (TDM) downlink. There are eight uplink MF-TDMA beams, and eight downlink TDM beams, with eight downlink dwells per beam. The information-switching processor, which decodes, stores, and transmits each packet of user data to the appropriate downlink dwell onboard the satellite, has been fully described by using VHSIC (Very High Speed Integrated-Circuit) Hardware Description Language (VHDL). This VHDL code, which was developed in-house to simulate the information switching processor, showed that the architecture is both feasible and viable. This paper describes a shared-memory-per-beam architecture, its VHDL implementation, and the simulation efforts.

Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

PubMed Central

Farreras, Montse

2014-01-01

Abstract The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. Also the methodologies required to analyze these data have become more complex everyday, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675
Real-time digital holographic microscopy using the graphic processing unit.

PubMed

Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi

2008-08-04

Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allows use as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms whose size is 512 x 512 grids in 24 frames per second.
Vibrational Analysis of Engine Components Using Neural-Net Processing and Electronic Holography

NASA Technical Reports Server (NTRS)

Decker, Arthur J.; Fite, E. Brian; Mehmed, Oral; Thorp, Scott A.

1997-01-01

The use of computational-model trained artificial neural networks to acquire damage specific information from electronic holograms is discussed. A neural network is trained to transform two time-average holograms into a pattern related to the bending-induced-strain distribution of the vibrating component. The bending distribution is very sensitive to component damage unlike the characteristic fringe pattern or the displacement amplitude distribution. The neural network processor is fast for real-time visualization of damage. The two-hologram limit makes the processor more robust to speckle pattern decorrelation. Undamaged and cracked cantilever plates serve as effective objects for testing the combination of electronic holography and neural-net processing. The requirements are discussed for using finite-element-model trained neural networks for field inspections of engine components. The paper specifically discusses neural-network fringe pattern analysis in the presence of the laser speckle effect and the performances of two limiting cases of the neural-net architecture.
Vibrational Analysis of Engine Components Using Neural-Net Processing and Electronic Holography

NASA Technical Reports Server (NTRS)

Decker, Arthur J.; Fite, E. Brian; Mehmed, Oral; Thorp, Scott A.

1998-01-01

The use of computational-model trained artificial neural networks to acquire damage specific information from electronic holograms is discussed. A neural network is trained to transform two time-average holograms into a pattern related to the bending-induced-strain distribution of the vibrating component. The bending distribution is very sensitive to component damage unlike the characteristic fringe pattern or the displacement amplitude distribution. The neural network processor is fast for real-time visualization of damage. The two-hologram limit makes the processor more robust to speckle pattern decorrelation. Undamaged and cracked cantilever plates serve as effective objects for testing the combination of electronic holography and neural-net processing. The requirements are discussed for using finite-element-model trained neural networks for field inspections of engine components. The paper specifically discusses neural-network fringe pattern analysis in the presence of the laser speckle effect and the performances of two limiting cases of the neural-net architecture.
Ray tracing on the MPP

NASA Technical Reports Server (NTRS)

Dorband, John E.

1987-01-01

Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Feasibility of a special-purpose computer to solve the Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Gritton, E. C.; King, W. S.; Sutherland, I.; Gaines, R. S.; Gazley, C., Jr.; Grosch, C.; Juncosa, M.; Petersen, H.

1978-01-01

Orders-of-magnitude improvements in computer performance can be realized with a parallel array of thousands of fast microprocessors. In this architecture, wiring congestion is minimized by limiting processor communication to nearest neighbors. When certain standard algorithms are applied to a viscous flow problem and existing LSI technology is used, performance estimates of this conceptual design show a dramatic decrease in computational time when compared to the CDC 7600.
SOP: parallel surrogate global optimization with Pareto center selection for computationally expensive single objective problems

DOE PAGES

Krityakierne, Tipaluck; Akhtar, Taimoor; Shoemaker, Christine A.

2016-02-02

This paper presents a parallel surrogate-based global optimization method for computationally expensive objective functions that is more effective for larger numbers of processors. To reach this goal, we integrated concepts from multi-objective optimization and tabu search into, single objective, surrogate optimization. Our proposed derivative-free algorithm, called SOP, uses non-dominated sorting of points for which the expensive function has been previously evaluated. The two objectives are the expensive function value of the point and the minimum distance of the point to previously evaluated points. Based on the results of non-dominated sorting, P points from the sorted fronts are selected as centersmore » from which many candidate points are generated by random perturbations. Based on surrogate approximation, the best candidate point is subsequently selected for expensive evaluation for each of the P centers, with simultaneous computation on P processors. Centers that previously did not generate good solutions are tabu with a given tenure. We show almost sure convergence of this algorithm under some conditions. The performance of SOP is compared with two RBF based methods. The test results show that SOP is an efficient method that can reduce time required to find a good near optimal solution. In a number of cases the efficiency of SOP is so good that SOP with 8 processors found an accurate answer in less wall-clock time than the other algorithms did with 32 processors.« less
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Direct match data flow machine apparatus and process for data driven computing

DOEpatents

Davidson, G.S.; Grafe, V.G.

1997-08-12

A data flow computer and method of computing are disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing

DOEpatents

Davidson, G.S.; Grafe, V.G.

1988-07-22

A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ''fire'' signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing

DOEpatents

Davidson, George S.; Grafe, Victor G.

1995-01-01

A data flow computer which of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow machine apparatus and process for data driven computing

DOEpatents

Davidson, George S.; Grafe, Victor Gerald

1997-01-01

A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow memory for data driven computing

DOEpatents

Davidson, George S.; Grafe, Victor Gerald

1997-01-01

A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow memory for data driven computing

DOEpatents

Davidson, G.S.; Grafe, V.G.

1997-10-07

A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
CTF Preprocessor User's Manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avramova, Maria; Salko, Robert K.

2016-05-26

This document describes how a user should go about using the CTF pre- processor tool to create an input deck for modeling rod-bundle geometry in CTF. The tool was designed to generate input decks in a quick and less error-prone manner for CTF. The pre-processor is a completely independent utility, written in Fortran, that takes a reduced amount of input from the user. The information that the user must supply is basic information on bundle geometry, such as rod pitch, clad thickness, and axial location of spacer grids--the pre-processor takes this basic information and determines channel placement and connection informationmore » to be written to the input deck, which is the most time-consuming and error-prone segment of creating a deck. Creation of the model is also more intuitive, as the user can specify assembly and water-tube placement using visual maps instead of having to place them by determining channel/channel and rod/channel connections. As an example of the benefit of the pre-processor, a quarter-core model that contains 500,000 scalar-mesh cells was read into CTF from an input deck containing 200,000 lines of data. This 200,000 line input deck was produced automatically from a set of pre-processor decks that contained only 300 lines of data.« less
Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sitaraman, Hariswaran; Grout, Ray W

This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved heremore » through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~ 3 speed-up using Intel 2013 compiler and ~ 1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.« less
Concept For Generation Of Long Pseudorandom Sequences

NASA Technical Reports Server (NTRS)

Wang, C. C.

1990-01-01

Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.
RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors

NASA Astrophysics Data System (ADS)

Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco

2015-09-01

This paper presents the final result of an European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in a SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This however comes with the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, a SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
Satellite on-board real-time SAR processor prototype

NASA Astrophysics Data System (ADS)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and size are reviewed.
Approaches in highly parameterized inversion: TSPROC, a general time-series processor to assist in model calibration and result summarization

USGS Publications Warehouse

Westenbroek, Stephen M.; Doherty, John; Walker, John F.; Kelson, Victor A.; Hunt, Randall J.; Cera, Timothy B.

2012-01-01

The TSPROC (Time Series PROCessor) computer software uses a simple scripting language to process and analyze time series. It was developed primarily to assist in the calibration of environmental models. The software is designed to perform calculations on time-series data commonly associated with surface-water models, including calculation of flow volumes, transformation by means of basic arithmetic operations, and generation of seasonal and annual statistics and hydrologic indices. TSPROC can also be used to generate some of the key input files required to perform parameter optimization by means of the PEST (Parameter ESTimation) computer software. Through the use of TSPROC, the objective function for use in the model-calibration process can be focused on specific components of a hydrograph.

Water Processor and Oxygen Generation Assembly

NASA Technical Reports Server (NTRS)

Bedard, John

1997-01-01

This report documents the results of the tasks which initiated efforts on design issues relating to the Water Processor (WP) and the Oxygen Generation Assembly (OGA) Flight Hardware for the International Space Station. This report fulfills the Statement of Work deliverables requirement for contract H-29387D. The following lists the tasks required by contract H-29387D: (1) HSSSI shall coordinate a detailed review of WP/OGA Flight Hardware program requirements with personnel from MSFC to identify requirements that can be eliminated without affecting the technical integrity of the WP/OGA Hardware; (2) HSSSI shall conduct the technical interchanges with personnel from MSFC to resolve design issues related to WP/OGA Flight Hardware; (3) HSSSI will initiate discussions with Zellwegger Analytics, Inc. to address design issues related to WP and PCWQM interfaces.
DD-αAMG on QPACE 3

NASA Astrophysics Data System (ADS)

Georg, Peter; Richtmann, Daniel; Wettig, Tilo

2018-03-01

We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications in the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as the scaling on many nodes, where in both cases the speedup factor is close to the theoretical expectations.
Sequential pattern data mining and visualization

DOEpatents

Wong, Pak Chung [Richland, WA; Jurrus, Elizabeth R [Kennewick, WA; Cowley, Wendy E [Benton City, WA; Foote, Harlan P [Richland, WA; Thomas, James J [Richland, WA

2011-12-06

One or more processors (22) are operated to extract a number of different event identifiers therefrom. These processors (22) are further operable to determine a number a display locations each representative of one of the different identifiers and a corresponding time. The display locations are grouped into sets each corresponding to a different one of several event sequences (330a, 330b, 330c. 330d, 330e). An output is generated corresponding to a visualization (320) of the event sequences (330a, 330b, 330c, 330d, 330e).
Sequential pattern data mining and visualization

DOEpatents

Wong, Pak Chung [Richland, WA; Jurrus, Elizabeth R [Kennewick, WA; Cowley, Wendy E [Benton City, WA; Foote, Harlan P [Richland, WA; Thomas, James J [Richland, WA

2009-05-26

One or more processors (22) are operated to extract a number of different event identifiers therefrom. These processors (22) are further operable to determine a number a display locations each representative of one of the different identifiers and a corresponding time. The display locations are grouped into sets each corresponding to a different one of several event sequences (330a, 330b, 330c. 330d, 330e). An output is generated corresponding to a visualization (320) of the event sequences (330a, 330b, 330c, 330d, 330e).
Earth Orbiter 1 (EO-1): Wideband Advanced Recorder and Processor (WARP)

NASA Technical Reports Server (NTRS)

Smith, Terry; Kessler, John

1999-01-01

An overview of the Earth Orbitor 1 (EO1) Wideband Advanced Recorder and Processor (WARP) is presented in viewgraph form. The WARP is a spacecraft component that receives, stores, and processes high rate science data and its associated ancillary data from multispectral detectors, hyperspectral detectors, and an atmospheric corrector, and then transmits the data via an X-band or S-band transmitter to the ground station. The WARP project goals are: (1) Pathfinder for next generation LANDSAT mission; (2) Flight prove architectures and technologies; and (3) Identify future technology needs.
State estimation for distributed systems with sensing delay

NASA Astrophysics Data System (ADS)

Alexander, Harold L.

1991-08-01

Control of complex systems such as remote robotic vehicles requires combining data from many sensors where the data may often be delayed by sensory processing requirements. The number and variety of sensors make it desirable to distribute the computational burden of sensing and estimation among multiple processors. Classic Kalman filters do not lend themselves to distributed implementations or delayed measurement data. The alternative Kalman filter designs presented in this paper are adapted for delays in sensor data generation and for distribution of computation for sensing and estimation over a set of networked processors.
Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

De Supinski, B.; Caliga, D.

2017-09-28

The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
Systolic Processor Array For Recognition Of Spectra

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Peterson, John C.

1995-01-01

Spectral signatures of materials detected and identified quickly. Spectral Analysis Systolic Processor Array (SPA2) relatively inexpensive and satisfies need to analyze large, complex volume of multispectral data generated by imaging spectrometers to extract desired information: computational performance needed to do this in real time exceeds that of current supercomputers. Locates highly similar segments or contiguous subsegments in two different spectra at time. Compares sampled spectra from instruments with data base of spectral signatures of known materials. Computes and reports scores that express degrees of similarity between sampled and data-base spectra.
The mass storage testing laboratory at GSFC

NASA Technical Reports Server (NTRS)

Venkataraman, Ravi; Williams, Joel; Michaud, David; Gu, Heng; Kalluri, Atri; Hariharan, P. C.; Kobler, Ben; Behnke, Jeanne; Peavey, Bernard

1998-01-01

Industry-wide benchmarks exist for measuring the performance of processors (SPECmarks), and of database systems (Transaction Processing Council). Despite storage having become the dominant item in computing and IT (Information Technology) budgets, no such common benchmark is available in the mass storage field. Vendors and consultants provide services and tools for capacity planning and sizing, but these do not account for the complete set of metrics needed in today's archives. The availability of automated tape libraries, high-capacity RAID systems, and high- bandwidth interconnectivity between processor and peripherals has led to demands for services which traditional file systems cannot provide. File Storage and Management Systems (FSMS), which began to be marketed in the late 80's, have helped to some extent with large tape libraries, but their use has introduced additional parameters affecting performance. The aim of the Mass Storage Test Laboratory (MSTL) at Goddard Space Flight Center is to develop a test suite that includes not only a comprehensive check list to document a mass storage environment but also benchmark code. Benchmark code is being tested which will provide measurements for both baseline systems, i.e. applications interacting with peripherals through the operating system services, and for combinations involving an FSMS. The benchmarks are written in C, and are easily portable. They are initially being aimed at the UNIX Open Systems world. Measurements are being made using a Sun Ultra 170 Sparc with 256MB memory running Solaris 2.5.1 with the following configuration: 4mm tape stacker on SCSI 2 Fast/Wide; 4GB disk device on SCSI 2 Fast/Wide; and Sony Petaserve on Fast/Wide differential SCSI 2.
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

NASA Astrophysics Data System (ADS)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
Image matrix processor for fast multi-dimensional computations

DOEpatents

Roberson, G.P.; Skeate, M.F.

1996-10-15

An apparatus for multi-dimensional computation is disclosed which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination. 10 figs.
Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos

NASA Astrophysics Data System (ADS)

Jo, Hyunho; Sim, Donggyu

2014-06-01

We present a bitstream decoding processor for entropy decoding of variable length coding-based multiformat videos. Since most of the computational complexity of entropy decoders comes from bitstream accesses and table look-up process, the developed bitstream processing unit (BsPU) has several designated instructions to access bitstreams and to minimize branch operations in the table look-up process. In addition, the instruction for bitstream access has the capability to remove emulation prevention bytes (EPBs) of H.264/AVC without initial delay, repeated memory accesses, and additional buffer. Experimental results show that the proposed method for EPB removal achieves a speed-up of 1.23 times compared to the conventional EPB removal method. In addition, the BsPU achieves speed-ups of 5.6 and 3.5 times in entropy decoding of H.264/AVC and MPEG-4 Visual bitstreams, respectively, compared to an existing processor without designated instructions and a new table mapping algorithm. The BsPU is implemented on a Xilinx Virtex5 LX330 field-programmable gate array. The MPEG-4 Visual (ASP, Level 5) and H.264/AVC (Main Profile, Level 4) are processed using the developed BsPU with a core clock speed of under 250 MHz in real time.
Advanced electronics for the CTF MEG system.

PubMed

McCubbin, J; Vrba, J; Spear, P; McKenzie, D; Willis, R; Loewen, R; Robinson, S E; Fife, A A

2004-11-30

Development of the CTF MEG system has been advanced with the introduction of a computer processing cluster between the data acquisition electronics and the host computer. The advent of fast processors, memory, and network interfaces has made this innovation feasible for large data streams at high sampling rates. We have implemented tasks including anti-alias filter, sample rate decimation, higher gradient balancing, crosstalk correction, and optional filters with a cluster consisting of 4 dual Intel Xeon processors operating on up to 275 channel MEG systems at 12 kHz sample rate. The architecture is expandable with additional processors to implement advanced processing tasks which may include e.g., continuous head localization/motion correction, optional display filters, coherence calculations, or real time synthetic channels (via beamformer). We also describe an electronics configuration upgrade to provide operator console access to the peripheral interface features such as analog signal and trigger I/O. This allows remote location of the acoustically noisy electronics cabinet and fitting of the cabinet with doors for improved EMI shielding. Finally, we present the latest performance results available for the CTF 275 channel MEG system including an unshielded SEF (median nerve electrical stimulation) measurement enhanced by application of an adaptive beamformer technique (SAM) which allows recognition of the nominal 20-ms response in the unaveraged signal.
QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

PubMed Central

Gudyś, Adam; Deorowicz, Sebastian

2014-01-01

Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, the variant of MSAProbs customised for graphics processors. We selected the two most time consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on quad-core PC equipped with high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than original CPU-parallel MSAProbs. Additional tests performed on several protein families from Pfam database give overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors. PMID:24586435
Multiprocessor architectural study

NASA Technical Reports Server (NTRS)

Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.

1972-01-01

An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics, previous experience, and accumulated knowledge of the multiprocessor field is used to generate a baseline philosophy for the design of a future SUMC* multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance is discussed extensively with particular attention to the design approach which utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented either through segmentation or paging. Addressing is discussed in terms of various register design adopted by current computers and those of advanced design.
Efficacy of Code Optimization on Cache-Based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Saphir, William C.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

In this paper a number of techniques for improving the cache performance of a representative piece of numerical software is presented. Target machines are popular processors from several vendors: MIPS R5000 (SGI Indy), MIPS R8000 (SGI PowerChallenge), MIPS R10000 (SGI Origin), DEC Alpha EV4 + EV5 (Cray T3D & T3E), IBM RS6000 (SP Wide-node), Intel PentiumPro (Ames' Whitney), Sun UltraSparc (NERSC's NOW). The optimizations all attempt to increase the locality of memory accesses. But they meet with rather varied and often counterintuitive success on the different computing platforms. We conclude that it may be genuinely impossible to obtain portable performance on the current generation of cache-based machines. At the least, it appears that the performance of modern commodity processors cannot be described with parameters defining the cache alone.
Design and Implementation of a CMOS Chip for a Prolog

DTIC Science & Technology

1988-03-01

generation scheme . We use the P -circuit [9] with pre-conditioning and post- conditioning 12,3] circuits to generate the carry. The implementation of...system generates vertical microcode for a general purpose processor, the NCR 9300 sys- S tem, from W- code [7]. Three significant pieces of software are...calculation block generating the pro- pagate ( P ) and generate (G) signals needed for carry calculation, and a sum block supplying the final result. The top
Symposium on Turbulence (10th) Held in Rollo, Missouri on September 22-24, 1986

DTIC Science & Technology

1986-09-24

that speckle velooimstry is rather excercised when attempting to obtain promising under 3000 oiraulstanoes, quantitative information from thisCould yOU...and free. small scale intermittency , it will have~Jack Herrin2. NCAR: Isn’t the reason to be based on some alternative measure olarge helicity...processor obtained under NASA’s Nu- merical Aerodynamic Simulation (NAS) project combines a relatively fast CPU with about 258 million words of memory. This
DSS 13 Microprocessor Antenna Controller

NASA Technical Reports Server (NTRS)

Gosline, R. M.

1984-01-01

A microprocessor based antenna controller system developed as part of the unattended station project for DSS 13 is described. Both the hardware and software top level designs are presented and the major problems encounted are discussed. Developments useful to related projects include a JPL standard 15 line interface using a single board computer, a general purpose parser, a fast floating point to ASCII conversion technique, and experience gained in using off board floating point processors with the 8080 CPU.
Fast-response free-running dc-to-dc converter employing a state-trajectory control law

NASA Technical Reports Server (NTRS)

Huffman, S. D.; Burns, W. W., III; Wilson, T. G.; Owen, H. A., Jr.

1977-01-01

A recently proposed state-trajectory control law for a family of energy-storage dc-to-dc converters has been implemented for the voltage step-up configuration. Two methods of realization are discussed; one employs a digital processor and the other uses analog computational circuits. Performance characteristics of experimental voltage step-up converters operating under the control of each of these implementations are reported and compared to theoretical predictions and computer simulations.

Parallel community climate model: Description and user`s guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

PubMed

Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

2016-04-01

Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10(7) primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.
Cache write generate for parallel image processing on shared memory architectures.

PubMed

Wittenbrink, C M; Somani, A K; Chen, C H

1996-01-01

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.

NASA Astrophysics Data System (ADS)

Feldman, Michael Robert

Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
Thermal Characterization of Defects in Aircraft Structures Via Spatially Controlled Heat Application

NASA Technical Reports Server (NTRS)

Cramer, K. Elliott; Winfree, William P.

1997-01-01

Recent advances in thermal imaging technology have spawned a number of new thermal NDE techniques that provide quantitative information about flaws in aircraft structures. Thermography has a number of advantages as an inspection technique. It is a totally noncontacting, nondestructive, imaging technology capable of inspecting a large area in a matter of a few seconds. The development of fast, inexpensive image processors have aided in the attractiveness of thermography as an NDE technique. These image processors have increased the signal to noise ratio of thermography and facilitated significant advances in post-processing. The resulting digital images enable archival records for comparison with later inspections thus providing a means of monitoring the evolution of damage in a particular structure. The National Aeronautics and Space Administration's Langley Research Center has developed a thermal NDE technique designed to image a number of potential flaws in aircraft structures. The technique involves injecting a small, spatially controlled heat flux into the outer surface of an aircraft. Images of fatigue cracking, bond integrity and material loss due to corrosion are generated from measurements of the induced surface temperature variations. This paper will present a discussion of the development of the thermal imaging system as well as the techniques used to analyze the resulting thermal images. Spatial tailoring of the heat coupled with the analysis techniques represent a significant improvement in the delectability of flaws over conventional thermal imaging. Results of laboratory experiments on fabricated crack, disbond and material loss samples will be presented to demonstrate the capabilities of the technique. An integral part of the development of this technology is the use of analytic and computational modeling. The experimental results will be compared with these models to demonstrate the utility of such an approach.
The impact of Moore's Law and loss of Dennard scaling: Are DSP SoCs an energy efficient alternative to x86 SoCs?

NASA Astrophysics Data System (ADS)

Johnsson, L.; Netzer, G.

2016-10-01

Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling resulting in constant power per unit area stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been leveled off clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased caches sizes offers performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than main memory accesses. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down an increasing fraction of the die must be “underutilized” or “dark” due to power constraints. With power being a prime design constraint there is a concerted effort to find significantly more energy efficient chip architectures than dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common for the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, typically have been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of the same technology generation x86 CPUs. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP in regards to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark)
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

NASA Technical Reports Server (NTRS)

Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.

2012-01-01

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
Monte Carlo generator ELRADGEN 2.0 for simulation of radiative events in elastic ep-scattering of polarized particles

NASA Astrophysics Data System (ADS)

Akushevich, I.; Filoti, O. F.; Ilyichev, A.; Shumeiko, N.

2012-07-01

The structure and algorithms of the Monte Carlo generator ELRADGEN 2.0 designed to simulate radiative events in polarized ep-scattering are presented. The full set of analytical expressions for the QED radiative corrections is presented and discussed in detail. Algorithmic improvements implemented to provide faster simulation of hard real photon events are described. Numerical tests show high quality of generation of photonic variables and radiatively corrected cross section. The comparison of the elastic radiative tail simulated within the kinematical conditions of the BLAST experiment at MIT BATES shows a good agreement with experimental data. Catalogue identifier: AELO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELO_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 1299 No. of bytes in distributed program, including test data, etc.: 11 348 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: All Operating system: Any RAM: 1 MB Classification: 11.2, 11.4 Nature of problem: Simulation of radiative events in polarized ep-scattering. Solution method: Monte Carlo simulation according to the distributions of the real photon kinematic variables that are calculated by the covariant method of QED radiative correction estimation. The approach provides rather fast and accurate generation. Running time: The simulation of 108 radiative events for itest:=1 takes up to 52 seconds on Pentium(R) Dual-Core 2.00 GHz processor.
FPGA-Based, Self-Checking, Fault-Tolerant Computers

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2004-01-01

A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
A Micro-Processor Based System as a Teaching Tool.

ERIC Educational Resources Information Center

Spero, Samuel W.

1979-01-01

Two instructional strategies incorporating a microprocessor-based computer system are described. These are the use of the system to drive a television monitor, and the system's use in generating problem sets. (MP)
Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

NASA Technical Reports Server (NTRS)

Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John

2016-01-01

We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5- 2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks including subset of HPCC, HPCG and four full-scale scientific and engineering applications. We also present a model to predict the performance of HPCG and Cart3D within 5%, and Overflow within 10% accuracy.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Collins, William D; Johansen, Hans; Evans, Katherine J

We present a survey of physical and computational techniques that have the potential to con- tribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enabling improved accuracy andmore » fidelity in simulation of dynamics and allow more complete representations of climate features at the global scale. At the same time, part- nerships with computer science teams have focused on taking advantage of evolving computer architectures, such as many-core processors and GPUs, so that these approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.« less
A Computational Methodology to Screen Activities of Enzyme Variants

PubMed Central

Hediger, Martin R.; De Vico, Luca; Svendsen, Allan; Besenmatter, Werner; Jensen, Jan H.

2012-01-01

We present a fast computational method to efficiently screen enzyme activity. In the presented method, the effect of mutations on the barrier height of an enzyme-catalysed reaction can be computed within 24 hours on roughly 10 processors. The methodology is based on the PM6 and MOZYME methods as implemented in MOPAC2009, and is tested on the first step of the amide hydrolysis reaction catalyzed by the Candida Antarctica lipase B (CalB) enzyme. The barrier heights are estimated using adiabatic mapping and shown to give barrier heights to within 3 kcal/mol of B3LYP/6-31G(d)//RHF/3-21G results for a small model system. Relatively strict convergence criteria (0.5 kcal/(molÅ)), long NDDO cutoff distances within the MOZYME method (15 Å) and single point evaluations using conventional PM6 are needed for reliable results. The generation of mutant structures and subsequent setup of the semiempirical calculations are automated so that the effect on barrier heights can be estimated for hundreds of mutants in a matter of weeks using high performance computing. PMID:23284627
The structure, logic of operation and distinctive features of the system of triggers and counting signals formation for gamma-telescope GAMMA-400

NASA Astrophysics Data System (ADS)

Topchiev, N. P.; Galper, A. M.; Arkhangelskiy, A. I.; Arkhangelskaja, I. V.; Kheymits, M. D.; Suchkov, S. I.; Yurkin, Y. T.

2017-01-01

Scientific project GAMMA-400 (Gamma Astronomical Multifunctional Modular Apparatus) relates to the new generation of space observatories intended to perform an indirect search for signatures of dark matter in the cosmic-ray fluxes, measurements of characteristics of diffuse gamma-ray emission and gamma-rays from the Sun during periods of solar activity, gamma-ray bursts, extended and point gamma-ray sources, electron/positron and cosmic-ray nuclei fluxes up to TeV energy region by means of the GAMMA-400 gamma-ray telescope represents the core of the scientific complex. The system of triggers and counting signals formation of the GAMMA-400 gamma-ray telescope constitutes the pipelined processor structure which collects data from the gamma-ray telescope subsystems and produces summary information used in forming the trigger decision for each event. The system design is based on the use of state-of-the-art reconfigurable logic devices and fast data links. The basic structure, logic of operation and distinctive features of the system are presented.
BarraCUDA - a fast short read sequence aligner using graphics processing units

PubMed Central

2012-01-01

Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
Parallel network simulations with NEURON.

PubMed

Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

2006-10-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Eliminating livelock by assigning the same priority state to each message that is input into a flushable routing system during N time intervals

DOEpatents

Faber, V.

1994-11-29

Livelock-free message routing is provided in a network of interconnected nodes that is flushable in time T. An input message processor generates sequences of at least N time intervals, each of duration T. An input register provides for receiving and holding each input message, where the message is assigned a priority state p during an nth one of the N time intervals. At each of the network nodes a message processor reads the assigned priority state and awards priority to messages with priority state (p-1) during an nth time interval and to messages with priority state p during an (n+1) th time interval. The messages that are awarded priority are output on an output path toward the addressed output message processor. Thus, no message remains in the network for a time longer than T. 4 figures.
A Survey of Architectural Techniques For Improving Cache Power Efficiency

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers have proposed several techniques for improving energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. For providing an application perspective, this paper also reviews several real-worldmore » processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.« less
Investigation of DMSD Trend in the ISS Water Processor Assembly

NASA Technical Reports Server (NTRS)

Carter, Layne; Bowman, Elizabeth; Wilson, Mark; Gentry, Greg; Rector, Tony

2013-01-01

The ISS Water Recovery System (WRS) is responsible for providing potable water to the crew, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. The WRS includes the Water Processor Assembly (WPA) and the Urine Processor Assembly (UPA). The WPA processes condensate from the cabin air and distillate produced by the UPA. In 2010, an increasing trend in the Total Organic Carbon (TOC) in the potable water was ultimately identified as dimethylsilanediol (DMSD). The increasing trend was ultimately reversed after replacing the WPA's two multifiltration beds. However, the reason for the TOC trend and the subsequent recovery was not understood. A subsequent trend occurred in 2012. This paper summarizes the current understanding of the fate of DMSD in the WPA, how the increasing TOC trend occurred, and the plan for modifying the WPA to prevent recurrence.

Parallel Network Simulations with NEURON

PubMed Central

Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

2009-01-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
Eliminating livelock by assigning the same priority state to each message that is inputted into a flushable routing system during N time intervals

DOEpatents

Faber, Vance

1994-01-01

Livelock-free message routing is provided in a network of interconnected nodes that is flushable in time T. An input message processor generates sequences of at least N time intervals, each of duration T. An input register provides for receiving and holding each input message, where the message is assigned a priority state p during an nth one of the N time intervals. At each of the network nodes a message processor reads the assigned priority state and awards priority to messages with priority state (p-1) during an nth time interval and to messages with priority state p during an (n+1) th time interval. The messages that are awarded priority are output on an output path toward the addressed output message processor. Thus, no message remains in the network for a time longer than T.
Method and system to estimate variables in an integrated gasification combined cycle (IGCC) plant

DOEpatents

Kumar, Aditya; Shi, Ruijie; Dokucu, Mustafa

2013-09-17

System and method to estimate variables in an integrated gasification combined cycle (IGCC) plant are provided. The system includes a sensor suite to measure respective plant input and output variables. An extended Kalman filter (EKF) receives sensed plant input variables and includes a dynamic model to generate a plurality of plant state estimates and a covariance matrix for the state estimates. A preemptive-constraining processor is configured to preemptively constrain the state estimates and covariance matrix to be free of constraint violations. A measurement-correction processor may be configured to correct constrained state estimates and a constrained covariance matrix based on processing of sensed plant output variables. The measurement-correction processor is coupled to update the dynamic model with corrected state estimates and a corrected covariance matrix. The updated dynamic model may be configured to estimate values for at least one plant variable not originally sensed by the sensor suite.
System and method for motor fault detection using stator current noise cancellation

DOEpatents

Zhou, Wei; Lu, Bin; Nowak, Michael P.; Dimino, Steven A.

2010-12-07

A system and method for detecting incipient mechanical motor faults by way of current noise cancellation is disclosed. The system includes a controller configured to detect indicia of incipient mechanical motor faults. The controller further includes a processor programmed to receive a baseline set of current data from an operating motor and define a noise component in the baseline set of current data. The processor is also programmed to acquire at least on additional set of real-time operating current data from the motor during operation, redefine the noise component present in each additional set of real-time operating current data, and remove the noise component from the operating current data in real-time to isolate any fault components present in the operating current data. The processor is then programmed to generate a fault index for the operating current data based on any isolated fault components.
General purpose molecular dynamics simulations fully implemented on graphics processing units

NASA Astrophysics Data System (ADS)

Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.

2008-05-01

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 processor core distributed memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters and discuss implications for the future.
GRAMM-X public web server for protein–protein docking

PubMed Central

Tovchigrechko, Andrey; Vakser, Ilya A.

2006-01-01

Protein docking software GRAMM-X and its web interface () extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, refinement stage, and knowledge-based scoring. The web server frees users from complex installation of database-dependent parallel software and maintaining large hardware resources needed for protein docking simulations. Docking problems submitted to GRAMM-X server are processed by a 320 processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track. PMID:16845016
Collaborative writing: Tools and tips.

PubMed

Eapen, Bell Raj

2007-01-01

Majority of technical writing is done by groups of experts and various web based applications have made this collaboration easy. Email exchange of word processor documents with tracked changes used to be the standard technique for collaborative writing. However web based tools like Google docs and Spreadsheets have made the process fast and efficient. Various versioning tools and synchronous editors are available for those who need additional functionality. Having a group leader who decides the scheduling, communication and conflict resolving protocols is important for successful collaboration.
Fast Interrupt Priority Management in Operating System Kernels

DTIC Science & Technology

1993-05-01

We present results for the Mach 3.0 microkernel operating system, although the technique is applicable to other kernel architectures, both micro and...protection in the Mach 3.0 microkernel for several different processor architectures. For example, on the Omron Luna88k, we observed a 50% reduction in...general interrupt mask raise/lower pair within the Mach 3.0 microkernel on a variety of architectures. DTIC QUALM i.N1’R%.*1IMD 5 k81tltC Avail andl
Compiler-directed cache management in multiprocessors

NASA Technical Reports Server (NTRS)

Cheong, Hoichi; Veidenbaum, Alexander V.

1990-01-01

The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

NASA Astrophysics Data System (ADS)

Olson, Richard F.

2013-05-01

Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Multichannel signal enhancement

DOEpatents

Lewis, Paul S.

1990-01-01

A mixed adaptive filter is formulated for the signal processing problem where desired a priori signal information is not available. The formulation generates a least squares problem which enables the filter output to be calculated directly from an input data matrix. In one embodiment, a folded processor array enables bidirectional data flow to solve the recursive problem by back substitution without global communications. In another embodiment, a balanced processor array solves the recursive problem by forward elimination through the array. In a particular application to magnetoencephalography, the mixed adaptive filter enables an evoked response to an auditory stimulus to be identified from only a single trial.
Visualization of information with an established order

DOEpatents

Wong, Pak Chung [Richland, WA; Foote, Harlan P [Richmond, WA; Thomas, James J [Richland, WA; Wong, Kwong-Kwok [Sugar Land, TX

2007-02-13

Among the embodiments of the present invention is a system including one or more processors operable to access data representative of a biopolymer sequence of monomer units. The one or more processors are further operable to establish a pattern corresponding to at least one fractal curve and generate one or more output signals corresponding to a number of image elements each representative of one of the monomer units. Also included is a display device responsive to the one or more output signals to visualize the biopolymer sequence by displaying the image elements in accordance with the pattern.
SPAR data set contents. [finite element structural analysis system

NASA Technical Reports Server (NTRS)

Cunningham, S. W.

1981-01-01

The contents of the stored data sets of the SPAR (space processing applications rocket) finite element structural analysis system are documented. The data generated by each of the system's processors are stored in a data file organized as a library. Each data set, containing a two-dimensional table or matrix, is identified by a four-word name listed in a table of contents. The creating SPAR processor, number of rows and columns, and definitions of each of the data items are listed for each data set. An example SPAR problem using these data sets is also presented.
Orthorectification by Using Gpgpu Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
ORNL Cray X1 evaluation status report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, P.K.; Alexander, R.A.; Apra, E.

2004-05-01

On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. ''This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership,'' said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of themore » architecture, software environment and to predict the expected sustained performance on key DOE applications codes. The results of the micro-benchmarks and kernel bench marks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture: - Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors. - Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan s Earth Simulator and 5 times higher than on an IBM Power4 cluster. - A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study. - The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors. - Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved. Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months and has received broad-based support from the scientific community and other agencies.« less
Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Implementation of Adaptive Digital Controllers on Programmable Logic Devices

NASA Technical Reports Server (NTRS)

Gwaltney, David A.; King, Kenneth D.; Smith, Keary J.; Ormsby, John (Technical Monitor)

2002-01-01

Much has been made of the capabilities of FPGA's (Field Programmable Gate Arrays) in the hardware implementation of fast digital signal processing (DSP) functions. Such capability also makes and FPGA a suitable platform for the digital implementation of closed loop controllers. There are myriad advantages to utilizing an FPGA for discrete-time control functions which include the capability for reconfiguration when SRAM- based FPGA's are employed, fast parallel implementation of multiple control loops and implementations that can meet space level radiation tolerance in a compact form-factor. Other researchers have presented the notion that a second order digital filter with proportional-integral-derivative (PID) control functionality can be implemented in an FPGA. At Marshall Space Flight Center, the Control Electronics Group has been studying adaptive discrete-time control of motor driven actuator systems using digital signal processor (DSF) devices. Our goal is to create a fully digital, flight ready controller design that utilizes an FPGA for implementation of signal conditioning for control feedback signals, generation of commands to the controlled system, and hardware insertion of adaptive control algorithm approaches. While small form factor, commercial DSP devices are now available with event capture, data conversion, pulse width modulated outputs and communication peripherals, these devices are not currently available in designs and packages which meet space level radiation requirements. Meeting our goals requires alternative compact implementation of such functionality to withstand the harsh environment encountered on spacecraft. Radiation tolerant FPGA's are a feasible option for reaching these goals.
Algorithms and Libraries

NASA Technical Reports Server (NTRS)

Dongarra, Jack

1998-01-01

This exploratory study initiated our inquiry into algorithms and applications that would benefit by latency tolerant approach to algorithm building, including the construction of new algorithms where appropriate. In a multithreaded execution, when a processor reaches a point where remote memory access is necessary, the request is sent out on the network and a context--switch occurs to a new thread of computation. This effectively masks a long and unpredictable latency due to remote loads, thereby providing tolerance to remote access latency. We began to develop standards to profile various algorithm and application parameters, such as the degree of parallelism, granularity, precision, instruction set mix, interprocessor communication, latency etc. These tools will continue to develop and evolve as the Information Power Grid environment matures. To provide a richer context for this research, the project also focused on issues of fault-tolerance and computation migration of numerical algorithms and software. During the initial phase we tried to increase our understanding of the bottlenecks in single processor performance. Our work began by developing an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Based on the results we achieved in this study we are planning to study other architectures of interest, including development of cost models, and developing code generators appropriate to these architectures.
Burst-mode optical label processor with ultralow power consumption.

PubMed

Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo

2016-04-04

A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting and interfacing the ultrafast packet-label to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion for the label bits on a bit-by-bit basis by using an optoelectronic converter that is operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem; 1) based on a novel operation mechanism for providing amplification with bit-level selectivity, an optical trigger pulse generator, that consumes power for a very short duration upon packet arrival, is proposed and experimentally demonstrated, 2) the energy of optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal while employing an enhanced conversion scheme entitled the discharge-or-hold scheme, 3) the necessary optical trigger energy is further cut down by half by coupling the triggers through the chip's backside, whereas a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.
HPCC Methodologies for Structural Design and Analysis on Parallel and Distributed Computing Platforms

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1998-01-01

In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.

Overview of implementation of DARPA GPU program in SAIC

NASA Astrophysics Data System (ADS)

Braunreiter, Dennis; Furtek, Jeremy; Chen, Hai-Wen; Healy, Dennis

2008-04-01

This paper reviews the implementation of DARPA MTO STAP-BOY program for both Phase I and II conducted at Science Applications International Corporation (SAIC). The STAP-BOY program conducts fast covariance factorization and tuning techniques for space-time adaptive process (STAP) Algorithm Implementation on Graphics Processor unit (GPU) Architectures for Embedded Systems. The first part of our presentation on the DARPA STAP-BOY program will focus on GPU implementation and algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm will be implemented on the GPU, using stream programming (from companies such as PeakStream, ATI Technologies' CTM, and NVIDIA) and traditional graphics APIs. This algorithm will include fast range adaptive STAP weight updates and beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.
FAST: A multi-processed environment for visualization of computational fluid dynamics

NASA Technical Reports Server (NTRS)

Bancroft, Gordon V.; Merritt, Fergus J.; Plessel, Todd C.; Kelaita, Paul G.; Mccabe, R. Kevin

1991-01-01

Three-dimensional, unsteady, multi-zoned fluid dynamics simulations over full scale aircraft are typical of the problems being investigated at NASA Ames' Numerical Aerodynamic Simulation (NAS) facility on CRAY2 and CRAY-YMP supercomputers. With multiple processor workstations available in the 10-30 Mflop range, we feel that these new developments in scientific computing warrant a new approach to the design and implementation of analysis tools. These larger, more complex problems create a need for new visualization techniques not possible with the existing software or systems available as of this writing. The visualization techniques will change as the supercomputing environment, and hence the scientific methods employed, evolves even further. The Flow Analysis Software Toolkit (FAST), an implementation of a software system for fluid mechanics analysis, is discussed.
Speech recognition for embedded automatic positioner for laparoscope

NASA Astrophysics Data System (ADS)

Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

2014-07-01

In this paper a novel speech recognition methodology based on Hidden Markov Model (HMM) is proposed for embedded Automatic Positioner for Laparoscope (APL), which includes a fixed point ARM processor as the core. The APL system is designed to assist the doctor in laparoscopic surgery, by implementing the specific doctor's vocal control to the laparoscope. Real-time respond to the voice commands asks for more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, depending on arithmetic optimizations most, a fixed point frontend for speech feature analysis is built according to the ARM processor's character. Then the fast likelihood computation algorithm is used to reduce computational complexity of the HMM-based recognition algorithm. The experimental results show that, the method shortens the recognition time within 0.5s, while the accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control to the APL.
Use of FPGA embedded processors for fast cluster reconstruction in the NA62 liquid krypton electromagnetic calorimeter

NASA Astrophysics Data System (ADS)

Badoni, D.; Bizzarri, M.; Bonaiuto, V.; Checcucci, B.; De Simone, N.; Federici, L.; Fucci, A.; Paoluzzi, G.; Papi, A.; Piccini, M.; Salamon, A.; Salina, G.; Santovetti, E.; Sargeni, F.; Venditti, S.

2014-01-01

The goal of the NA62 experiment at the CERN SPS is the measurement of the Branching Ratio of the very rare kaon decay K+→π+ ν bar nu with a 10% accuracy by collecting 100 events in two years of data taking. An efficient photon veto system is needed to reject the K+→π+ π0 background and a liquid krypton electromagnetic calorimeter will be used for this purpose in the 1-10 mrad angular region. The L0 trigger system for the calorimeter consists of a peak reconstruction algorithm implemented on FPGA by using a mixed parallel architecture based on soft core Altera NIOS II embedded processors together with custom VHDL modules. This solution allows an efficient and flexible reconstruction of the energy-deposition peak. The system will be totally composed of 36 TEL62 boards, 108 mezzanine cards and 215 high-performance FPGAs. We describe the design, current status and the results of the first performance tests.
A fast, programmable hardware architecture for the processing of spaceborne SAR data

NASA Technical Reports Server (NTRS)

Bennett, J. R.; Cumming, I. G.; Lim, J.; Wedding, R. M.

1984-01-01

The development of high-throughput SAR processors (HTSPs) for the spaceborne SARs being planned by NASA, ESA, DFVLR, NASDA, and the Canadian Radarsat Project is discussed. The basic parameters and data-processing requirements of the SARs are listed in tables, and the principal problems are identified as real-operations rates in excess of 2 x 10 to the 9th/sec, I/O rates in excess of 8 x 10 to the 6th samples/sec, and control computation loads (as for range cell migration correction) as high as 1.4 x 10 to the 6th instructions/sec. A number of possible HTSP architectures are reviewed; host/array-processor (H/AP) and distributed-control/data-path (DCDP) architectures are examined in detail and illustrated with block diagrams; and a cost/speed comparison of these two architectures is presented. The H/AP approach is found to be adequate and economical for speeds below 1/200 of real time, while DCDP is more cost-effective above 1/50 of real time.
Method and apparatus for combinatorial logic signal processor in a digitally based high speed x-ray spectrometer

DOEpatents

Warburton, William K.; Zhou, Zhiquing

1999-01-01

A high speed, digitally based, signal processing system which accepts a digitized input signal and detects the presence of step-like pulses in the this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system livetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired.
Concurrent and Accurate Short Read Mapping on Multicore Processors.

PubMed

Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

2015-01-01

We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (HPG Aligner SA is an open-source application. The software is available at http://www.opencb.org, exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), as well as leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
Accelerating Demand Paging for Local and Remote Out-of-Core Visualization

NASA Technical Reports Server (NTRS)

Ellsworth, David

2001-01-01

This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
Media processors using a new microsystem architecture designed for the Internet era

NASA Astrophysics Data System (ADS)

Wyland, David C.

1999-12-01

The demands of digital image processing, communications and multimedia applications are growing more rapidly than traditional design methods can fulfill them. Previously, only custom hardware designs could provide the performance required to meet the demands of these applications. However, hardware design has reached a crisis point. Hardware design can no longer deliver a product with the required performance and cost in a reasonable time for a reasonable risk. Software based designs running on conventional processors can deliver working designs in a reasonable time and with low risk but cannot meet the performance requirements. What is needed is a media processing approach that combines very high performance, a simple programming model, complete programmability, short time to market and scalability. The Universal Micro System (UMS) is a solution to these problems. The UMS is a completely programmable (including I/O) system on a chip that combines hardware performance with the fast time to market, low cost and low risk of software designs.
Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

NASA Astrophysics Data System (ADS)

Laracuente, Nicholas; Grossman, Carl

2013-03-01

We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
Compact gasoline fuel processor for passenger vehicle APU

NASA Astrophysics Data System (ADS)

Severin, Christopher; Pischinger, Stefan; Ogrzewalla, Jürgen

Due to the increasing demand for electrical power in today's passenger vehicles, and with the requirements regarding fuel consumption and environmental sustainability tightening, a fuel cell-based auxiliary power unit (APU) becomes a promising alternative to the conventional generation of electrical energy via internal combustion engine, generator and battery. It is obvious that the on-board stored fuel has to be used for the fuel cell system, thus, gasoline or diesel has to be reformed on board. This makes the auxiliary power unit a complex integrated system of stack, air supply, fuel processor, electrics as well as heat and water management. Aside from proving the technical feasibility of such a system, the development has to address three major barriers:start-up time, costs, and size/weight of the systems. In this paper a packaging concept for an auxiliary power unit is presented. The main emphasis is placed on the fuel processor, as good packaging of this large subsystem has the strongest impact on overall size. The fuel processor system consists of an autothermal reformer in combination with water-gas shift and selective oxidation stages, based on adiabatic reactors with inter-cooling. The configuration was realized in a laboratory set-up and experimentally investigated. The results gained from this confirm a general suitability for mobile applications. A start-up time of 30 min was measured, while a potential reduction to 10 min seems feasible. An overall fuel processor efficiency of about 77% was measured. On the basis of the know-how gained by the experimental investigation of the laboratory set-up a packaging concept was developed. Using state-of-the-art catalyst and heat exchanger technology, the volumes of these components are fixed. However, the overall volume is higher mainly due to mixing zones and flow ducts, which do not contribute to the chemical or thermal function of the system. Thus, the concept developed mainly focuses on minimization of those component volumes. Therefore, the packaging utilizes rectangular catalyst bricks and integrates flow ducts into the heat exchangers. A concept is presented with a 25 l fuel processor volume including thermal isolation for a 3 kW el auxiliary power unit. The overall size of the system, i.e. including stack, air supply and auxiliaries can be estimated to 44 l.
Fast-Neutron Survey With Compact Plastic Scintillation Detectors.

PubMed

Preston, Rhys M; Tickner, James R

2017-07-01

With the rise of the Silicon Photomultiplier (SiPM), it is now practical to build compact scintillation detectors well suited to portable use. A prototype survey meter for fast-neutrons and gamma-rays, based around an EJ-299-34 plastic scintillator with SiPM readout, has been developed and tested. A custom digital pulse processor was used to perform pulse shape discrimination on-the-fly. Ambient dose equivalent H*(10) was calculated by means of two energy-dependent 'G-functions'. The sensitivity was calculated to be between 0.10 and 0.22 cps/(µSv/hr) for fast-neutrons with energies above 2.5 MeV. The prototype was used to survey various laboratory radiation fields, with the readings compared with commercial survey meters. The high sensitivity and lightweight nature of this detector makes it promising for rapid survey of the mixed neutron/gamma-ray fields encountered in industry and homeland security. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Web-based Toolkit for Dynamic Generation of Data Processors

NASA Astrophysics Data System (ADS)

Patel, J.; Dascalu, S.; Harris, F. C.; Benedict, K. K.; Gollberg, G.; Sheneman, L.

2011-12-01

All computation-intensive scientific research uses structured datasets, including hydrology and all other types of climate-related research. When it comes to testing their hypotheses, researchers might use the same dataset differently, and modify, transform, or convert it to meet their research needs. Currently, many researchers spend a good amount of time performing data processing and building tools to speed up this process. They might routinely repeat the same process activities for new research projects, spending precious time that otherwise could be dedicated to analyzing and interpreting the data. Numerous tools are available to run tests on prepared datasets and many of them work with datasets in different formats. However, there is still a significant need for applications that can comprehensively handle data transformation and conversion activities and help prepare the various processed datasets required by the researchers. We propose a web-based application (a software toolkit) that dynamically generates data processors capable of performing data conversions, transformations, and customizations based on user-defined mappings and selections. As a first step, the proposed solution allows the users to define various data structures and, in the next step, can select various file formats and data conversions for their datasets of interest. In a simple scenario, the core of the proposed web-based toolkit allows the users to define direct mappings between input and output data structures. The toolkit will also support defining complex mappings involving the use of pre-defined sets of mathematical, statistical, date/time, and text manipulation functions. Furthermore, the users will be allowed to define logical cases for input data filtering and sampling. At the end of the process, the toolkit is designed to generate reusable source code and executable binary files for download and use by the scientists. The application is also designed to store all data structures and mappings defined by a user (an author), and allow the original author to modify them using standard authoring techniques. The users can change or define new mappings to create new data processors for download and use. In essence, when executed, the generated data processor binary file can take an input data file in a given format and output this data, possibly transformed, in a different file format. If they so desire, the users will be able modify directly the source code in order to define more complex mappings and transformations that are not currently supported by the toolkit. Initially aimed at supporting research in hydrology, the toolkit's functions and features can be either directly used or easily extended to other areas of climate-related research. The proposed web-based data processing toolkit will be able to generate various custom software processors for data conversion and transformation in a matter of seconds or minutes, saving a significant amount of researchers' time and allowing them to focus on core research issues.
Development and design of experiments optimization of a high temperature proton exchange membrane fuel cell auxiliary power unit with onboard fuel processor

NASA Astrophysics Data System (ADS)

Karstedt, Jörg; Ogrzewalla, Jürgen; Severin, Christopher; Pischinger, Stefan

In this work, the concept development, system layout, component simulation and the overall DOE system optimization of a HT-PEM fuel cell APU with a net electric power output of 4.5 kW and an onboard methane fuel processor are presented. A highly integrated system layout has been developed that enables fast startup within 7.5 min, a closed system water balance and high fuel processor efficiencies of up to 85% due to the recuperation of the anode offgas burner heat. The integration of the system battery into the load management enhances the transient electric performance and the maximum electric power output of the APU system. Simulation models of the carbon monoxide influence on HT-PEM cell voltage, the concentration and temperature profiles within the autothermal reformer (ATR) and the CO conversion rates within the watergas shift stages (WGSs) have been developed. They enable the optimization of the CO concentration in the anode gas of the fuel cell in order to achieve maximum system efficiencies and an optimized dimensioning of the ATR and WGS reactors. Furthermore a DOE optimization of the global system parameters cathode stoichiometry, anode stoichiometry, air/fuel ratio and steam/carbon ratio of the fuel processing system has been performed in order to achieve maximum system efficiencies for all system operating points under given boundary conditions.
Load Balancing Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pearce, Olga Tkachyshyn

2014-12-01

The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one atmore » the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.« less
Real-time portable system for fabric defect detection using an ARM processor

NASA Astrophysics Data System (ADS)

Fernandez-Gallego, J. A.; Yañez-Puentes, J. P.; Ortiz-Jaramillo, B.; Alvarez, J.; Orjuela-Vargas, S. A.; Philips, W.

2012-06-01

Modern textile industry seeks to produce textiles as little defective as possible since the presence of defects can decrease the final price of products from 45% to 65%. Automated visual inspection (AVI) systems, based on image analysis, have become an important alternative for replacing traditional inspections methods that involve human tasks. An AVI system gives the advantage of repeatability when implemented within defined constrains, offering more objective and reliable results for particular tasks than human inspection. Costs of automated inspection systems development can be reduced using modular solutions with embedded systems, in which an important advantage is the low energy consumption. Among the possibilities for developing embedded systems, the ARM processor has been explored for acquisition, monitoring and simple signal processing tasks. In a recent approach we have explored the use of the ARM processor for defects detection by implementing the wavelet transform. However, the computation speed of the preprocessing was not yet sufficient for real time applications. In this approach we significantly improve the preprocessing speed of the algorithm, by optimizing matrix operations, such that it is adequate for a real time application. The system was tested for defect detection using different defect types. The paper is focused in giving a detailed description of the basis of the algorithm implementation, such that other algorithms may use of the ARM operations for fast implementations.
Geospace simulations using modern accelerator processor technology

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D. J.

2009-12-01

OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
Self-sustained operation of a kW e-class kerosene-reforming processor for solid oxide fuel cells

NASA Astrophysics Data System (ADS)

Yoon, Sangho; Bae, Joongmyeon; Kim, Sunyoung; Yoo, Young-Sung

In this paper, fuel-processing technologies are developed for application in residential power generation (RPG) in solid oxide fuel cells (SOFCs). Kerosene is selected as the fuel because of its high hydrogen density and because of the established infrastructure that already exists in South Korea. A kerosene fuel processor with two different reaction stages, autothermal reforming (ATR) and adsorptive desulfurization reactions, is developed for SOFC operations. ATR is suited to the reforming of liquid hydrocarbon fuels because oxygen-aided reactions can break the aromatics in the fuel and steam can suppress carbon deposition during the reforming reaction. ATR can also be implemented as a self-sustaining reactor due to the exothermicity of the reaction. The kW e self-sustained kerosene fuel processor, including the desulfurizer, operates for about 250 h in this study. This fuel processor does not require a heat exchanger between the ATR reactor and the desulfurizer or electric equipment for heat supply and fuel or water vaporization because a suitable temperature of the ATR reformate is reached for H 2S adsorption on the ZnO catalyst beds in desulfurizer. Although the CH 4 concentration in the reformate gas of the fuel processor is higher due to the lower temperature of ATR tail gas, SOFCs can directly use CH 4 as a fuel with the addition of sufficient steam feeds (H 2O/CH 4 ≥ 1.5), in contrast to low-temperature fuel cells. The reforming efficiency of the fuel processor is about 60%, and the desulfurizer removed H 2S to a sufficient level to allow for the operation of SOFCs.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

2012-10-01

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decompositionmore » corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.« less
Analog Ranging Modem Code Processor and Generator

DOT National Transportation Integrated Search

1974-05-01

The report details technical development efforts to implement an analog ranging modem using recently developed linear integrated circuits where possible. The breadboard hardware is capable of acquiring frequency and phase of a weak signal in a high n...

Real-time generation of the Wigner distribution of complex functions using phase conjugation in photorefractive materials.

PubMed

Sun, P C; Fainman, Y

1990-09-01

An optical processor for real-time generation of the Wigner distribution of complex amplitude functions is introduced. The phase conjugation of the input signal is accomplished by a highly efficient self-pumped phase conjugator based on a 45 degrees -cut barium titanate photorefractive crystal. Experimental results on the real-time generation of Wigner distribution slices for complex amplitude two-dimensional optical functions are presented and discussed.
Computer systems and methods for the query and visualization multidimensional databases

DOEpatents

Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

2017-04-25

A method of generating a data visualization is performed at a computer having a display, one or more processors, and memory. The memory stores one or more programs for execution by the one or more processors. The process receives user specification of a plurality of characteristics of a data visualization. The data visualization is based on data from a multidimensional database. The characteristics specify at least x-position and y-position of data marks corresponding to tuples of data retrieved from the database. The process generates a data visualization according to the specified plurality of characteristics. The data visualization has an x-axis defined based on data for one or more first fields from the database that specify x-position of the data marks and the data visualization has a y-axis defined based on data for one or more second fields from the database that specify y-position of the data marks.
Multi Modality Brain Mapping System (MBMS) Using Artificial Intelligence and Pattern Recognition

NASA Technical Reports Server (NTRS)

Nikzad, Shouleh (Inventor); Kateb, Babak (Inventor)

2017-01-01

A Multimodality Brain Mapping System (MBMS), comprising one or more scopes (e.g., microscopes or endoscopes) coupled to one or more processors, wherein the one or more processors obtain training data from one or more first images and/or first data, wherein one or more abnormal regions and one or more normal regions are identified; receive a second image captured by one or more of the scopes at a later time than the one or more first images and/or first data and/or captured using a different imaging technique; and generate, using machine learning trained using the training data, one or more viewable indicators identifying one or abnormalities in the second image, wherein the one or more viewable indicators are generated in real time as the second image is formed. One or more of the scopes display the one or more viewable indicators on the second image.
Beam Stability R&D for the APS MBA Upgrade

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sereno, Nicholas S.; Arnold, Ned D.; Bui, Hanh D.

2015-01-01

Beam diagnostics required for the APS Multi-bend acromat (MBA) are driven by ambitious beam stability requirements. The major AC stability challenge is to correct rms beam motion to 10% the rms beam size at the insertion device source points from0.01 to 1000 Hz. The vertical plane represents the biggest challenge forAC stability, which is required to be 400 nm rms for a 4-micron vertical beam size. In addition to AC stability, long-term drift over a period of seven days is required to be 1 micron or less. Major diagnostics R&D components include improved rf beam position processing using commercially availablemore » FPGA-based BPM processors, new X-ray beam position monitors based on hard X-ray fluorescence from copper and Compton scattering off diamond, mechanical motion sensing to detect and correct long-term vacuum chamber drift, a new feedback system featuring a tenfold increase in sampling rate, and a several-fold increase in the number of fast correctors and BPMs in the feedback algorithm. Feedback system development represents a major effort, and we are pursuing development of a novel algorithm that integrates orbit correction for both slow and fast correctors down to DC simultaneously. Finally, a new data acquisition system (DAQ) is being developed to simultaneously acquire streaming data from all diagnostics as well as the feedback processors for commissioning and fault diagnosis. Results of studies and the design effort are reported.« less
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daily, Jeffrey A.

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates permore » second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.« less
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

DOE PAGES

Daily, Jeffrey A.

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates permore » second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.« less
Mission Critical Computer Resources Management Guide

DTIC Science & Technology

1988-09-01

Support Analyzers, Management, Generators Environments Word Workbench Processors Showroom System Structure HO Compilers IMath 1OperatingI Functions I...Simulated Automated, On-Line Generators Support Exercises Catalog, Function Environments Formal Spec Libraries Showroom System Structure I ADA Trackers I...shown in Figure 13-2. In this model, showrooms of larger more capable piecesare developed off-line for later integration and use in multiple systems
Science on Sequoia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bertsch, Adam; Draeger, Erik; Richards, David

2017-01-12

With Sequoia at Lawrence Livermore National Laboratory, researchers explore grand challenging problems and are generating results at scales never before achieved. Sequoia is the first computer to have more than one million processors and is one of the fastest supercomputers in the world.
Probe for optically monitoring progress of in-situ vitrification of soil

DOEpatents

Timmerman, Craig L.; Oma, Kenton H.; Davis, Karl C.

1988-01-01

A detector system for sensing the progress of an ISV process along an expected path comprises multiple sensors each having an input port. The input ports are distributed along the expected path of the ISV process between a starting location and an expected ending location. Each sensor generates an electrical signal representative of the temperature in the vicinity of its input port. A signal processor is coupled to the sensors to receive an electrical signal generated by a sensor, and generate a signal which is encoded with information which identifies the sensor and whether the ISV process has reached the sensor's input port. A transmitter propagates the encoded signal. The signal processor and the transmitter are below ground at a location beyond the expected ending location of the ISV process in the direction from the starting location to the expected ending location. A signal receiver and a decoder are located above ground for receiving the encoded signal propagated by the transmitter, decoding the encoded signal and providing a human-perceptible indication of the progress of the ISV process.
Probe for optically monitoring progress of in-situ vitrification of soil

DOEpatents

Timmerman, C.L.; Oma, K.H.; Davis, K.C.

1988-08-09

A detector system for sensing the progress of an ISV process along an expected path comprises multiple sensors each having an input port. The input ports are distributed along the expected path of the ISV process between a starting location and an expected ending location. Each sensor generates an electrical signal representative of the temperature in the vicinity of its input port. A signal processor is coupled to the sensors to receive an electrical signal generated by a sensor, and generate a signal which is encoded with information which identifies the sensor and whether the ISV process has reached the sensor's input port. A transmitter propagates the encoded signal. The signal processor and the transmitter are below ground at a location beyond the expected ending location of the ISV process in the direction from the starting location to the expected ending location. A signal receiver and a decoder are located above ground for receiving the encoded signal propagated by the transmitter, decoding the encoded signal and providing a human-perceptible indication of the progress of the ISV process. 7 figs.
Efficient Parallel Formulations of Hierarchical Methods and Their Applications

NASA Astrophysics Data System (ADS)

Grama, Ananth Y.

1996-01-01

Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance is expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering the fact that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning. We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Wilson, Laura Labuda; Orozco, Nicole

2012-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment, and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of May 2011, and describes the technical challenges encountered and lessons learned over the past year.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Pruitt, Jennifer; Brown, Christopher A.; Bazley, Jesse; Gazda, Daniel; Schaezler, Ryan; Bankers, Lyndsey

2016-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of May 2016 and describes the technical challenges encountered and lessons learned over the past year.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Brown, Christopher; Orozco, Nicole

2014-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment, and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of June 2013, and describes the technical challenges encountered and lessons learned over the past year.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Tobias, Barry; Orozco, Nicole

2012-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment, and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of June 2012, and describes the technical challenges encountered and lessons learned over the past year.
Special-purpose computing for dense stellar systems

NASA Astrophysics Data System (ADS)

Makino, Junichiro

2007-08-01

I'll describe the current status of the GRAPE-DR project. The GRAPE-DR is the next-generation hardware for N-body simulation. Unlike the previous GRAPE hardwares, it is programmable SIMD machine with a large number of simple processors integrated into a single chip. The GRAPE-DR chip consists of 512 simple processors and operates at the clock speed of 500 MHz, delivering the theoretical peak speed of 512/226 Gflops (single/double precision). As of August 2006, the first prototype board with the sample chip successfully passed the test we prepared. The full GRAPE-DR system will consist of 4096 chips, reaching the theoretical peak speed of 2 Pflops.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Takada, Kevin; Gazda, Daniel; Brown, Christopher; Bazley, Jesse; Schaezler, Ryan; Bankers, Lyndsey

2017-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of June 2017 and describes the technical challenges encountered and lessons learned over the past year.
Status of ISS Water Management and Recovery

NASA Technical Reports Server (NTRS)

Carter, Layne; Pruitt, Jennifer; Brown, Christopher A.; Schaezler, Ryan; Bankers, Lyndsey

2015-01-01

Water management on ISS is responsible for the provision of water to the crew for drinking water, food preparation, and hygiene, to the Oxygen Generation System (OGS) for oxygen production via electrolysis, to the Waste & Hygiene Compartment (WHC) for flush water, and for experiments on ISS. This paper summarizes water management activities on the ISS US Segment, and provides a status of the performance and issues related to the operation of the Water Processor Assembly (WPA) and Urine Processor Assembly (UPA). This paper summarizes the on-orbit status as of May 2015 and describes the technical challenges encountered and lessons learned over the past two years.
Real time SAR processing

NASA Technical Reports Server (NTRS)

Premkumar, A. B.; Purviance, J. E.

1990-01-01

A simplified model for the SAR imaging problem is presented. The model is based on the geometry of the SAR system. Using this model an expression for the entire phase history of the received SAR signal is formulated. From the phase history, it is shown that the range and the azimuth coordinates for a point target image can be obtained by processing the phase information during the intrapulse and interpulse periods respectively. An architecture for a VLSI implementation for the SAR signal processor is presented which generates images in real time. The architecture uses a small number of chips, a new correlation processor, and an efficient azimuth correlation process.
35-We polymer electrolyte membrane fuel cell system for notebook computer using a compact fuel processor

NASA Astrophysics Data System (ADS)

Son, In-Hyuk; Shin, Woo-Cheol; Lee, Yong-Kul; Lee, Sung-Chul; Ahn, Jin-Gu; Han, Sang-Il; kweon, Ho-Jin; Kim, Ju-Yong; Kim, Moon-Chan; Park, Jun-Yong

A polymer electrolyte membrane fuel cell (PEMFC) system is developed to power a notebook computer. The system consists of a compact methanol-reforming system with a CO preferential oxidation unit, a 16-cell PEMFC stack, and a control unit for the management of the system with a d.c.-d.c. converter. The compact fuel-processor system (260 cm 3) generates about 1.2 L min -1 of reformate, which corresponds to 35 We, with a low CO concentration (<30 ppm, typically 0 ppm), and is thus proven to be capable of being targetted at notebook computers.

A methodology for achieving high-speed rates for artificial conductance injection in electrically excitable biological cells.

PubMed

Butera, R J; Wilson, C G; Delnegro, C A; Smith, J C

2001-12-01

We present a novel approach to implementing the dynamic-clamp protocol (Sharp et al., 1993), commonly used in neurophysiology and cardiac electrophysiology experiments. Our approach is based on real-time extensions to the Linux operating system. Conventional PC-based approaches have typically utilized single-cycle computational rates of 10 kHz or slower. In thispaper, we demonstrate reliable cycle-to-cycle rates as fast as 50 kHz. Our system, which we call model reference current injection (MRCI); pronounced merci is also capable of episodic logging of internal state variables and interactive manipulation of model parameters. The limiting factor in achieving high speeds was not processor speed or model complexity, but cycle jitter inherent in the CPU/motherboard performance. We demonstrate these high speeds and flexibility with two examples: 1) adding action-potential ionic currents to a mammalian neuron under whole-cell patch-clamp and 2) altering a cell's intrinsic dynamics via MRCI while simultaneously coupling it via artificial synapses to an internal computational model cell. These higher rates greatly extend the applicability of this technique to the study of fast electrophysiological currents such fast a currents and fast excitatory/inhibitory synapses.
LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators

NASA Astrophysics Data System (ADS)

Gonzalez, Juan; Núñez, Rafael C.

2009-07-01

We present LAPACKrc, a family of FPGA-based linear algebra solvers able to achieve more than 100x speedup per commodity processor on certain problems. LAPACKrc subsumes some of the LAPACK and ScaLAPACK functionalities, and it also incorporates sparse direct and iterative matrix solvers. Current LAPACKrc prototypes demonstrate between 40x-150x speedup compared against top-of-the-line hardware/software systems. A technology roadmap is in place to validate current performance of LAPACKrc in HPC applications, and to increase the computational throughput by factors of hundreds within the next few years.
Fast 4-2 Compressor of Booth Multiplier Circuits for High-Speed RISC Processor

NASA Astrophysics Data System (ADS)

Yuan, S. C.

2008-11-01

We use different XOR circuits to optimize the XOR structure 4-2 compressor, and design the transmission gates(TG) 4-2 compressor use single to dual rail circuit configurations. The maximum propagation delay, the power consumption and the layout area of the designed 4-2 compressors are simulated with 0.35μm and 0.25μm CMOS process parameters and compared with results of the synthesized 4-2 circuits, and show that the designed 4-2 compressors are faster and area smaller than the synthesized one.
On-board multicarrier demodulator for mobile applications using DSP implementation

NASA Astrophysics Data System (ADS)

Yim, W. H.; Kwan, C. C. D.; Coakley, F. P.; Evans, B. G.

1990-11-01

This paper describes the design and implementation of an on-board multicarrier demodulator using commercial digital signal processors. This is for use in a mobile satellite communication system employing an up-link SCPC/FDMA scheme. Channels are separated by a flexible multistage digital filter bank followed by a channel multiplexed digital demodulator array. The cross/dot product design approach of error detector leads to a new QPSK frequency control algorithm that allows fast acquisition without special preamble pattern. Timing correction is performed digitally using an extended stack of polyphase sub-filters.
Design and implementation of a medium speed communications interface and protocol for a low cost, refreshed display computer

NASA Technical Reports Server (NTRS)

Phyne, J. R.; Nelson, M. D.

1975-01-01

The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
Conceptual study of on orbit production of cryogenic propellants by water electrolysis

NASA Technical Reports Server (NTRS)

Moran, Matthew E.

1991-01-01

The feasibility is assessed of producing cryogenic propellants on orbit by water electrolysis in support of NASA's proposed Space Exploration Initiative (SEI) missions. Using this method, water launched into low earth orbit (LEO) would be split into gaseous hydrogen and oxygen by electrolysis in an orbiting propellant processor spacecraft. The resulting gases would then be liquified and stored in cryogenic tanks. Supplying liquid hydrogen and oxygen fuel to space vehicles by this technique has some possible advantages over conventional methods. The potential benefits are derived from the characteristics of water as a payload, and include reduced ground handling and launch risk, denser packaging, and reduced tankage and piping requirements. A conceptual design of a water processor was generated based on related previous studies, and contemporary or near term technologies required. Extensive development efforts would be required to adapt the various subsystems needed for the propellant processor for use in space. Based on the cumulative results, propellant production by on orbit water electrolysis for support of SEI missions is not recommended.
WMAP C&DH Software

NASA Technical Reports Server (NTRS)

Cudmore, Alan; Leath, Tim; Ferrer, Art; Miller, Todd; Walters, Mark; Savadkin, Bruce; Wu, Ji-Wei; Slegel, Steve; Stagmer, Emory

2007-01-01

The command-and-data-handling (C&DH) software of the Wilkinson Microwave Anisotropy Probe (WMAP) spacecraft functions as the sole interface between (1) the spacecraft and its instrument subsystem and (2) ground operations equipment. This software includes a command-decoding and -distribution system, a telemetry/data-handling system, and a data-storage-and-playback system. This software performs onboard processing of attitude sensor data and generates commands for attitude-control actuators in a closed-loop fashion. It also processes stored commands and monitors health and safety functions for the spacecraft and its instrument subsystems. The basic functionality of this software is the same of that of the older C&DH software of the Rossi X-Ray Timing Explorer (RXTE) spacecraft, the main difference being the addition of the attitude-control functionality. Previously, the C&DH and attitude-control computations were performed by different processors because a single RXTE processor did not have enough processing power. The WMAP spacecraft includes a more-powerful processor capable of performing both computations.
Coupling of a 2.5 kW steam reformer with a 1 kW el PEM fuel cell

NASA Astrophysics Data System (ADS)

Mathiak, J.; Heinzel, A.; Roes, J.; Kalk, Th.; Kraus, H.; Brandt, H.

The University of Duisburg-Essen has developed a compact multi-fuel steam reformer suitable for natural gas, propane and butane. This steam reformer was combined with a polymer electrolyte membrane fuel cell (PEM FC) and a system test of the process chain was performed. The fuel processor comprises a prereformer step, a primary reformer, water gas shift reactors, a steam generator, internal heat exchangers in order to achieve an optimised heat integration and an external burner for heat supply as well as a preferential oxidation step (PROX) as CO purification. The fuel processor is designed to deliver a thermal hydrogen power output from 500 W to 2.5 kW. The PEM fuel cell stack provides about 1 kW electrical power. In the following paper experimental results of measurements of the single components PEM fuel cell and fuel processor as well as results of the coupling of both to form a process chain are presented.
Parallel ICA and its hardware implementation in hyperspectral image analysis

NASA Astrophysics Data System (ADS)

Du, Hongtao; Qi, Hairong; Peterson, Gregory D.

2004-04-01

Advances in hyperspectral images have dramatically boosted remote sensing applications by providing abundant information using hundreds of contiguous spectral bands. However, the high volume of information also results in excessive computation burden. Since most materials have specific characteristics only at certain bands, a lot of these information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), where ICA is one of the most popular techniques. It searches for a linear or nonlinear transformation which minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous but retain practical information given only the observations of hyperspectral images. One hurdle of applying ICA in hyperspectral image (HSI) analysis, however, is its long computation time, especially for high volume hyperspectral data sets. Even the most efficient method, FastICA, is a very time-consuming process. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into the internal decorrelation and the external decorrelation, which perform weight vector decorrelations within individual processors and between cooperative processors, respectively. In order to further improve the performance of pICA, we seek hardware solutions in the implementation of pICA. Until now, there are very few hardware designs for ICA-related processes due to the complicated and iterant computation. This paper discusses capacity limitation of FPGA implementations for pICA in HSI analysis. A synthesis of Application-Specific Integrated Circuit (ASIC) is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and aimed at TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for the reuse and retargeting purpose. Preliminary results show that the standard-height cell based ASIC synthesis provide an effective solution for pICA and ICA-related processes in HSI analysis.
Optimal Padding for the Two-Dimensional Fast Fourier Transform

NASA Technical Reports Server (NTRS)

Dean, Bruce H.; Aronstein, David L.; Smith, Jeffrey S.

2011-01-01

One-dimensional Fast Fourier Transform (FFT) operations work fastest on grids whose size is divisible by a power of two. Because of this, padding grids (that are not already sized to a power of two) so that their size is the next highest power of two can speed up operations. While this works well for one-dimensional grids, it does not work well for two-dimensional grids. For a two-dimensional grid, there are certain pad sizes that work better than others. Therefore, the need exists to generalize a strategy for determining optimal pad sizes. There are three steps in the FFT algorithm. The first is to perform a one-dimensional transform on each row in the grid. The second step is to transpose the resulting matrix. The third step is to perform a one-dimensional transform on each row in the resulting grid. Steps one and three both benefit from padding the row to the next highest power of two, but the second step needs a novel approach. An algorithm was developed that struck a balance between optimizing the grid pad size with prime factors that are small (which are optimal for one-dimensional operations), and with prime factors that are large (which are optimal for two-dimensional operations). This algorithm optimizes based on average run times, and is not fine-tuned for any specific application. It increases the amount of times that processor-requested data is found in the set-associative processor cache. Cache retrievals are 4-10 times faster than conventional memory retrievals. The tested implementation of the algorithm resulted in faster execution times on all platforms tested, but with varying sized grids. This is because various computer architectures process commands differently. The test grid was 512 512. Using a 540 540 grid on a Pentium V processor, the code ran 30 percent faster. On a PowerPC, a 256x256 grid worked best. A Core2Duo computer preferred either a 1040x1040 (15 percent faster) or a 1008x1008 (30 percent faster) grid. There are many industries that can benefit from this algorithm, including optics, image-processing, signal-processing, and engineering applications.
Maritime dynamic traffic generator : Volume II. Electronic data processing program.

DOT National Transportation Integrated Search

1975-06-01

The processor program is designed to move 18,000 merchant vessels along standard routes to their destination and keep statistical records of the ports visited, the five degree squares passed through and the occurrence of casualties. This document pre...
76 FR 54809 - Agency Information Collection Activities: Submission for the Office of Management and Budget (OMB...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-09-02

... disposal of low-level radioactive waste; and all generators, collectors, and processors of low-level waste...), NEOB-10202, Office of Management and Budget, Washington, DC 20503. Comments can also be e-mailed to...
Hybrid respiration-signal conditioner

NASA Technical Reports Server (NTRS)

Rinard, G. A.; Steffen, D. A.; Sturm, R. E.

1979-01-01

Hybrid impedance-pneumograph and respiration-rate signal conditioner element of hand-held vital signs monitor measures changes in impedance of chest during breathing cycle and generates analog respiration signal as output along with synchronous square wave that can be monitored by breath-rate processor.
Nonpreemptive run-time scheduling issues on a multitasked, multiprogrammed multiprocessor with dependencies, bidimensional tasks, folding and dynamic graphs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, Allan Ray

1987-05-01

Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus, a variety of heuristics aremore » examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.« less
Combining high performance simulation, data acquisition, and graphics display computers

NASA Technical Reports Server (NTRS)

Hickman, Robert J.

1989-01-01

Issues involved in the continuing development of an advanced simulation complex are discussed. This approach provides the capability to perform the majority of tests on advanced systems, non-destructively. The controlled test environments can be replicated to examine the response of the systems under test to alternative treatments of the system control design, or test the function and qualification of specific hardware. Field tests verify that the elements simulated in the laboratories are sufficient. The digital computer is hosted by a Digital Equipment Corp. MicroVAX computer with an Aptec Computer Systems Model 24 I/O computer performing the communication function. An Applied Dynamics International AD100 performs the high speed simulation computing and an Evans and Sutherland PS350 performs on-line graphics display. A Scientific Computer Systems SCS40 acts as a high performance FORTRAN program processor to support the complex, by generating numerous large files from programs coded in FORTRAN that are required for the real time processing. Four programming languages are involved in the process, FORTRAN, ADSIM, ADRIO, and STAPLE. FORTRAN is employed on the MicroVAX host to initialize and terminate the simulation runs on the system. The generation of the data files on the SCS40 also is performed with FORTRAN programs. ADSIM and ADIRO are used to program the processing elements of the AD100 and its IOCP processor. STAPLE is used to program the Aptec DIP and DIA processors.
The Advanced Communication Technology Satellite and ISDN

NASA Technical Reports Server (NTRS)

Lowry, Peter A.

1996-01-01

This paper depicts the Advanced Communication Technology Satellite (ACTS) system as a global central office switch. The ground portion of the system is the collection of earth stations or T1-VSAT's (T1 very small aperture terminals). The control software for the T1-VSAT's resides in a single CPU. The software consists of two modules, the modem manager and the call manager. The modem manager (MM) controls the RF modem portion of the T1-VSAT. It processes the orderwires from the satellite or from signaling generated by the call manager (CM). The CM controls the Recom Laboratories MSPs by receiving signaling messages from the stacked MSP shelves ro units and sending appropriate setup commands to them. There are two methods used to setup and process calls in the CM; first by dialing up a circuit using a standard telephone handset or, secondly by using an external processor connected to the CPU's second COM port, by sending and receiving signaling orderwires. It is the use of the external processor which permits the ISDN (Integrated Services Digital Network) Signaling Processor to implement ISDN calls. In August 1993, the initial testing of the ISDN Signaling Processor was carried out at ACTS System Test at Lockheed Marietta, Princeton, NJ using the spacecraft in its test configuration on the ground.
Diesel fuel to dc power: Navy & Marine Corps Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bloomfield, D.P.

1996-12-31

During the past year Analytic Power has tested fuel cell stacks and diesel fuel processors for US Navy and Marine Corps applications. The units are 10 kW demonstration power plants. The USN power plant was built to demonstrate the feasibility of diesel fueled PEM fuel cell power plants for 250 kW and 2.5 MW shipboard power systems. We designed and tested a ten cell, 1 kW USMC substack and fuel processor. The complete 10 kW prototype power plant, which has application to both power and hydrogen generation, is now under construction. The USN and USMC fuel cell stacks have beenmore » tested on both actual and simulated reformate. Analytic Power has accumulated operating experience with autothermal reforming based fuel processors operating on sulfur bearing diesel fuel, jet fuel, propane and natural gas. We have also completed the design and fabrication of an advanced regenerative ATR for the USMC. One of the significant problems with small fuel processors is heat loss which limits its ability to operate with the high steam to carbon ratios required for coke free high efficiency operation. The new USMC unit specifically addresses these heat transfer issues. The advances in the mill programs have been incorporated into Analytic Power`s commercial units which are now under test.« less
Optimizing the inner loop of the gravitational force interaction on modern processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Warren, Michael S

2010-12-08

We have achieved superior performance on multiple generations of the fastest supercomputers in the world with our hashed oct-tree N-body code (HOT), spanning almost two decades and garnering multiple Gordon Bell Prizes for significant achievement in parallel processing. Execution time for our N-body code is largely influenced by the force calculation in the inner loop. Improvements to the inner loop using SSE3 instructions has enabled the calculation of over 200 million gravitational interactions per second per processor on a 2.6 GHz Opteron, for a computational rate of over 7 Gflops in single precision (700/0 of peak). We obtain optimal performancemore » some processors (including the Cell) by decomposing the reciprocal square root function required for a gravitational interaction into a table lookup, Chebychev polynomial interpolation, and Newton-Raphson iteration, using the algorithm of Karp. By unrolling the loop by a factor of six, and using SPU intrinsics to compute on vectors, we obtain performance of over 16 Gflops on a single Cell SPE. Aggregated over the 8 SPEs on a Cell processor, the overall performance is roughly 130 Gflops. In comparison, the ordinary C version of our inner loop only obtains 1.6 Gflops per SPE with the spuxlc compiler.« less
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

PubMed

Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-06-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆

PubMed Central

Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-01-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680

Distributed processor allocation for launching applications in a massively connected processors complex

DOEpatents

Pedretti, Kevin

2008-11-18

A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
Fast 2D FWI on a multi and many-cores workstation.

NASA Astrophysics Data System (ADS)

Thierry, Philippe; Donno, Daniela; Noble, Mark

2014-05-01

Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12 cores E5-v2 x86-64 CPU, we present here a MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which simultaneously runs on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is to be able to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each of them having up to 4 threads, this many-core can be seen as a shared memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can handle several co-processors making the workstation as a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is just recompiled to get a Xeon Phi x86 binary. This multi and many-core configuration uses standard compilers and associated MPI as well as math libraries under Linux; therefore, the cost of code development remains constant, while improving computation time. We choose to implement the code with the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e running only on the co-processor) thanks to the Linux ssh and NFS capabilities. Usual care of optimization and SIMD vectorization is used to ensure optimal performances, and to analyze the application performances and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (with L-BFGS algorithm) optimization scheme for the model parameters update. Parallelization is achieved through standard MPI shot gathers distribution and OpenMP for domain decomposition within the co-processor. Taking advantage of the 16 GB of memory available on the co-processor we are able to keep wavefields in memory to achieve the gradient computation by cross-correlation of forward and back-propagated wavefields needed by our time-domain FWI scheme, without heavy traffic on the i/o subsystem and PCIe bus. In this presentation we will also review some simple methodologies to determine performance expectation compared to real performances in order to get optimization effort estimation before starting any huge modification or rewriting of research codes. The key message is the ease of use and development of this hybrid configuration to reach not the absolute peak performance value but the optimal one that ensures the best balance between geophysical and computer developments.
Modular System Control Development Model (MSCDM). Design Specification.

DTIC Science & Technology

1979-08-01

with power supply and ¶ can be used independently of the loop. The PDU can be used as a general purpose processor. The loop is contained in a separate...inputs to nodes 22 (VSQC), 23 (DSQC ) , and 26 (BWBSA) will be generated by a LSI—ll microprocessor used as a simulated input generator (SIG). The SIG...who c o b m n u n i — cate tau lt - s to the FIAC module. F~IAC generates even t reports to the OCRI and DBMS. The PDP1I/40 in loop 2 generates
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2017-08-15

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2014-09-09

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
NASTRAN pre and postprocessors using low-cost interactive graphics

NASA Technical Reports Server (NTRS)

Herness, E. D.; Kriloff, H. Z.

1975-01-01

A design for a NASTRAN preprocessor is given to illustrate a typical preprocessor. Several displays of NASTRAN models illustrate the preprocessor's capabilities. A design of a NASTRAN postprocessor is presented along with an example of displays generated by that NASTRAN processor.
Spaceborne VHSIC multiprocessor system for AI applications

NASA Technical Reports Server (NTRS)

Lum, Henry, Jr.; Shrobe, Howard E.; Aspinall, John G.

1988-01-01

A multiprocessor system, under design for space-station applications, makes use of the latest generation symbolic processor and packaging technology. The result will be a compact, space-qualified system two to three orders of magnitude more powerful than present-day symbolic processing systems.
MAP3D: a media processor approach for high-end 3D graphics

NASA Astrophysics Data System (ADS)

Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

1999-12-01

Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor

NASA Astrophysics Data System (ADS)

Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc

2007-03-01

Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
smallWig: parallel compression of RNA-seq WIG files.

PubMed

Wang, Zhiying; Weissman, Tsachy; Milenkovic, Olgica

2016-01-15

We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates. We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3-17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2-3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2-5.2 times. The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html, http://publish.illinois.edu/milenkovic/, http://web.stanford.edu/~tsachy/. zhiyingw@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Spaceborne synthetic aperture radar signal processing using FPGAs

NASA Astrophysics Data System (ADS)

Sugimoto, Yohei; Ozawa, Satoru; Inaba, Noriyasu

2017-10-01

Synthetic Aperture Radar (SAR) imagery requires image reproduction through successive signal processing of received data before browsing images and extracting information. The received signal data records of the ALOS-2/PALSAR-2 are stored in the onboard mission data storage and transmitted to the ground. In order to compensate the storage usage and the capacity of transmission data through the mission date communication networks, the operation duty of the PALSAR-2 is limited. This balance strongly relies on the network availability. The observation operations of the present spaceborne SAR systems are rigorously planned by simulating the mission data balance, given conflicting user demands. This problem should be solved such that we do not have to compromise the operations and the potential of the next-generation spaceborne SAR systems. One of the solutions is to compress the SAR data through onboard image reproduction and information extraction from the reproduced images. This is also beneficial for fast delivery of information products and event-driven observations by constellation. The Emergence Studio (Sōhatsu kōbō in Japanese) with Japan Aerospace Exploration Agency is developing evaluation models of FPGA-based signal processing system for onboard SAR image reproduction. The model, namely, "Fast L1 Processor (FLIP)" developed in 2016 can reproduce a 10m-resolution single look complex image (Level 1.1) from ALOS/PALSAR raw signal data (Level 1.0). The processing speed of the FLIP at 200 MHz results in twice faster than CPU-based computing at 3.7 GHz. The image processed by the FLIP is no way inferior to the image processed with 32-bit computing in MATLAB.
A Compute Capable SSD Architecture for Next-Generation Non-volatile Memories

DOE Office of Scientific and Technical Information (OSTI.GOV)

De, Arup

2014-01-01

Existing storage technologies (e.g., disks and ash) are failing to cope with the processor and main memory speed and are limiting the overall perfor- mance of many large scale I/O or data-intensive applications. Emerging fast byte-addressable non-volatile memory (NVM) technologies, such as phase-change memory (PCM), spin-transfer torque memory (STTM) and memristor are very promising and are approaching DRAM-like performance with lower power con- sumption and higher density as process technology scales. These new memories are narrowing down the performance gap between the storage and the main mem- ory and are putting forward challenging problems on existing SSD architecture, I/O interfacemore » (e.g, SATA, PCIe) and software. This dissertation addresses those challenges and presents a novel SSD architecture called XSSD. XSSD o oads com- putation in storage to exploit fast NVMs and reduce the redundant data tra c across the I/O bus. XSSD o ers a exible RPC-based programming framework that developers can use for application development on SSD without dealing with the complication of the underlying architecture and communication management. We have built a prototype of XSSD on the BEE3 FPGA prototyping system. We implement various data-intensive applications and achieve speedup and energy ef- ciency of 1.5-8.9 and 1.7-10.27 respectively. This dissertation also compares XSSD with previous work on intelligent storage and intelligent memory. The existing ecosystem and these new enabling technologies make this system more viable than earlier ones.« less
Development, implementation, and characterization of a standalone embedded viscosity measurement system based on the impedance spectroscopy of a vibrating wire sensor

NASA Astrophysics Data System (ADS)

Santos, José; Janeiro, Fernando M.; Ramos, Pedro M.

2015-10-01

This paper presents an embedded liquid viscosity measurement system based on a vibrating wire sensor. Although multiple viscometers based on different working principles are commercially available, there is still a market demand for a dedicated measurement system capable of performing accurate, fast measurements and requiring little or no operator training for simple systems and solution monitoring. The developed embedded system is based on a vibrating wire sensor that works by measuring the impedance response of the sensor, which depends on the viscosity and density of the liquid in which the sensor is immersed. The core of the embedded system is a digital signal processor (DSP) which controls the waveform generation and acquisitions for the measurement of the impedance frequency response. The DSP also processes the acquired waveforms and estimates the liquid viscosity. The user can interact with the measurement system through a keypad and an LCD or through a computer with a USB connection for data logging and processing. The presented system is tested on a set of viscosity standards and the estimated values are compared with the standard manufacturer specified viscosity values. A stability study of the measurement system is also performed.
PELE web server: atomistic study of biomolecular systems at your fingertips.

PubMed

Madadkar-Sobhani, Armin; Guallar, Victor

2013-07-01

PELE, Protein Energy Landscape Exploration, our novel technology based on protein structure prediction algorithms and a Monte Carlo sampling, is capable of modelling the all-atom protein-ligand dynamical interactions in an efficient and fast manner, with two orders of magnitude reduced computational cost when compared with traditional molecular dynamics techniques. PELE's heuristic approach generates trial moves based on protein and ligand perturbations followed by side chain sampling and global/local minimization. The collection of accepted steps forms a stochastic trajectory. Furthermore, several processors may be run in parallel towards a collective goal or defining several independent trajectories; the whole procedure has been parallelized using the Message Passing Interface. Here, we introduce the PELE web server, designed to make the whole process of running simulations easier and more practical by minimizing input file demand, providing user-friendly interface and producing abstract outputs (e.g. interactive graphs and tables). The web server has been implemented in C++ using Wt (http://www.webtoolkit.eu) and MySQL (http://www.mysql.com). The PELE web server, accessible at http://pele.bsc.es, is free and open to all users with no login requirement.
Simulation of the High Performance Time to Digital Converter for the ATLAS Muon Spectrometer trigger upgrade

NASA Astrophysics Data System (ADS)

Meng, X. T.; Levin, D. S.; Chapman, J. W.; Zhou, B.

2016-09-01

The ATLAS Muon Spectrometer endcap thin-Resistive Plate Chamber trigger project compliments the New Small Wheel endcap Phase-1 upgrade for higher luminosity LHC operation. These new trigger chambers, located in a high rate region of ATLAS, will improve overall trigger acceptance and reduce the fake muon trigger incidence. These chambers must generate a low level muon trigger to be delivered to a remote high level processor within a stringent latency requirement of 43 bunch crossings (1075 ns). To help meet this requirement the High Performance Time to Digital Converter (HPTDC), a multi-channel ASIC designed by CERN Microelectronics group, has been proposed for the digitization of the fast front end detector signals. This paper investigates the HPTDC performance in the context of the overall muon trigger latency, employing detailed behavioral Verilog simulations in which the latency in triggerless mode is measured for a range of configurations and under realistic hit rate conditions. The simulation results show that various HPTDC operational configurations, including leading edge and pair measurement modes can provide high efficiency (>98%) to capture and digitize hits within a time interval satisfying the Phase-1 latency tolerance.
Forward-biased nanophotonic detector for ultralow-energy dissipation receiver

NASA Astrophysics Data System (ADS)

Nozaki, Kengo; Matsuo, Shinji; Fujii, Takuro; Takeda, Koji; Shinya, Akihiko; Kuramochi, Eiichi; Notomi, Masaya

2018-04-01

Generally, reverse-biased photodetectors (PDs) are used for high-speed optical receivers. The forward voltage region is only utilized in solar-cells, and this photovoltaic operation would not be concurrently obtained with high efficiency and high speed operation. Here we report that photonic-crystal waveguide PDs enable forward-biased high-speed operation at 40 Gbit/s with keeping high responsivity (0.88 A/W). Within our knowledge, this is the first demonstration of the forward-biased PDs with high responsivity. This achievement is attributed to the ultracompactness of our PD and the strong light confinement within the absorber and depleted regions, thereby enabling efficient photo-carrier generation and fast extraction. This result indicates that it is possible to construct a high-speed and ultracompact photo-receiver without an electrical amplifier nor an external bias circuit. Since there is no electrical energy required, our estimation shows that the consumption energy is just the optical energy of the injected signal pulse which is about 1 fJ/bit. Hence, it will lead to an ultimately efficient and highly integrable optical-to-electrical converter in a chip, which will be a key ingredient for dense nanophotonic communication and processors.
Transformation of two and three-dimensional regions by elliptic systems

NASA Technical Reports Server (NTRS)

Mastin, C. Wayne

1994-01-01

Several reports are attached to this document which contain the results of our research at the end of this contract period. Three of the reports deal with our work on generating surface grids. One is a preprint of a paper which will appear in the journal Applied Mathematics and Computation. Another is the abstract from a dissertation which has been prepared by Ahmed Khamayseh, a graduate student who has been supported by this grant for the last two years. The last report on surface grids is the extended abstract of a paper to be presented at the 14th IMACS World Congress in July. This report contains results on conformal mappings of surfaces, which are closely related to elliptic methods for surface grid generation. A preliminary report is included on new methods for dealing with block interfaces in multiblock grid systems. The development work is complete and the methods will eventually be incorporated into the National Grid Project (NGP) grid generation code. Thus, the attached report contains only a simple grid system which was used to test the algorithms to prove that the concepts are sound. These developments will greatly aid grid control when using elliptic systems and prevent unwanted grid movement. The last report is a brief summary of some timings that were obtained when the multiblock grid generation code was run on the Intel IPSC/860 hypercube. Since most of the data in a grid code is local to a particular block, only a small fraction of the total data must be passed between processors. The data is also distributed among the processors so that the total size of the grid can be increase along with the number of processors. This work is only in a preliminary stage. However, one of the ERC graduate students has taken an interest in the project and is presently extending these results as a part of his master's thesis.
A leap forward with UTK s Cray XC30

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fahey, Mark R

2014-01-01

This paper shows a significant productivity leap for several science groups and the accomplishments they have made to date on Darter - a Cray XC30 at the University of Tennessee Knoxville. The increased productivity is due to faster processors and interconnect combined in a new generation from Cray, and yet it still has a very similar programming environment as compared to previous generations of Cray machines that makes porting easy.
Subpicosecond Optical Digital Computation Using Conjugate Parametric Generators

DTIC Science & Technology

1989-03-31

Using Phase Conjugate Farametric Generators ..... 12. PERSONAL AUTHOR(S) Alfano, Robert- Eichmann . George; Dorsinville. Roger! Li. Yao 13a. TYPE OF...conjugation-based optical residue arithmetic processor," Y. Li, G. Eichmann , R. Dorsinville, and R. R. Alfano, Opt. Lett. 13, (1988). [2] "Parallel ultrafast...optical digital and symbolic computation via optical phase conjugation," Y. Li, G. Eichmann , R. Dorsinville, Appl. Opt. 27, 2025 (1988). [3

Multichannel photonic Hilbert transformers based on complex modulated integrated Bragg gratings.

PubMed

Cheng, Rui; Chrostowski, Lukas

2018-03-01

Multichannel photonic Hilbert transformers (MPHTs) are reported. The devices are based on single compact spiral integrated Bragg gratings on silicon with coupling coefficients precisely modulated by the phase of each grating period. MPHTs with up to nine wavelength channels and a single-channel bandwidth of up to ∼625 GHz are achieved. The potential of the devices for multichannel single-sideband signal generation is suggested. The work offers a new possibility of utilizing wavelength as an extra degree of freedom in designing radio-frequency photonic signal processors. Such multichannel processors are expected to possess improved capacities and a potential to greatly benefit current widespread wavelength division multiplexed systems.
Interdisciplinary education in optics and photonics based on microcontrollers

NASA Astrophysics Data System (ADS)

Dreßler, Paul; Wielage, Heinz-Hermann; Haiss, Ulrich; Vauderwange, Oliver; Curticapean, Dan

2014-07-01

Not only is the number of new devices constantly increasing, but so is their application complexity and power. Most of their applications are in optics, photonics, acoustic and mobile devices. Working speed and functionality is achieved in most of media devices by strategic use of digital signal processors and microcontrollers of the new generation. Considering all these premises of media development dynamics, the authors present how to integrate microcontrollers and digital signal processors in the curricula of media technology lectures by using adequate content. This also includes interdisciplinary content that consists of using the acquired knowledge in media software. These entries offer a deeper understanding of photonics, acoustics and media engineering.
Development of a multikilowatt ion thruster power processor

NASA Technical Reports Server (NTRS)

Schoenfeld, A. D.; Goldin, D. S.; Biess, J. J.

1972-01-01

A feasibility study was made of the application of silicon-controlled, rectifier series, resonant inverter, power conditioning technology to electric propulsion power processing operating from a 200 to 400 Vdc solar array bus. A power system block diagram was generated to meet the electrical requirements of a 20 CM hollow cathode, mercury bombardment, ion engine. The SCR series resonant inverter was developed as a primary means of power switching and conversion, and the analog signal-to-discrete-time-interval converter control system was applied to achieve good regulation. A complete breadboard was designed, fabricated, and tested with a resistive load bank, and critical power processor areas relating to efficiency, weight, and part count were identified.
A sparse matrix algorithm on the Boolean vector machine

NASA Technical Reports Server (NTRS)

Wagner, Robert A.; Patrick, Merrell L.

1988-01-01

VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
An efficient optical architecture for sparsely connected neural networks

NASA Technical Reports Server (NTRS)

Hine, Butler P., III; Downie, John D.; Reid, Max B.

1990-01-01

An architecture for general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently that the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighing accuracies possible with such an updatable thin film holographic device.
Event parallelism: Distributed memory parallel computing for high energy physics experiments

NASA Astrophysics Data System (ADS)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Health Monitoring of a Satellite System

NASA Technical Reports Server (NTRS)

Chen, Robert H.; Ng, Hok K.; Speyer, Jason L.; Guntur, Lokeshkumar S.; Carpenter, Russell

2004-01-01

A health monitoring system based on analytical redundancy is developed for satellites on elliptical orbits. First, the dynamics of the satellite including orbital mechanics and attitude dynamics is modelled as a periodic system. Then, periodic fault detection filters are designed to detect and identify the satellite's actuator and sensor faults. In addition, parity equations are constructed using the algebraic redundant relationship among the actuators and sensors. Furthermore, a residual processor is designed to generate the probability of each of the actuator and sensor faults by using a sequential probability test. Finally, the health monitoring system, consisting of periodic fault detection lters, parity equations and residual processor, is evaluated in the simulation in the presence of disturbances and uncertainty.
Superconducting Qubit with Integrated Single Flux Quantum Controller Part I: Theory and Fabrication

NASA Astrophysics Data System (ADS)

Beck, Matthew; Leonard, Edward, Jr.; Thorbeck, Ted; Zhu, Shaojiang; Howington, Caleb; Nelson, Jj; Plourde, Britton; McDermott, Robert

As the size of quantum processors grow, so do the classical control requirements. The single flux quantum (SFQ) Josephson digital logic family offers an attractive route to proximal classical control of multi-qubit processors. Here we describe coherent control of qubits via trains of SFQ pulses. We discuss the fabrication of an SFQ-based pulse generator and a superconducting transmon qubit on a single chip. Sources of excess microwave loss stemming from the complex multilayer fabrication of the SFQ circuit are discussed. We show how to mitigate this loss through judicious choice of process workflow and appropriate use of sacrificial protection layers. Present address: IBM T.J. Watson Research Center.
Performance Qualification Test of the ISS Water Processor Assembly (WPA) Expendables

NASA Technical Reports Server (NTRS)

Carter, Layne; Tabb, David; Tatara, James D.; Mason, Richard K.

2005-01-01

The Water Processor Assembly (WPA) for use on the International Space Station (ISS) includes various technologies for the treatment of waste water. These technologies include filtration, ion exchange, adsorption, catalytic oxidation, and iodination. The WPA hardware implementing portions of these technologies, including the Particulate Filter, Multifiltration Bed, Ion Exchange Bed, and Microbial Check Valve, was recently qualified for chemical performance at the Marshall Space Flight Center. Waste water representing the quality of that produced on the ISS was generated by test subjects and processed by the WPA. Water quality analysis and instrumentation data was acquired throughout the test to monitor hardware performance. This paper documents operation of the test and the assessment of the hardware performance.
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

PubMed Central

2011-01-01

Background The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance. PMID:21631914
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

PubMed

Rognes, Torbjørn

2011-06-01

The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers

NASA Astrophysics Data System (ADS)

Edwards, Mark; Heward, Jeffrey; Clark, C. W.

2009-05-01

At Georgia Southern University we have constructed an 8+1--node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra--cold atoms in general with a particular emphasis on studying bose--bose and bose--fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time--dependent, one--dimensional Gross--Pitaevskii (TDGP) equations. These equations govern the behavior of dual-- species bosonic mixtures. We chose the split--operator/FFT to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 cpu known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64--bit PowerPC Processor Element known as the PPE and eight ``Synergistic Processor Element'' identified as the SPE's. We report on this algorithm and compare its performance to a non--parallel solver as applied to modeling evaporative cooling in dual--species bosonic mixtures.
Introduction on performance analysis and profiling methodologies for KVM on ARM virtualization

NASA Astrophysics Data System (ADS)

Motakis, Antonios; Spyridakis, Alexander; Raho, Daniel

2013-05-01

The introduction of hardware virtualization extensions on ARM Cortex-A15 processors has enabled the implementation of full virtualization solutions for this architecture, such as KVM on ARM. This trend motivates the need to quantify and understand the performance impact, emerged by the application of this technology. In this work we start looking into some interesting performance metrics on KVM for ARM processors, which can provide us with useful insight that may lead to potential improvements in the future. This includes measurements such as interrupt latency and guest exit cost, performed on ARM Versatile Express and Samsung Exynos 5250 hardware platforms. Furthermore, we discuss additional methodologies that can provide us with a deeper understanding in the future of the performance footprint of KVM. We identify some of the most interesting approaches in this field, and perform a tentative analysis on how these may be implemented in the KVM on ARM port. These take into consideration hardware and software based counters for profiling, and issues related to the limitations of the simulators which are often used, such as the ARM Fast Models platform.
Method and apparatus for combinatorial logic signal processor in a digitally based high speed x-ray spectrometer

DOEpatents

Warburton, W.K.

1999-02-16

A high speed, digitally based, signal processing system is disclosed which accepts a digitized input signal and detects the presence of step-like pulses in the this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system lifetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired. 31 figs.
ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Batista, A. J. N.; Sousa, J.; Varandas, C. A. F.

2006-10-15

The efficient vertical stabilization (VS) of plasmas in tokamaks requires a fast reaction of the VS controller, for example, after detection of edge localized modes (ELM). For controlling the effects of very large ELMs a new digital control hardware, based on the Advanced Telecommunications Computing Architecture trade mark sign (ATCA), is being developed aiming to reduce the VS digital control loop cycle (down to an optimal value of 10 {mu}s) and improve the algorithm performance. The system has 1 ATCA trade mark sign processor module and up to 12 ATCA trade mark sign control modules, each one with 32 analogmore » input channels (12 bit resolution), 4 analog output channels (12 bit resolution), and 8 digital input/output channels. The Aurora trade mark sign and PCI Express trade mark sign communication protocols will be used for data transport, between modules, with expected latencies below 2 {mu}s. Control algorithms are implemented on a ix86 based processor with 6 Gflops and on field programmable gate arrays with 80 GMACS, interconnected by serial gigabit links in a full mesh topology.« less
Combining instruction prefetching with partial cache locking to improve WCET in real-time systems.

PubMed

Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

2013-01-01

Caches play an important role in embedded systems to bridge the performance gap between fast processor and slow memory. And prefetching mechanisms are proposed to further improve the cache performance. While in real-time systems, the application of caches complicates the Worst-Case Execution Time (WCET) analysis due to its unpredictable behavior. Modern embedded processors often equip locking mechanism to improve timing predictability of the instruction cache. However, locking the whole cache may degrade the cache performance and increase the WCET of the real-time application. In this paper, we proposed an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed as BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we have already proposed to improve the worst-case cache performance and in turn the worst-case execution time. The estimations on typical real-time applications show that the partial cache locking mechanism shows remarkable WCET improvement over static analysis and full cache locking.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC),Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” andmore » if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.« less
Combining Instruction Prefetching with Partial Cache Locking to Improve WCET in Real-Time Systems

PubMed Central

Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

2013-01-01

Caches play an important role in embedded systems to bridge the performance gap between fast processor and slow memory. And prefetching mechanisms are proposed to further improve the cache performance. While in real-time systems, the application of caches complicates the Worst-Case Execution Time (WCET) analysis due to its unpredictable behavior. Modern embedded processors often equip locking mechanism to improve timing predictability of the instruction cache. However, locking the whole cache may degrade the cache performance and increase the WCET of the real-time application. In this paper, we proposed an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed as BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we have already proposed to improve the worst-case cache performance and in turn the worst-case execution time. The estimations on typical real-time applications show that the partial cache locking mechanism shows remarkable WCET improvement over static analysis and full cache locking. PMID:24386133
Multi-gigabit optical interconnects for next-generation on-board digital equipment

NASA Astrophysics Data System (ADS)

Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques

2017-11-01

Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multi-gigabit optical interconnects for next-generation on-board digital equipment

NASA Astrophysics Data System (ADS)

Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques

2004-06-01

Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.