general purpose processor: Topics by Science.gov

Sample records for general purpose processor

Methods for Trustworthy Design of On-Chip Bus Interconnect for General-Purpose Processors

DTIC Science & Technology

2012-03-01

Technology Andrew Huang, was able to test the security properties of HyperTransport bus protocol on an Xbox [20]. In his research, he was able to...TRUSTWORTHY DESIGN OF ON -CHIP BUS INTERCONNECT FOR GENERAL-PURPOSE PROCESSORS by Jay F. Elson March 2012 Thesis Advisor: Ted Huffmire Second...AND DATES COVERED Master’s Thesis 4. TITLE AND SUBTITLE Methods for Trustworthy Design of On -Chip Bus Interconnect for General-Purpose Processors 5
Automatic Dynamic Aircraft Modeler (ADAM) for the Computer Program NASTRAN

NASA Technical Reports Server (NTRS)

Griffis, H.

1985-01-01

Large general purpose finite element programs require users to develop large quantities of input data. General purpose pre-processors are used to decrease the effort required to develop structural models. Further reduction of effort can be achieved by specific application pre-processors. Automatic Dynamic Aircraft Modeler (ADAM) is one such application specific pre-processor. General purpose pre-processors use points, lines and surfaces to describe geometric shapes. Specifying that ADAM is used only for aircraft structures allows generic structural sections, wing boxes and bodies, to be pre-defined. Hence with only gross dimensions, thicknesses, material properties and pre-defined boundary conditions a complete model of an aircraft can be created.
Design of RISC Processor Using VHDL and Cadence

NASA Astrophysics Data System (ADS)

Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Case for a field-programmable gate array multicore hybrid machine for an image-processing application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

2011-01-01

General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
An acceleration framework for synthetic aperture radar algorithms

NASA Astrophysics Data System (ADS)

Kim, Youngsoo; Gloster, Clay S.; Alexander, Winser E.

2017-04-01

Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR) are computationally intensive and require considerable execution time on a general purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine in order to reduce execution time by an order of magnitude as compared to kernel execution on a general purpose processor. Specifically, Field Programmable Gate Arrays (FPGAs) can be used to accelerate these kernels using hardware-based custom logic implementations. In this paper, we demonstrate a framework for algorithm acceleration. We used SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. Initially, we profiled the SAR algorithm and implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show a linear speedup by adding reasonably small processing elements in Field Programmable Gate Array (FPGA) as opposed to using a software implementation running on a typical general purpose processor.
A high-speed digital signal processor for atmospheric radar, part 7.3A

NASA Technical Reports Server (NTRS)

Brosnahan, J. W.; Woodard, D. M.

1984-01-01

The Model SP-320 device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (First In First Out) memories, both with depths of 4096 W, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MW/s.
Floating-Point Modules Targeted for Use with RC Compilation Tools

NASA Technical Reports Server (NTRS)

Sahin, Ibrahin; Gloster, Clay S.

2000-01-01

Reconfigurable Computing (RC) has emerged as a viable computing solution for computationally intensive applications. Several applications have been mapped to RC system and in most cases, they provided the smallest published execution time. Although RC systems offer significant performance advantages over general-purpose processors, they require more application development time than general-purpose processors. This increased development time of RC systems provides the motivation to develop an optimized module library with an assembly language instruction format interface for use with future RC system that will reduce development time significantly. In this paper, we present area/performance metrics for several different types of floating point (FP) modules that can be utilized to develop complex FP applications. These modules are highly pipelined and optimized for both speed and area. Using these modules, and example application, FP matrix multiplication, is also presented. Our results and experiences show, that with these modules, 8-10X speedup over general-purpose processors can be achieved.
50 CFR 680.44 - Cost recovery.

Code of Federal Regulations, 2014 CFR

2014-10-01

... value determined for at-sea Catcher/Processors (CP), depending on their activity. Ex-vessel value...-vessel value—(i) General. Catcher/processors must use the corresponding CP standard price(s) for the purposes of calculating fee liability. (ii) CP standard prices. As part of the summary described in...
50 CFR 680.44 - Cost recovery.

Code of Federal Regulations, 2010 CFR

2010-10-01

... value determined for at-sea Catcher/Processors (CP), depending on their activity. Ex-vessel value...-vessel value—(i) General. Catcher/processors must use the corresponding CP standard price(s) for the purposes of calculating fee liability. (ii) CP standard prices. As part of the summary described in...
50 CFR 680.44 - Cost recovery.

Code of Federal Regulations, 2012 CFR

2012-10-01

... value determined for at-sea Catcher/Processors (CP), depending on their activity. Ex-vessel value...-vessel value—(i) General. Catcher/processors must use the corresponding CP standard price(s) for the purposes of calculating fee liability. (ii) CP standard prices. As part of the summary described in...
50 CFR 680.44 - Cost recovery.

Code of Federal Regulations, 2011 CFR

2011-10-01

... value determined for at-sea Catcher/Processors (CP), depending on their activity. Ex-vessel value...-vessel value—(i) General. Catcher/processors must use the corresponding CP standard price(s) for the purposes of calculating fee liability. (ii) CP standard prices. As part of the summary described in...
50 CFR 680.44 - Cost recovery.

Code of Federal Regulations, 2013 CFR

2013-10-01

... value determined for at-sea Catcher/Processors (CP), depending on their activity. Ex-vessel value...-vessel value—(i) General. Catcher/processors must use the corresponding CP standard price(s) for the purposes of calculating fee liability. (ii) CP standard prices. As part of the summary described in...
Hypercluster - Parallel processing for computational mechanics

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1988-01-01

An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Implementation of kernels on the Maestro processor

NASA Astrophysics Data System (ADS)

Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.

Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
A fully reconfigurable photonic integrated signal processor

NASA Astrophysics Data System (ADS)

Liu, Weilin; Li, Ming; Guzzon, Robert S.; Norberg, Erik J.; Parker, John S.; Lu, Mingzhi; Coldren, Larry A.; Yao, Jianping

2016-03-01

Photonic signal processing has been considered a solution to overcome the inherent electronic speed limitations. Over the past few years, an impressive range of photonic integrated signal processors have been proposed, but they usually offer limited reconfigurability, a feature highly needed for the implementation of large-scale general-purpose photonic signal processors. Here, we report and experimentally demonstrate a fully reconfigurable photonic integrated signal processor based on an InP-InGaAsP material system. The proposed photonic signal processor is capable of performing reconfigurable signal processing functions including temporal integration, temporal differentiation and Hilbert transformation. The reconfigurability is achieved by controlling the injection currents to the active components of the signal processor. Our demonstration suggests great potential for chip-scale fully programmable all-optical signal processing.
Stanford Hardware Development Program

NASA Technical Reports Server (NTRS)

Peterson, A.; Linscott, I.; Burr, J.

1986-01-01

Architectures for high performance, digital signal processing, particularly for high resolution, wide band spectrum analysis were developed. These developments are intended to provide instrumentation for NASA's Search for Extraterrestrial Intelligence (SETI) program. The real time signal processing is both formal and experimental. The efficient organization and optimal scheduling of signal processing algorithms were investigated. The work is complemented by efforts in processor architecture design and implementation. A high resolution, multichannel spectrometer that incorporates special purpose microcoded signal processors is being tested. A general purpose signal processor for the data from the multichannel spectrometer was designed to function as the processing element in a highly concurrent machine. The processor performance required for the spectrometer is in the range of 1000 to 10,000 million instructions per second (MIPS). Multiple node processor configurations, where each node performs at 100 MIPS, are sought. The nodes are microprogrammable and are interconnected through a network with high bandwidth for neighboring nodes, and medium bandwidth for nodes at larger distance. The implementation of both the current mutlichannel spectrometer and the signal processor as Very Large Scale Integration CMOS chip sets was commenced.
Concept of a programmable maintenance processor applicable to multiprocessing systems

NASA Technical Reports Server (NTRS)

Glover, Richard D.

1988-01-01

A programmable maintenance processor concept applicable to multiprocessing systems has been developed at the NASA Ames Research Center's Dryden Flight Research Facility. This stand-alone-processor is intended to provide support for system and application software testing as well as hardware diagnostics. An initial machanization has been incorporated into the extended aircraft interrogation and display system (XAIDS) which is multiprocessing general-purpose ground support equipment. The XAIDS maintenance processor has independent terminal and printer interfaces and a dedicated magnetic bubble memory that stores system test sequences entered from the terminal. This report describes the hardware and software embodied in this processor and shows a typical application in the check-out of a new XAIDS.
FANTOM: Algorithm-Architecture Codesign for High-Performance Embedded Signal and Image Processing Systems

DTIC Science & Technology

2013-05-25

graphics processors by IBM, AMD, and nVIDIA . They are between general-purpose pro- cessors and special-purpose processors. In Phase II. 3.10 Measure of...particular, Dr. Kevin Irick started a company Silicon Scapes and he has been the CEO. 5 Implications for Related/Future Research We speculate that...final project report in Jan. 2011. At the test and validation stage of the project. FANTOM’s partner at Raytheon quit from his company and hence from
Study of a hybrid multispectral processor

NASA Technical Reports Server (NTRS)

Marshall, R. E.; Kriegler, F. J.

1973-01-01

A hybrid processor is described offering enough handling capacity and speed to process efficiently the large quantities of multispectral data that can be gathered by scanner systems such as MSDS, SKYLAB, ERTS, and ERIM M-7. Combinations of general-purpose and special-purpose hybrid computers were examined to include both analog and digital types as well as all-digital configurations. The current trend toward lower costs for medium-scale digital circuitry suggests that the all-digital approach may offer the better solution within the time frame of the next few years. The study recommends and defines such a hybrid digital computing system in which both special-purpose and general-purpose digital computers would be employed. The tasks of recognizing surface objects would be performed in a parallel, pipeline digital system while the tasks of control and monitoring would be handled by a medium-scale minicomputer system. A program to design and construct a small, prototype, all-digital system has been started.
System support software for the Space Ultrareliable Modular Computer (SUMC)

NASA Technical Reports Server (NTRS)

Hill, T. E.; Hintze, G. C.; Hodges, B. C.; Austin, F. A.; Buckles, B. P.; Curran, R. T.; Lackey, J. D.; Payne, R. E.

1974-01-01

The highly transportable programming system designed and implemented to support the development of software for the Space Ultrareliable Modular Computer (SUMC) is described. The SUMC system support software consists of program modules called processors. The initial set of processors consists of the supervisor, the general purpose assembler for SUMC instruction and microcode input, linkage editors, an instruction level simulator, a microcode grid print processor, and user oriented utility programs. A FORTRAN 4 compiler is undergoing development. The design facilitates the addition of new processors with a minimum effort and provides the user quasi host independence on the ground based operational software development computer. Additional capability is provided to accommodate variations in the SUMC architecture without consequent major modifications in the initial processors.

New Dimensions in Microarchitecture Harnessing 3D Integration Technologies (BRIEFING CHARTS)

DTIC Science & Technology

2007-03-06

Quad Core Bandwidth and Latency Boundaries General Purpose Processor Loads Latency limited Ba nd w id th li m ite dProcessor load trade -off between I...delay No= number of ckts at 1V do= ckt delay at 1V From “3D Intergration ” Special Topic Sessionl W. Haensch, ISSCC ‘07, 2/07 11 DARPA MTS March 6, 2007
Development of a General-Purpose Analysis System Based on a Programmable Fluid Processor Final Report CRADA No. TC-2027-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

McConaghy, C. F.; Gascoyne, P. R.

The purpose ofthis project was to develop a general-purpose analysis system based on a programmable fluid processor (PFP). The PFP is an array of electrodes surrounded by fluid reservoirs and injectors. Injected droplets of various reagents are manjpulated and combined on the array by Dielectrophoretic (DEP) forces. The goal was to create a small handheld device that could accomplish the tasks currently undertaken by much larger, time consuming, manual manipulation in the lab. The entire effo1t was funded by DARPA under the Bio-Flips program. MD Anderson Cancer Center was the PI for the DARPA effort. The Bio-Flips program was amore » 3- year program that ran from September 2000 to September 2003. The CRADA was somewhat behind the Bi-Flips program running from June 2001 to June 2004 with a no cost extension to September 2004.« less
40 CFR 238.10 - Purpose and applicability.

Code of Federal Regulations, 2011 CFR

2011-07-01

... DEGRADABLE PLASTIC RING CARRIERS General Provisions § 238.10 Purpose and applicability. The purpose of this part is to require that plastic ring carriers be made of degradable materials as described in §§ 238.20 and 238.30. The requirements of this part apply to all processors and importers of plastic ring...
40 CFR 238.10 - Purpose and applicability.

Code of Federal Regulations, 2010 CFR

2010-07-01

... DEGRADABLE PLASTIC RING CARRIERS General Provisions § 238.10 Purpose and applicability. The purpose of this part is to require that plastic ring carriers be made of degradable materials as described in §§ 238.20 and 238.30. The requirements of this part apply to all processors and importers of plastic ring...
40 CFR 238.10 - Purpose and applicability.

Code of Federal Regulations, 2012 CFR

2012-07-01

... DEGRADABLE PLASTIC RING CARRIERS General Provisions § 238.10 Purpose and applicability. The purpose of this part is to require that plastic ring carriers be made of degradable materials as described in §§ 238.20 and 238.30. The requirements of this part apply to all processors and importers of plastic ring...
40 CFR 238.10 - Purpose and applicability.

Code of Federal Regulations, 2013 CFR

2013-07-01

... DEGRADABLE PLASTIC RING CARRIERS General Provisions § 238.10 Purpose and applicability. The purpose of this part is to require that plastic ring carriers be made of degradable materials as described in §§ 238.20 and 238.30. The requirements of this part apply to all processors and importers of plastic ring...
40 CFR 238.10 - Purpose and applicability.

Code of Federal Regulations, 2014 CFR

2014-07-01

... DEGRADABLE PLASTIC RING CARRIERS General Provisions § 238.10 Purpose and applicability. The purpose of this part is to require that plastic ring carriers be made of degradable materials as described in §§ 238.20 and 238.30. The requirements of this part apply to all processors and importers of plastic ring...
Synthetic Aperture Radar (SAR) data processing

NASA Technical Reports Server (NTRS)

Beckner, F. L.; Ahr, H. A.; Ausherman, D. A.; Cutrona, L. J.; Francisco, S.; Harrison, R. E.; Heuser, J. S.; Jordan, R. L.; Justus, J.; Manning, B.

1978-01-01

The available and optimal methods for generating SAR imagery for NASA applications were identified. The SAR image quality and data processing requirements associated with these applications were studied. Mathematical operations and algorithms required to process sensor data into SAR imagery were defined. The architecture of SAR image formation processors was discussed, and technology necessary to implement the SAR data processors used in both general purpose and dedicated imaging systems was addressed.
Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

NASA Astrophysics Data System (ADS)

Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori

Heterogeneous multicores have been attracting much attention to attain high performance keeping power consumption low in wide spread of areas. However, heterogeneous multicores force programmers very difficult programming. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges a gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP(Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.
SIG: a general-purpose signal processing program. User's manual. Revision 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lager, D.; Azevedo, S.

1985-05-09

SIG is a general-purpose signal processing, analysis, and display program. Its main purpose is to perform manipulations on time-domain and frequenccy-domain signals. The manual contains a complete description of the SIG program from the user's stand-point. A brief exercise in using SIG is shown. Complete descriptions are given of each command in the SIG core. General information about the SIG structure, command processor, and graphics options are provided. An example usage of SIG for solving a problem is developed, and error message formats are briefly discussed. (LEW)
Replication of Space-Shuttle Computers in FPGAs and ASICs

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.

2008-01-01

A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Orthorectification by Using Gpgpu Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Single-Scale Retinex Using Digital Signal Processors

NASA Technical Reports Server (NTRS)

Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn

2005-01-01

The Retinex is an image enhancement algorithm that improves the brightness, contrast and sharpness of an image. It performs a non-linear spatial/spectral transform that provides simultaneous dynamic range compression and color constancy. It has been used for a wide variety of applications ranging from aviation safety to general purpose photography. Many potential applications require the use of Retinex processing at video frame rates. This is difficult to achieve with general purpose processors because the algorithm contains a large number of complex computations and data transfers. In addition, many of these applications also constrain the potential architectures to embedded processors to save power, weight and cost. Thus we have focused on digital signal processors (DSPs) and field programmable gate arrays (FPGAs) as potential solutions for real-time Retinex processing. In previous efforts we attained a 21 (full) frame per second (fps) processing rate for the single-scale monochromatic Retinex with a TMS320C6711 DSP operating at 150 MHz. This was achieved after several significant code improvements and optimizations. Since then we have migrated our design to the slightly more powerful TMS320C6713 DSP and the fixed point TMS320DM642 DSP. In this paper we briefly discuss the Retinex algorithm, the performance of the algorithm executing on the TMS320C6713 and the TMS320DM642, and compare the results with the TMS320C6711.
A more general system for Poisson series manipulation.

NASA Technical Reports Server (NTRS)

Cherniack, J. R.

1973-01-01

The design of a working Poisson series processor system is described that is more general than those currently in use. This system is the result of a series of compromises among efficiency, generality, ease of programing, and ease of use. The most general form of coefficients that can be multiplied efficiently is pointed out, and the place of general-purpose algebraic systems in celestial mechanics is discussed.
Real-time software receiver

NASA Technical Reports Server (NTRS)

Psiaki, Mark L. (Inventor); Kintner, Jr., Paul M. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor)

2007-01-01

A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.
Real-time software receiver

NASA Technical Reports Server (NTRS)

Psiaki, Mark L. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor); Kintner, Jr., Paul M. (Inventor)

2006-01-01

A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.
Documentary table-top view of a comparison of the General Purpose Computers.

NASA Image and Video Library

1988-09-13

S88-47513 (Aug 1988) --- The current and future versions of general purpose computers for Space Shuttle orbiters are represented in this frame. The two boxes on the left (AP101B) represent the current GPC configuration, with the input-output processor at far left and the central processing unit at its side. The upgraded version combines both elements in a single unit (far right, AP101S).
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Towards the formal verification of the requirements and design of a processor interface unit

NASA Technical Reports Server (NTRS)

Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.

1993-01-01

The formal verification of the design and partial requirements for a Processor Interface Unit (PIU) using the Higher Order Logic (HOL) theorem-proving system is described. The processor interface unit is a single-chip subsystem within a fault-tolerant embedded system under development within the Boeing Defense and Space Group. It provides the opportunity to investigate the specification and verification of a real-world subsystem within a commercially-developed fault-tolerant computer. An overview of the PIU verification effort is given. The actual HOL listing from the verification effort are documented in a companion NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit - HOL Listings' including the general-purpose HOL theories and definitions that support the PIU verification as well as tactics used in the proofs.
Stream Processors

NASA Astrophysics Data System (ADS)

Erez, Mattan; Dally, William J.

Stream processors, like other multi core architectures partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.

General purpose molecular dynamics simulations fully implemented on graphics processing units

NASA Astrophysics Data System (ADS)

Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.

2008-05-01

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 processor core distributed memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters and discuss implications for the future.
FPGA based control system for space instrumentation

NASA Astrophysics Data System (ADS)

Di Giorgio, Anna M.; Cerulli Irelli, Pasquale; Nuzzolo, Francesco; Orfei, Renato; Spinoglio, Luigi; Liu, Giovanni S.; Saraceno, Paolo

2008-07-01

The prototype for a general purpose FPGA based control system for space instrumentation is presented, with particular attention to the instrument control application software. The system HW is based on the LEON3FT processor, which gives the flexibility to configure the chip with only the necessary HW functionalities, from simple logic up to small dedicated processors. The instrument control SW is developed in ANSI C and for time critical (<10μs) commanding sequences implements an internal instructions sequencer, triggered via an interrupt service routine based on a HW high priority interrupt.
Dynamically allocating sets of fine-grained processors to running computations

NASA Technical Reports Server (NTRS)

Middleton, David

1988-01-01

Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined.
A novel VLSI processor architecture for supercomputing arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

1993-01-01

Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

PubMed

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

PubMed Central

Cheung, Kit; Schultz, Simon R.; Luk, Wayne

2016-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
Parallel computing on Unix workstation arrays

NASA Astrophysics Data System (ADS)

Reale, F.; Bocchino, F.; Sciortino, S.

1994-12-01

We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
GPU: the biggest key processor for AI and parallel processing

NASA Astrophysics Data System (ADS)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Programming methodology for a general purpose automation controller

NASA Technical Reports Server (NTRS)

Sturzenbecker, M. C.; Korein, J. U.; Taylor, R. H.

1987-01-01

The General Purpose Automation Controller is a multi-processor architecture for automation programming. A methodology has been developed whose aim is to simplify the task of programming distributed real-time systems for users in research or manufacturing. Programs are built by configuring function blocks (low-level computations) into processes using data flow principles. These processes are activated through the verb mechanism. Verbs are divided into two classes: those which support devices, such as robot joint servos, and those which perform actions on devices, such as motion control. This programming methodology was developed in order to achieve the following goals: (1) specifications for real-time programs which are to a high degree independent of hardware considerations such as processor, bus, and interconnect technology; (2) a component approach to software, so that software required to support new devices and technologies can be integrated by reconfiguring existing building blocks; (3) resistance to error and ease of debugging; and (4) a powerful command language interface.
Missile signal processing common computer architecture for rapid technology upgrade

NASA Astrophysics Data System (ADS)

Rabinkin, Daniel V.; Rutledge, Edward; Monticciolo, Paul

2004-10-01

Interceptor missiles process IR images to locate an intended target and guide the interceptor towards it. Signal processing requirements have increased as the sensor bandwidth increases and interceptors operate against more sophisticated targets. A typical interceptor signal processing chain is comprised of two parts. Front-end video processing operates on all pixels of the image and performs such operations as non-uniformity correction (NUC), image stabilization, frame integration and detection. Back-end target processing, which tracks and classifies targets detected in the image, performs such algorithms as Kalman tracking, spectral feature extraction and target discrimination. In the past, video processing was implemented using ASIC components or FPGAs because computation requirements exceeded the throughput of general-purpose processors. Target processing was performed using hybrid architectures that included ASICs, DSPs and general-purpose processors. The resulting systems tended to be function-specific, and required custom software development. They were developed using non-integrated toolsets and test equipment was developed along with the processor platform. The lifespan of a system utilizing the signal processing platform often spans decades, while the specialized nature of processor hardware and software makes it difficult and costly to upgrade. As a result, the signal processing systems often run on outdated technology, algorithms are difficult to update, and system effectiveness is impaired by the inability to rapidly respond to new threats. A new design approach is made possible three developments; Moore's Law - driven improvement in computational throughput; a newly introduced vector computing capability in general purpose processors; and a modern set of open interface software standards. Today's multiprocessor commercial-off-the-shelf (COTS) platforms have sufficient throughput to support interceptor signal processing requirements. This application may be programmed under existing real-time operating systems using parallel processing software libraries, resulting in highly portable code that can be rapidly migrated to new platforms as processor technology evolves. Use of standardized development tools and 3rd party software upgrades are enabled as well as rapid upgrade of processing components as improved algorithms are developed. The resulting weapon system will have a superior processing capability over a custom approach at the time of deployment as a result of a shorter development cycles and use of newer technology. The signal processing computer may be upgraded over the lifecycle of the weapon system, and can migrate between weapon system variants enabled by modification simplicity. This paper presents a reference design using the new approach that utilizes an Altivec PowerPC parallel COTS platform. It uses a VxWorks-based real-time operating system (RTOS), and application code developed using an efficient parallel vector library (PVL). A quantification of computing requirements and demonstration of interceptor algorithm operating on this real-time platform are provided.
Context as Support for Learning Computer Organization

ERIC Educational Resources Information Center

Tew, Allison Elliott; Dorn, Brian; Leahy, William D., Jr.; Guzdial, Mark

2008-01-01

The ubiquity of personal computational devices in the lives of today's students presents a meaningful context for courses in computer organization beyond the general-purpose or imaginary processors routinely used. This article presents results of a comparative study examining student performance in a conventional organization course and in one…
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

PubMed

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
Green Secure Processors: Towards Power-Efficient Secure Processor Design

NASA Astrophysics Data System (ADS)

Chhabra, Siddhartha; Solihin, Yan

With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
FPGA wavelet processor design using language for instruction-set architectures (LISA)

NASA Astrophysics Data System (ADS)

Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

2007-04-01

The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
Fault-Tolerant Software-Defined Radio on Manycore

NASA Technical Reports Server (NTRS)

Ricketts, Scott

2015-01-01

Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Bringing MapReduce Closer To Data With Active Drives

NASA Astrophysics Data System (ADS)

Golpayegani, N.; Prathapan, S.; Warmka, R.; Wyatt, B.; Halem, M.; Trantham, J. D.; Markey, C. A.

2017-12-01

Moving computation closer to the data location has been a much theorized improvement to computation for decades. The increase in processor performance, the decrease in processor size and power requirement combined with the increase in data intensive computing has created a push to move computation as close to data as possible. We will show the next logical step in this evolution in computing: moving computation directly to storage. Hypothetical systems, known as Active Drives, have been proposed as early as 1998. These Active Drives would have a general-purpose CPU on each disk allowing for computations to be performed on them without the need to transfer the data to the computer over the system bus or via a network. We will utilize Seagate's Active Drives to perform general purpose parallel computing using the MapReduce programming model directly on each drive. We will detail how the MapReduce programming model can be adapted to the Active Drive compute model to perform general purpose computing with comparable results to traditional MapReduce computations performed via Hadoop. We will show how an Active Drive based approach significantly reduces the amount of data leaving the drive when performing several common algorithms: subsetting and gridding. We will show that an Active Drive based design significantly improves data transfer speeds into and out of drives compared to Hadoop's HDFS while at the same time keeping comparable compute speeds as Hadoop.
Respecting Relations: Memory Access and Antecedent Retrieval in Incremental Sentence Processing

ERIC Educational Resources Information Center

Kush, Dave W.

2013-01-01

This dissertation uses the processing of anaphoric relations to probe how linguistic information is encoded in and retrieved from memory during real-time sentence comprehension. More specifically, the dissertation attempts to resolve a tension between the demands of a linguistic processor implemented in a general-purpose cognitive architecture and…
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.

PubMed

Kim, Lok-Won

2018-05-01

Although there have been many decades of research and commercial presence on high performance general purpose processors, there are still many applications that require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been successfully used to learn in a wide variety of applications, but their heavy computation demand has considerably limited their practical applications. This paper proposes a fully pipelined acceleration architecture to alleviate high computational demand of an artificial neural network (ANN) which is restricted Boltzmann machine (RBM) ANNs. The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency) integrated in a state-of-the art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T) provides a computational performance of 301-billion connection-updates-per-second and about 193 times higher performance than a software solution running on general purpose processors. Most importantly, the architecture enables over 4 times (12 times in batch learning) higher performance compared with a previous work when both are implemented in an FPGA device (XC2VP70).
Accurate and efficient integration for molecular dynamics simulations at constant temperature and pressure

NASA Astrophysics Data System (ADS)

Lippert, Ross A.; Predescu, Cristian; Ierardi, Douglas J.; Mackenzie, Kenneth M.; Eastwood, Michael P.; Dror, Ron O.; Shaw, David E.

2013-10-01

In molecular dynamics simulations, control over temperature and pressure is typically achieved by augmenting the original system with additional dynamical variables to create a thermostat and a barostat, respectively. These variables generally evolve on timescales much longer than those of particle motion, but typical integrator implementations update the additional variables along with the particle positions and momenta at each time step. We present a framework that replaces the traditional integration procedure with separate barostat, thermostat, and Newtonian particle motion updates, allowing thermostat and barostat updates to be applied infrequently. Such infrequent updates provide a particularly substantial performance advantage for simulations parallelized across many computer processors, because thermostat and barostat updates typically require communication among all processors. Infrequent updates can also improve accuracy by alleviating certain sources of error associated with limited-precision arithmetic. In addition, separating the barostat, thermostat, and particle motion update steps reduces certain truncation errors, bringing the time-average pressure closer to its target value. Finally, this framework, which we have implemented on both general-purpose and special-purpose hardware, reduces software complexity and improves software modularity.
The parallel algorithm for the 2D discrete wavelet transform

NASA Astrophysics Data System (ADS)

Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

2018-04-01

The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing using single-core CPUs. However, considering a parallel processing using multi-core processors, this scheme is inappropriate due to a large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently overcome the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.

a Real-Time Computer Music Synthesis System

NASA Astrophysics Data System (ADS)

Lent, Keith Henry

A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.
Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sitaraman, Hariswaran; Grout, Ray W

This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved heremore » through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~ 3 speed-up using Intel 2013 compiler and ~ 1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.« less
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.; Goodman, Joseph W.

1989-01-01

The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
An efficient optical architecture for sparsely connected neural networks

NASA Technical Reports Server (NTRS)

Hine, Butler P., III; Downie, John D.; Reid, Max B.

1990-01-01

An architecture for general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently that the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighing accuracies possible with such an updatable thin film holographic device.
SHRIF, a General-Purpose System for Heuristic Retrieval of Information and Facts, Applied to Medical Knowledge Processing.

ERIC Educational Resources Information Center

Findler, Nicholas V.; And Others

1992-01-01

Describes SHRIF, a System for Heuristic Retrieval of Information and Facts, and the medical knowledge base that was used in its development. Highlights include design decisions; the user-machine interface, including the language processor; and the organization of the knowledge base in an artificial intelligence (AI) project like this one. (57…
Multivariate interactive digital analysis system /MIDAS/ - A new fast multispectral recognition system

NASA Technical Reports Server (NTRS)

Kriegler, F.; Marshall, R.; Lampert, S.; Gordon, M.; Cornell, C.; Kistler, R.

1973-01-01

The MIDAS system is a prototype, multiple-pipeline digital processor mechanizing the multivariate-Gaussian, maximum-likelihood decision algorithm operating at 200,000 pixels/second. It incorporates displays and film printer equipment under control of a general purpose midi-computer and possesses sufficient flexibility that operational versions of the equipment may be subsequently specified as subsets of the system.
Application of NASA General-Purpose Solver to Large-Scale Computations in Aeroacoustics

NASA Technical Reports Server (NTRS)

Watson, Willie R.; Storaasli, Olaf O.

2004-01-01

Of several iterative and direct equation solvers evaluated previously for computations in aeroacoustics, the most promising was the NASA-developed General-Purpose Solver (winner of NASA's 1999 software of the year award). This paper presents detailed, single-processor statistics of the performance of this solver, which has been tailored and optimized for large-scale aeroacoustic computations. The statistics, compiled using an SGI ORIGIN 2000 computer with 12 Gb available memory (RAM) and eight available processors, are the central processing unit time, RAM requirements, and solution error. The equation solver is capable of solving 10 thousand complex unknowns in as little as 0.01 sec using 0.02 Gb RAM, and 8.4 million complex unknowns in slightly less than 3 hours using all 12 Gb. This latter solution is the largest aeroacoustics problem solved to date with this technique. The study was unable to detect any noticeable error in the solution, since noise levels predicted from these solution vectors are in excellent agreement with the noise levels computed from the exact solution. The equation solver provides a means for obtaining numerical solutions to aeroacoustics problems in three dimensions.
FPGA-Based, Self-Checking, Fault-Tolerant Computers

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2004-01-01

A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
Coding, testing and documentation of processors for the flight design system

NASA Technical Reports Server (NTRS)

1980-01-01

The general functional design and implementation of processors for a space flight design system are briefly described. Discussions of a basetime initialization processor; conic, analytical, and precision coasting flight processors; and an orbit lifetime processor are included. The functions of several utility routines are also discussed.
Towards the formal specification of the requirements and design of a processor interface unit: HOL listings

NASA Technical Reports Server (NTRS)

Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.

1993-01-01

This technical report contains the HOL listings of the specification of the design and major portions of the requirements for a commercially developed processor interface unit (or PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU specification as it currently exists. Section two of this report contains general-purpose HOL theories that support the PIU specification. These theories include definitions for the hardware components used in the PIU, our implementation of bit words, and our implementation of temporal logic. Section three contains the HOL listings for the PIU design specification. Aside from the PIU internal bus (I-Bus), this specification is complete. Section four contains the HOL listings for a major portion of the PIU requirements specification. Specifically, it contains most of the definition for the PIU behavior associated with memory accesses initiated by the local processor.
Towards the formal verification of the requirements and design of a processor interface unit: HOL listings

NASA Technical Reports Server (NTRS)

Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.

1993-01-01

This technical report contains the Higher-Order Logic (HOL) listings of the partial verification of the requirements and design for a commercially developed processor interface unit (PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault tolerant computer system. This system, the Fault Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU verification as it currently exists. Section two of this report contains general-purpose HOL theories and definitions that support the PIU verification. These include arithmetic theories dealing with inequalities and associativity, and a collection of tactics used in the PIU proofs. Section three contains the HOL listings for the completed PIU design verification. Section 4 contains the HOL listings for the partial requirements verification of the P-Port.
Integrated 3-D vision system for autonomous vehicles

NASA Astrophysics Data System (ADS)

Hou, Kun M.; Shawky, Mohamed; Tu, Xiaowei

1992-03-01

Nowadays, autonomous vehicles have become a multidiscipline field. Its evolution is taking advantage of the recent technological progress in computer architectures. As the development tools became more sophisticated, the trend is being more specialized, or even dedicated architectures. In this paper, we will focus our interest on a parallel vision subsystem integrated in the overall system architecture. The system modules work in parallel, communicating through a hierarchical blackboard, an extension of the 'tuple space' from LINDA concepts, where they may exchange data or synchronization messages. The general purpose processing elements are of different skills, built around 40 MHz i860 Intel RISC processors for high level processing and pipelined systolic array processors based on PLAs or FPGAs for low-level processing.
General purpose pulse shape analysis for fast scintillators implemented in digital readout electronics

NASA Astrophysics Data System (ADS)

Asztalos, Stephen J.; Hennig, Wolfgang; Warburton, William K.

2016-01-01

Pulse shape discrimination applied to certain fast scintillators is usually performed offline. In sufficiently high-event rate environments data transfer and storage become problematic, which suggests a different analysis approach. In response, we have implemented a general purpose pulse shape analysis algorithm in the XIA Pixie-500 and Pixie-500 Express digital spectrometers. In this implementation waveforms are processed in real time, reducing the pulse characteristics to a few pulse shape analysis parameters and eliminating time-consuming waveform transfer and storage. We discuss implementation of these features, their advantages, necessary trade-offs and performance. Measurements from bench top and experimental setups using fast scintillators and XIA processors are presented.
40 CFR 747.195 - Triethanolamine salt of a substituted organic acid.

Code of Federal Regulations, 2010 CFR

2010-07-01

..., commerce, importer, impurity, Inventory, manufacturer, person, process, processor, and small quantities... control of the processor. (ii) Distribution in commerce is limited to purposes of export. (iii) The processor or distributor may not use the substance except in small quantities solely for research and...
Digital receiver study and implementation

NASA Technical Reports Server (NTRS)

Fogle, D. A.; Lee, G. M.; Massey, J. C.

1972-01-01

Computer software was developed which makes it possible to use any general purpose computer with A/D conversion capability as a PSK receiver for low data rate telemetry processing. Carrier tracking, bit synchronization, and matched filter detection are all performed digitally. To aid in the implementation of optimum computer processors, a study of general digital processing techniques was performed which emphasized various techniques for digitizing general analog systems. In particular, the phase-locked loop was extensively analyzed as a typical non-linear communication element. Bayesian estimation techniques for PSK demodulation were studied. A hardware implementation of the digital Costas loop was developed.
An evaluation of the directed flow graph methodology

NASA Technical Reports Server (NTRS)

Snyder, W. E.; Rajala, S. A.

1984-01-01

The applicability of the Directed Graph Methodology (DGM) to the design and analysis of special purpose image and signal processing hardware was evaluated. A special purpose image processing system was designed and described using DGM. The design, suitable for very large scale integration (VLSI) implements a region labeling technique. Two computer chips were designed, both using metal-nitride-oxide-silicon (MNOS) technology, as well as a functional system utilizing those chips to perform real time region labeling. The system is described in terms of DGM primitives. As it is currently implemented, DGM is inappropriate for describing synchronous, tightly coupled, special purpose systems. The nature of the DGM formalism lends itself more readily to modeling networks of general purpose processors.
General optical discrete z transform: design and application.

PubMed

Ngo, Nam Quoc

2016-12-20

This paper presents a generalization of the discrete z transform algorithm. It is shown that the GOD-ZT algorithm is a generalization of several important conventional discrete transforms. Based on the GOD-ZT algorithm, a tunable general optical discrete z transform (GOD-ZT) processor is synthesized using the silica-based finite impulse response transversal filter. To demonstrate the effectiveness of the method, the design and simulation of a tunable optical discrete Fourier transform (ODFT) processor as a special case of the synthesized GOD-ZT processor is presented. It is also shown that the ODFT processor can function as a real-time optical spectrum analyzer. The tunable ODFT has an important potential application as a tunable optical demultiplexer at the receiver end of an optical orthogonal frequency-division multiplexing transmission system.
The control data "GIRAFFE" system for interactive graphic finite element analysis

NASA Technical Reports Server (NTRS)

Park, S.; Brandon, D. M., Jr.

1975-01-01

The Graphical Interface for Finite Elements (GIRAFFE) general purpose interactive graphics application package was described. This system may be used as a pre/post processor for structural analysis computer programs. It facilitates the operations of creating, editing, or reviewing all the structural input/output data on a graphics terminal in a time-sharing mode of operation. An application program for a simple three-dimensional plate problem was illustrated.
Achieving supercomputer performance for neural net simulation with an array of digital signal processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Muller, U.A.; Baumle, B.; Kohler, P.

1992-10-01

Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

PubMed

Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

2014-03-11

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations that are occurring on the accelerator and/or the host. For data-transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.

A generic multibody simulation

NASA Technical Reports Server (NTRS)

Hopping, K. A.; Kohn, W.

1986-01-01

Described is a dynamic simulation package which can be configured for orbital test scenarios involving multiple bodies. The rotational and translational state integration methods are selectable for each individual body and may be changed during a run if necessary. Characteristics of the bodies are determined by assigning components consisting of mass properties, forces, and moments, which are the outputs of user-defined environmental models. Generic model implementation is facilitated by a transformation processor which performs coordinate frame inversions. Transformations are defined in the initialization file as part of the simulation configuration. The simulation package includes an initialization processor, which consists of a command line preprocessor, a general purpose grammar, and a syntax scanner. These permit specifications of the bodies, their interrelationships, and their initial states in a format that is not dependent on a particular test scenario.
Processor-in-memory-and-storage architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeBenedictis, Erik

A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code wordmore » is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.« less
A complexity-scalable software-based MPEG-2 video encoder.

PubMed

Chen, Guo-bin; Lu, Xin-ning; Wang, Xing-guo; Liu, Ji-lin

2004-05-01

With the development of general-purpose processors (GPP) and video signal processing algorithms, it is possible to implement a software-based real-time video encoder on GPP, and its low cost and easy upgrade attract developers' interests to transfer video encoding from specialized hardware to more flexible software. In this paper, the encoding structure is set up first to support complexity scalability; then a lot of high performance algorithms are used on the key time-consuming modules in coding process; finally, at programming level, processor characteristics are considered to improve data access efficiency and processing parallelism. Other programming methods such as lookup table are adopted to reduce the computational complexity. Simulation results showed that these ideas could not only improve the global performance of video coding, but also provide great flexibility in complexity regulation.
The development of a general purpose ARM-based processing unit for the ATLAS TileCal sROD

NASA Astrophysics Data System (ADS)

Cox, M. A.; Reed, R.; Mellado, B.

2015-01-01

After Phase-II upgrades in 2022, the data output from the LHC ATLAS Tile Calorimeter will increase significantly. ARM processors are common in mobile devices due to their low cost, low energy consumption and high performance. It is proposed that a cost-effective, high data throughput Processing Unit (PU) can be developed by using several consumer ARM processors in a cluster configuration to allow aggregated processing performance and data throughput while maintaining minimal software design difficulty for the end-user. This PU could be used for a variety of high-level functions on the high-throughput raw data such as spectral analysis and histograms to detect possible issues in the detector at a low level. High-throughput I/O interfaces are not typical in consumer ARM System on Chips but high data throughput capabilities are feasible via the novel use of PCI-Express as the I/O interface to the ARM processors. An overview of the PU is given and the results for performance and throughput testing of four different ARM Cortex System on Chips are presented.
A Programming Framework for Scientific Applications on CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owens, John

2013-03-24

At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiplemore » parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.« less
Transputer parallel processing at NASA Lewis Research Center

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1989-01-01

The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
Design of the SLAC RCE Platform: A General Purpose ATCA Based Data Acquisition System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Herbst, R.; Claus, R.; Freytag, M.

2015-01-23

The SLAC RCE platform is a general purpose clustered data acquisition system implemented on a custom ATCA compliant blade, called the Cluster On Board (COB). The core of the system is the Reconfigurable Cluster Element (RCE), which is a system-on-chip design based upon the Xilinx Zynq family of FPGAs, mounted on custom COB daughter-boards. The Zynq architecture couples a dual core ARM Cortex A9 based processor with a high performance 28nm FPGA. The RCE has 12 external general purpose bi-directional high speed links, each supporting serial rates of up to 12Gbps. 8 RCE nodes are included on a COB, eachmore » with a 10Gbps connection to an on-board 24-port Ethernet switch integrated circuit. The COB is designed to be used with a standard full-mesh ATCA backplane allowing multiple RCE nodes to be tightly interconnected with minimal interconnect latency. Multiple shelves can be clustered using the front panel 10-gbps connections. The COB also supports local and inter-blade timing and trigger distribution. An experiment specific Rear Transition Module adapts the 96 high speed serial links to specific experiments and allows an experiment-specific timing and busy feedback connection. This coupling of processors with a high performance FPGA fabric in a low latency, multiple node cluster allows high speed data processing that can be easily adapted to any physics experiment. RTEMS and Linux are both ported to the module. The RCE has been used or is the baseline for several current and proposed experiments (LCLS, HPS, LSST, ATLAS-CSC, LBNE, DarkSide, ILC-SiD, etc).« less
Energy consumption estimation of an OMAP-based Android operating system

NASA Astrophysics Data System (ADS)

González, Gabriel; Juárez, Eduardo; Castro, Juan José; Sanz, César

2011-05-01

System-level energy optimization of battery-powered multimedia embedded systems has recently become a design goal. The poor operational time of multimedia terminals makes computationally demanding applications impractical in real scenarios. For instance, the so-called smart-phones are currently unable to remain in operation longer than several hours. The OMAP3530 processor basically consists of two processing cores, a General Purpose Processor (GPP) and a Digital Signal Processor (DSP). The former, an ARM Cortex-A8 processor, is aimed to run a generic Operating System (OS) while the latter, a DSP core based on the C64x+, has architecture optimized for video processing. The BeagleBoard, a commercial prototyping board based on the OMAP processor, has been used to test the Android Operating System and measure its performance. The board has 128 MB of SDRAM external memory, 256 MB of Flash external memory and several interfaces. Note that the clock frequency of the ARM and DSP OMAP cores is 600 MHz and 430 MHz, respectively. This paper describes the energy consumption estimation of the processes and multimedia applications of an Android v1.6 (Donut) OS on the OMAP3530-Based BeagleBoard. In addition, tools to communicate the two processing cores have been employed. A test-bench to profile the OS resource usage has been developed. As far as the energy estimates concern, the OMAP processor energy consumption model provided by the manufacturer has been used. The model is basically divided in two energy components. The former, the baseline core energy, describes the energy consumption that is independent of any chip activity. The latter, the module active energy, describes the energy consumed by the active modules depending on resource usage.
A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

DTIC Science & Technology

2012-02-17

to be solved. Disclaimer: Reference herein to any specific commercial company , product, process, or service by trade name, trademark...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of
Factors Influencing Rural Women Cassava Processors' Intention to Participate in an Agricultural Extension Education Program. Summary of Research 80.

ERIC Educational Resources Information Center

Ojomo, Christian O.; McCaslin, N. L.

A study examined factors influencing female cassava processors' intentions regarding participation in an extension education program on cassava processing in rural Nigeria. Interviews were conducted with 224 women who were purposely selected from areas of zone 3 of Ondo State, Nigeria, which has large concentrations of cassava processors.…
A general multiscroll Lorenz system family and its realization via digital signal processors.

PubMed

Yu, Simin; Lü, Jinhu; Tang, Wallace K S; Chen, Guanrong

2006-09-01

This paper proposes a general multiscroll Lorenz system family by introducing a novel parameterized nth-order polynomial transformation. Some basic dynamical behaviors of this general multiscroll Lorenz system family are then investigated, including bifurcations, maximum Lyapunov exponents, and parameters regions. Furthermore, the general multiscroll Lorenz attractors are physically verified by using digital signal processors.
50 CFR 660.160 - Catcher/processor (C/P) Coop Program.

Code of Federal Regulations, 2011 CFR

2011-10-01

... information; the descriptive items listed in this paragraph appear to meet the stated purpose; and information... deployment by the completion of the electronic vessel and/or processor survey(s); and (E) Immediately report...
General-purpose interface bus for multiuser, multitasking computer system

NASA Technical Reports Server (NTRS)

Generazio, Edward R.; Roth, Don J.; Stang, David B.

1990-01-01

The architecture of a multiuser, multitasking, virtual-memory computer system intended for the use by a medium-size research group is described. There are three central processing units (CPU) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry

1989-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry

1990-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.

PubMed

Glaser, I

1980-10-01

We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
ARTS III/Parallel Processor Design Study

DOT National Transportation Integrated Search

1975-04-01

It was the purpose of this design study to investigate the feasibility, suitability, and cost-effectiveness of augmenting the ARTS III failsafe/failsoft multiprocessor system with a form of parallel processor to accomodate a large growth in air traff...
Spatial attention in the mental architecture: evidence from neuropsychology.

PubMed

Behrmann, M; Black, S E; Murji, S

1995-04-01

Using neuropsychological evidence, this paper examines whether spatial attention functions as a domain-specific module or as a more general-purpose central processor. Data are presented from two spatial attention cuing tasks completed by subjects, with an acquired attentional deficit, and control subjects. In both tasks, an arrow indicated with high probability the side of response (response task) or the side of space on which the stimulus would appear (visuospatial task). In the response task, the stimuli appeared foveally and the response component was lateralized, and in the visuospatial task, the stimuli were lateralized and the response component remained constant in the midline. Only the neglect subjects showed a disproportionate increase in reaction time on both the response and visuospatial tasks when the arrow cued the subject to the ipsilateral side and the stimulus or response was on the side of space contralateral to the lesion. The substantial association across the two tasks suggests that a common underlying internal spatial representation subserves perception and action. While this finding is consistent with Fodor's view of a cross-domain processor, it does not meet all of his criteria of a central processor. We conclude, therefore, that the posterior attentional mechanism is strictly neither a module nor a central processor. Rather, these results suggest that a common attentional mechanism may subserve behavior in domains that are tightly coupled.
Automatic rendezvous and docking systems functional and performance requirements

NASA Technical Reports Server (NTRS)

1985-01-01

A generalized mission design scheme which utilizes a standard mission profile for all OMV rendezvous operations, recognizes typical operational constraints, and minimizes propellant penalties due to nodal regression effects was developed. This scheme has been used to demonstrate a unified guidance and navigation maneuver processor (the UMP), which supports all mission phases through station-keeping. The initial demonstration version of the Orbital Rendezvous Mission Planner (ORMP) was provided for evaluation purposes, and program operation was discussed.

Design of Low-Cost Impact Reporting System

DTIC Science & Technology

2015-12-01

Single Board Computers (SBC) available. Arduino and Raspberry Pi are very low cost and have huge communities for hardware design. Most of the SBC... Raspberry Pi Model B has a considerably faster processor than the Arduino. Although it provides only approximately 25 General Purpose Input and Output...reporting system must be able to operate on its own power for more than 2 or 3 hours. The Raspberry Pi Model B operates on 5 volts direct current at
Design and Implementation of a CMOS Chip for a Prolog

DTIC Science & Technology

1988-03-01

generation scheme . We use the P -circuit [9] with pre-conditioning and post- conditioning 12,3] circuits to generate the carry. The implementation of...system generates vertical microcode for a general purpose processor, the NCR 9300 sys- S tem, from W- code [7]. Three significant pieces of software are...calculation block generating the pro- pagate ( P ) and generate (G) signals needed for carry calculation, and a sum block supplying the final result. The top
DSS 13 Microprocessor Antenna Controller

NASA Technical Reports Server (NTRS)

Gosline, R. M.

1984-01-01

A microprocessor based antenna controller system developed as part of the unattended station project for DSS 13 is described. Both the hardware and software top level designs are presented and the major problems encounted are discussed. Developments useful to related projects include a JPL standard 15 line interface using a single board computer, a general purpose parser, a fast floating point to ASCII conversion technique, and experience gained in using off board floating point processors with the 8080 CPU.
Digital Waveguide Architectures for Virtual Musical Instruments

NASA Astrophysics Data System (ADS)

Smith, Julius O.

Digital sound synthesis has become a standard staple of modern music studios, videogames, personal computers, and hand-held devices. As processing power has increased over the years, sound synthesis implementations have evolved from dedicated chip sets, to single-chip solutions, and ultimately to software implementations within processors used primarily for other tasks (such as for graphics or general purpose computing). With the cost of implementation dropping closer and closer to zero, there is increasing room for higher quality algorithms.
Proton Testing of nVidia GTX 1050 GPU

NASA Technical Reports Server (NTRS)

Wyrwas, E. J.

2017-01-01

Single-Event Effects (SEE) testing was conducted on the nVidia GTX 1050 Graphics Processor Unit (GPU); herein referred to as device under test (DUT). Testing was conducted at Massachusetts General Hospitals (MGH) Francis H. Burr Proton Therapy Center on April 9th, 2017 using 200-MeV protons. This testing trip was purposed to provide a baseline assessment of the radiation susceptibility of the DUT as no previous testing had been conducted on this component.
Radiation-Hardened Electronics for Advanced Communications Systems

NASA Technical Reports Server (NTRS)

Whitaker, Sterling

2015-01-01

Novel approach enables high-speed special-purpose processors Advanced reconfigurable and reprogrammable communication systems will require sub-130-nanometer electronics. Legacy single event upset (SEU) radiation-tolerant circuits are ineffective at speeds greater than 125 megahertz. In Phase I of this project, ICs, LLC, demonstrated new base-level logic circuits that provide SEU immunity for sub-130-nanometer high-speed circuits. In Phase II, the company developed an innovative self-restoring logic (SRL) circuit and a system approach that provides high-speed, SEU-tolerant solutions that are effective for sub-130-nanometer electronics scalable to at least 22-nanometer processes. The SRL system can be used in the design of NASA's next-generation special-purpose processors, especially reconfigurable communication processors.
7 CFR 1435.500 - General statement.

Code of Federal Regulations, 2013 CFR

2013-01-01

... OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Processor Sugar Payment-In-Kind (PIK) Program § 1435.500 General statement. This subpart shall be applicable to sugar beet and... sugarcane or sugar beets processed by the processors, reduce sugar production in return for a payment of...
7 CFR 1435.500 - General statement.

Code of Federal Regulations, 2010 CFR

2010-01-01

... OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Processor Sugar Payment-In-Kind (PIK) Program § 1435.500 General statement. This subpart shall be applicable to sugar beet and... sugarcane or sugar beets processed by the processors, reduce sugar production in return for a payment of...
DSP Implementation of the Retinex Image Enhancement Algorithm

NASA Technical Reports Server (NTRS)

Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn

2004-01-01

The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame-grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
On-board processing concepts for future satellite communications systems

NASA Technical Reports Server (NTRS)

Brandon, W. T. (Editor); White, B. E. (Editor)

1980-01-01

The initial definition of on-board processing for an advanced satellite communications system to service domestic markets in the 1990's is discussed. An exemplar system with both RF on-board switching and demodulation/remodulation baseband processing is used to identify important issues related to system implementation, cost, and technology development. Analyses of spectrum-efficient modulation, coding, and system control techniques are summarized. Implementations for an RF switch and baseband processor are described. Among the major conclusions listed is the need for high gain satellites capable of handling tens of simultaneous beams for the efficient reuse of the 2.5 GHz 30/20 frequency band. Several scanning beams are recommended in addition to the fixed beams. Low power solid state 20 GHz GaAs FET power amplifiers in the 5W range and a general purpose digital baseband processor with gigahertz logic speeds and megabits of memory are also recommended.
Hardware Architecture Study for NASA's Space Software Defined Radios

NASA Technical Reports Server (NTRS)

Reinhart, Richard C.; Scardelletti, Maximilian C.; Mortensen, Dale J.; Kacpura, Thomas J.; Andro, Monty; Smith, Carl; Liebetreu, John

2008-01-01

This study defines a hardware architecture approach for software defined radios to enable commonality among NASA space missions. The architecture accommodates a range of reconfigurable processing technologies including general purpose processors, digital signal processors, field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) in addition to flexible and tunable radio frequency (RF) front-ends to satisfy varying mission requirements. The hardware architecture consists of modules, radio functions, and and interfaces. The modules are a logical division of common radio functions that comprise a typical communication radio. This paper describes the architecture details, module definitions, and the typical functions on each module as well as the module interfaces. Trade-offs between component-based, custom architecture and a functional-based, open architecture are described. The architecture does not specify the internal physical implementation within each module, nor does the architecture mandate the standards or ratings of the hardware used to construct the radios.
Space Telecommunications Radio Systems (STRS) Hardware Architecture Standard: Release 1.0 Hardware Section

NASA Technical Reports Server (NTRS)

Reinhart, Richard C.; Kacpura, Thomas J.; Smith, Carl R.; Liebetreu, John; Hill, Gary; Mortensen, Dale J.; Andro, Monty; Scardelletti, Maximilian C.; Farrington, Allen

2008-01-01

This report defines a hardware architecture approach for software-defined radios to enable commonality among NASA space missions. The architecture accommodates a range of reconfigurable processing technologies including general-purpose processors, digital signal processors, field programmable gate arrays, and application-specific integrated circuits (ASICs) in addition to flexible and tunable radiofrequency front ends to satisfy varying mission requirements. The hardware architecture consists of modules, radio functions, and interfaces. The modules are a logical division of common radio functions that compose a typical communication radio. This report describes the architecture details, the module definitions, the typical functions on each module, and the module interfaces. Tradeoffs between component-based, custom architecture and a functional-based, open architecture are described. The architecture does not specify a physical implementation internally on each module, nor does the architecture mandate the standards or ratings of the hardware used to construct the radios.
Implementation of MPEG-2 encoder to multiprocessor system using multiple MVPs (TMS320C80)

NASA Astrophysics Data System (ADS)

Kim, HyungSun; Boo, Kenny; Chung, SeokWoo; Choi, Geon Y.; Lee, YongJin; Jeon, JaeHo; Park, Hyun Wook

1997-05-01

This paper presents the efficient algorithm mapping for the real-time MPEG-2 encoding on the KAIST image computing system (KICS), which has a parallel architecture using five multimedia video processors (MVPs). The MVP is a general purpose digital signal processor (DSP) of Texas Instrument. It combines one floating-point processor and four fixed- point DSPs on a single chip. The KICS uses the MVP as a primary processing element (PE). Two PEs form a cluster, and there are two processing clusters in the KICS. Real-time MPEG-2 encoder is implemented through the spatial and the functional partitioning strategies. Encoding process of spatially partitioned half of the video input frame is assigned to ne processing cluster. Two PEs perform the functionally partitioned MPEG-2 encoding tasks in the pipelined operation mode. One PE of a cluster carries out the transform coding part and the other performs the predictive coding part of the MPEG-2 encoding algorithm. One MVP among five MVPs is used for system control and interface with host computer. This paper introduces an implementation of the MPEG-2 algorithm with a parallel processing architecture.
Advanced Multiple Processor Configuration Study. Final Report.

ERIC Educational Resources Information Center

Clymer, S. J.

This summary of a study on multiple processor configurations includes the objectives, background, approach, and results of research undertaken to provide the Air Force with a generalized model of computer processor combinations for use in the evaluation of proposed flight training simulator computational designs. An analysis of a real-time flight…
50 CFR 679.30 - General CDQ regulations.

Code of Federal Regulations, 2010 CFR

2010-10-01

... description of the target fisheries, the types of vessels and processors that will be used, the locations and... vessels or processors fishing under contract with any CDQ group. Any vessel or processor harvesting or... nature of the work and the career advancement potential for each type of work. (iv) Community eligibility...
Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

DOEpatents

Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL

2009-07-21

In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
Parallel Computing:. Some Activities in High Energy Physics

NASA Astrophysics Data System (ADS)

Willers, Ian

This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.
Recursive computer architecture for VLSI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Treleaven, P.C.; Hopkins, R.P.

1982-01-01

A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems are termed fifth generation computers by the Japanese. 30 references.
Frequency Dependence of Single-event Upset in Advanced Commerical PowerPC Microprocessors

NASA Technical Reports Server (NTRS)

Irom, Frokh; Farmanesh, Farhad F.; Swift, Gary M.; Johnston, Allen H.

2004-01-01

This paper examines single-event upsets in advanced commercial SOI microprocessors in a dynamic mode, studying SEU sensitivity of General Purpose Registers (GPRs) with clock frequency. Results are presented for SOI processors with feature sizes of 0.18 microns and two different core voltages. Single-event upset from heavy ions is measured for advanced commercial microprocessors in a dynamic mode with clock frequency up to 1GHz. Frequency and core voltage dependence of single-event upsets in registers is discussed.
EOS image data processing system definition study

NASA Technical Reports Server (NTRS)

Gilbert, J.; Honikman, T.; Mcmahon, E.; Miller, E.; Pietrzak, L.; Yorsz, W.

1973-01-01

The Image Processing System (IPS) requirements and configuration are defined for NASA-sponsored advanced technology Earth Observatory System (EOS). The scope included investigation and definition of IPS operational, functional, and product requirements considering overall system constraints and interfaces (sensor, etc.) The scope also included investigation of the technical feasibility and definition of a point design reflecting system requirements. The design phase required a survey of present and projected technology related to general and special-purpose processors, high-density digital tape recorders, and image recorders.

Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

NASA Technical Reports Server (NTRS)

Smith, T. B., Jr.; Lala, J. H.

1983-01-01

The basic organization of the fault tolerant multiprocessor, (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting is employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
Optimisation of a parallel ocean general circulation model

NASA Astrophysics Data System (ADS)

Beare, M. I.; Stevens, D. P.

1997-10-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
40 CFR 432.91 - Special definitions.

Code of Federal Regulations, 2010 CFR

2010-07-01

... STANDARDS MEAT AND POULTRY PRODUCTS POINT SOURCE CATEGORY Canned Meats Processors § 432.91 Special definitions. For the purpose of this subpart: (a) Canned meats processor means an operation which prepares and cans meats (stew, sandwich spreads, or similar products), alone or in combination with other finished...
40 CFR 432.91 - Special definitions.

Code of Federal Regulations, 2011 CFR

2011-07-01

... STANDARDS MEAT AND POULTRY PRODUCTS POINT SOURCE CATEGORY Canned Meats Processors § 432.91 Special definitions. For the purpose of this subpart: (a) Canned meats processor means an operation which prepares and cans meats (stew, sandwich spreads, or similar products), alone or in combination with other finished...
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture

PubMed Central

Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo

2010-01-01

The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread well-known and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32 bits NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focus to pre-process the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283
Fast Fourier Transform Co-Processor (FFTC)- Towards Embedded GFLOPs

NASA Astrophysics Data System (ADS)

Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Wite, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland

2012-08-01

Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co- Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment.In frame of the ESA activity “Fast Fourier Transform DSP Co-processor (FFTC)” (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following:Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP.The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance.The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT- based processing tasks.A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses.The presentation will give and overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Fast Fourier Transform Co-processor (FFTC), towards embedded GFLOPs

NASA Astrophysics Data System (ADS)

Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Witte, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland; Kopp, Nicholas

2012-10-01

Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co-Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment. In frame of the ESA activity "Fast Fourier Transform DSP Co-processor (FFTC)" (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following: • Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP. • The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance. The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT-based processing tasks. A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses. The paper will give an overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Software Acquisition Manager’s Workstation (SAM/WS) System Design.

DTIC Science & Technology

1984-04-30

3. Tactical Digital System Requirements ..................... 31General...pspc t14 3. Tactical Digital System Requirements pspc-tiS 3.1 General pspc-t16 3.2 Program Description pspc-t17 3.2.1 General...pspc-t22 3.3.2 Digital Processor Input/Output Utilization Table pspc t23 3.3.3 Digital Processor Interface Block Diagram pspc-t24 3.3.4 Program
New Modular Ultrasonic Signal Processing Building Blocks for Real-Time Data Acquisition and Post Processing

NASA Astrophysics Data System (ADS)

Weber, Walter H.; Mair, H. Douglas; Jansen, Dion

2003-03-01

A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
The GF-3 SAR Data Processor

PubMed Central

Han, Bing; Ding, Chibiao; Zhong, Lihua; Liu, Jiayin; Qiu, Xiaolan; Hu, Yuxin; Lei, Bin

2018-01-01

The Gaofen-3 (GF-3) data processor was developed as a workstation-based GF-3 synthetic aperture radar (SAR) data processing system. The processor consists of two vital subsystems of the GF-3 ground segment, which are referred to as data ingesting subsystem (DIS) and product generation subsystem (PGS). The primary purpose of DIS is to record and catalogue GF-3 raw data with a transferring format, and PGS is to produce slant range or geocoded imagery from the signal data. This paper presents a brief introduction of the GF-3 data processor, including descriptions of the system architecture, the processing algorithms and its output format. PMID:29534464
Ethernet-Enabled Power and Communication Module for Embedded Processors

NASA Technical Reports Server (NTRS)

Perotti, Jose; Oostdyk, Rebecca

2010-01-01

The power and communications module is a printed circuit board (PCB) that has the capability of providing power to an embedded processor and converting Ethernet packets into serial data to transfer to the processor. The purpose of the new design is to address the shortcomings of previous designs, including limited bandwidth and program memory, lack of control over packet processing, and lack of support for timing synchronization. The new design of the module creates a robust serial-to-Ethernet conversion that is powered using the existing Ethernet cable. This innovation has a small form factor that allows it to power processors and transducers with minimal space requirements.
Dielectrophoresis-Based Sample Handling in General-Purpose Programmable Diagnostic Instruments

PubMed Central

Gascoyne, Peter R. C.; Vykoukal, Jody V.

2009-01-01

As the molecular origins of disease are better understood, the need for affordable, rapid, and automated technologies that enable microscale molecular diagnostics has become apparent. Widespread use of microsystems that perform sample preparation and molecular analysis could ensure that the benefits of new biomedical discoveries are realized by a maximum number of people, even those in environments lacking any infrastructure. While progress has been made in developing miniaturized diagnostic systems, samples are generally processed off-device using labor-intensive and time-consuming traditional sample preparation methods. We present the concept of an integrated programmable general-purpose sample analysis processor (GSAP) architecture where raw samples are routed to separation and analysis functional blocks contained within a single device. Several dielectrophoresis-based methods that could serve as the foundation for building GSAP functional blocks are reviewed including methods for cell and particle sorting, cell focusing, cell ac impedance analysis, cell lysis, and the manipulation of molecules and reagent droplets. PMID:19684877
40 CFR 747.115 - Mixed mono and diamides of an organic acid.

Code of Federal Regulations, 2010 CFR

2010-07-01

... warning statement shall be no smaller than six point type. All required label text shall be of sufficient..., commerce, importer, impurity, Inventory, manufacturer, person, process, processor, and small quantities... control of the processor. (ii) Distribution in commerce is limited to purposes of export. (iii) The...
Supercomputing with TOUGH2 family codes for coupled multi-physics simulations of geologic carbon sequestration

NASA Astrophysics Data System (ADS)

Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.

2015-12-01

Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
A high performance linear equation solver on the VPP500 parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

1994-12-31

This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
Design of the Protocol Processor for the ROBUS-2 Communication System

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

2005-01-01

The ROBUS-2 Protocol Processor (RPP) is a custom-designed hardware component implementing the functionality of the ROBUS-2 fault-tolerant communication system. The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault tolerant integrated modular architecture currently under development at NASA Langley Research Center. ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, time reference (clock synchronization), and distributed diagnosis (group membership). ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 tolerates internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. ROBUS consists of RPPs connected to each other by a lower-level physical communication network. The RPP has a pipelined architecture and the design is parameterized in the behavioral and structural domains. The design of the RPP enables the bus to achieve a PE-message throughput that approaches the available bandwidth at the physical layer.
Effects of Talker Variability on Vowel Recognition in Cochlear Implants

ERIC Educational Resources Information Center

Chang, Yi-ping; Fu, Qian-Jie

2006-01-01

Purpose: To investigate the effects of talker variability on vowel recognition by cochlear implant (CI) users and by normal-hearing (NH) participants listening to 4-channel acoustic CI simulations. Method: CI users were tested with their clinically assigned speech processors. For NH participants, 3 CI processors were simulated, using different…
75 FR 10756 - Proposed Information Collection; Comment Request; Amendment 80 Economic Data Report for the...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-03-09

... Collection; Comment Request; Amendment 80 Economic Data Report for the Catcher/Processor Non-AFA Trawl Sector... catcher/processor sector. The Amendment 80 economic data report (EDR) collects cost, revenue, ownership... review the Program. The purpose of the EDR is to understand the economic effects of the Amendment 80...
78 FR 40103 - Proposed Information Collection; Comment Request; Amendment 80 Economic Data Report for the...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-03

... Collection; Comment Request; Amendment 80 Economic Data Report for the Catcher/Processor Non-AFA Trawl Sector.../processor sector. The Amendment 80 economic data report (EDR) collects cost, revenue, ownership, and... Program. The purpose of the EDR is to understand the economic effects of the Amendment 80 program on...
The Educational Effects of Word Processors. County of Lacombe No. 14.

ERIC Educational Resources Information Center

Spence, Gary

The main purpose of this 8-month study was to determine whether significant differences in student learning and attitudes occur as a result of the use of word processors, but curriculum changes, inservice teacher requirements, obstacles to incorporating word processing into language arts programs, effective teaching strategies, and effective…

Nuflood, Version 1.x

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tasseff, Byron

2016-07-29

NUFLOOD Version 1.x is a surface-water hydrodynamic package designed for the simulation of overland flow of fluids. It consists of various routines to address a wide range of applications (e.g., rainfall-runoff, tsunami, storm surge) and real time, interactive visualization tools. NUFLOOD has been designed for general-purpose computers and workstations containing multi-core processors and/or graphics processing units. The software is easy to use and extensible, constructed in mind for instructors, students, and practicing engineers. NUFLOOD is intended to assist the water resource community in planning against water-related natural disasters.
Modular System Control Development Model (MSCDM). Design Specification.

DTIC Science & Technology

1979-08-01

with power supply and ¶ can be used independently of the loop. The PDU can be used as a general purpose processor. The loop is contained in a separate...inputs to nodes 22 (VSQC), 23 (DSQC ) , and 26 (BWBSA) will be generated by a LSI—ll microprocessor used as a simulated input generator (SIG). The SIG...who c o b m n u n i — cate tau lt - s to the FIAC module. F~IAC generates even t reports to the OCRI and DBMS. The PDP1I/40 in loop 2 generates
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, F.; Morel, M.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes?

PubMed

Morrison, Frances P; Li, Li; Lai, Albert M; Hripcsak, George

2009-01-01

Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text so, in addition to processing text into structured data, it may be able provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in output as a result of processing and identification errors. However, PHI in the output was highly transformed, much appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

NASA Astrophysics Data System (ADS)

Laracuente, Nicholas; Grossman, Carl

2013-03-01

We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
DOE Office of Scientific and Technical Information (OSTI.GOV)

Burge, S.W.

This report describes the FORCE2 flow program input, output, and the graphical post-processor. The manual describes the steps for creating the model, executing the programs and processing the results into graphical form. The FORCE2 post-processor was developed as an interactive program written in FORTRAN-77. It uses the Graphical Kernel System (GKS) graphics standard recently adopted by International Organization for Standardization, ISO, and American National Standards Institute, ANSI, and, therefore, can be used with many terminals. The post-processor vas written with Calcomp subroutine calls and is compatible with Tektkonix terminals and Calcomp and Nicolet pen plotters. B&W has been developing themore » FORCE2 code as a general-purpose tool for flow analysis of B&W equipment. The version of FORCE2 described in this manual was developed under the sponsorship of ASEA-Babcock as part of their participation in the joint R&D venture, ``Erosion of FBC Heat Transfer Tubes,`` and is applicable to the analyses of bubbling fluid beds. This manual is the principal documentation for program usage and is segmented into several sections to facilitate usage. In Section 2.0 the program is described, including assumptions, capabilities, limitations and uses, program status and location, related programs and program hardware and software requirements. Section 3.0 is a quick user`s reference guide for preparing input, executing FORCE2, and using the post-processor. Section 4.0 is a detailed description of the FORCE2 input. In Section 5.0, FORCE2 output is summarized. Section 6.0 contains a sample application, and Section 7.0 is a detailed reference guide.« less
Limit characteristics of digital optoelectronic processor

NASA Astrophysics Data System (ADS)

Kolobrodov, V. G.; Tymchik, G. S.; Kolobrodov, M. S.

2018-01-01

In this article, the limiting characteristics of a digital optoelectronic processor are explored. The limits are defined by diffraction effects and a matrix structure of the devices for input and output of optical signals. The purpose of a present research is to optimize the parameters of the processor's components. The developed physical and mathematical model of DOEP allowed to establish the limit characteristics of the processor, restricted by diffraction effects and an array structure of the equipment for input and output of optical signals, as well as to optimize the parameters of the processor's components. The diameter of the entrance pupil of the Fourier lens is determined by the size of SLM and the pixel size of the modulator. To determine the spectral resolution, it is offered to use a concept of an optimum phase when the resolved diffraction maxima coincide with the pixel centers of the radiation detector.
Conjugate-Gradient Algorithms For Dynamics Of Manipulators

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1993-01-01

Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to stimulate dynamics, possibly faster than in real time, for purposes of planning and control.
A Software Implementation of a Satellite Interface Message Processor.

ERIC Educational Resources Information Center

Eastwood, Margaret A.; Eastwood, Lester F., Jr.

A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…
Onboard processor technology review

NASA Technical Reports Server (NTRS)

Benz, Harry F.

1990-01-01

The general need and requirements for the onboard embedded processors necessary to control and manipulate data in spacecraft systems are discussed. The current known requirements are reviewed from a user perspective, based on current practices in the spacecraft development process. The current capabilities of available processor technologies are then discussed, and these are projected to the generation of spacecraft computers currently under identified, funded development. An appraisal is provided for the current national developmental effort.
A general natural-language text processor for clinical radiology.

PubMed Central

Friedman, C; Alderson, P O; Austin, J H; Cimino, J J; Johnson, S B

1994-01-01

OBJECTIVE: Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. DESIGN: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. MEASUREMENTS: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. RESULTS: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision. PMID:7719797
Automatic film processors' quality control test in Greek military hospitals.

PubMed

Lymberis, C; Efstathopoulos, E P; Manetou, A; Poudridis, G

1993-04-01

The two major military radiology installations (Athens, Greece) using a total of 15 automatic film processors were assessed using the 21-step-wedge method. The results of quality control in all these processors are presented. The parameters measured under actual working conditions were base and fog, contrast and speed. Base and fog as well as speed displayed large variations with average values generally higher than acceptable, whilst contrast displayed greater stability. Developer temperature was measured daily during the test and was found to be outside the film manufacturers' recommended limits in nine of the 15 processors. In only one processor did film passing time vary on an every day basis and this was due to maloperation. Developer pH test was not part of the daily monitoring service being performed every 5 days for each film processor and found to be in the range 9-12; 10 of the 15 processors presented pH values outside the limits specified by the film manufacturers.
Reward-based learning under hardware constraints-using a RISC processor embedded in a neuromorphic substrate.

PubMed

Friedmann, Simon; Frémaux, Nicolas; Schemmel, Johannes; Gerstner, Wulfram; Meier, Karlheinz

2013-01-01

In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP, as a special use-case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e., the increase in received reward after learning. Using this performance as baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, a simple interface between analog synapses and processor is sufficient for learning, and performance is insensitive to mismatch. Further, we consider communication latency between wafer and the conventional control computer system that is simulating the environment. This latency increases the delay, with which the reward is sent to the embedded processor. Because of the time continuous operation of the analog synapses, delay can cause a deviation of the updates as compared to the not delayed situation. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward modulated STDP learning rule. It is therefore an ideal candidate for implementation in an upgraded version of the wafer-scale system developed within the BrainScaleS project.
GPU-based Parallel Application Design for Emerging Mobile Devices

NASA Astrophysics Data System (ADS)

Gupta, Kshitij

A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).
Reward-based learning under hardware constraints—using a RISC processor embedded in a neuromorphic substrate

PubMed Central

Friedmann, Simon; Frémaux, Nicolas; Schemmel, Johannes; Gerstner, Wulfram; Meier, Karlheinz

2013-01-01

In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP, as a special use-case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e., the increase in received reward after learning. Using this performance as baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, a simple interface between analog synapses and processor is sufficient for learning, and performance is insensitive to mismatch. Further, we consider communication latency between wafer and the conventional control computer system that is simulating the environment. This latency increases the delay, with which the reward is sent to the embedded processor. Because of the time continuous operation of the analog synapses, delay can cause a deviation of the updates as compared to the not delayed situation. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward modulated STDP learning rule. It is therefore an ideal candidate for implementation in an upgraded version of the wafer-scale system developed within the BrainScaleS project. PMID:24065877
On-board computational efficiency in real time UAV embedded terrain reconstruction

NASA Astrophysics Data System (ADS)

Partsinevelos, Panagiotis; Agadakos, Ioannis; Athanasiou, Vasilis; Papaefstathiou, Ioannis; Mertikas, Stylianos; Kyritsis, Sarantis; Tripolitsiotis, Achilles; Zervos, Panagiotis

2014-05-01

In the last few years, there is a surge of applications for object recognition, interpretation and mapping using unmanned aerial vehicles (UAV). Specifications in constructing those UAVs are highly diverse with contradictory characteristics including cost-efficiency, carrying weight, flight time, mapping precision, real time processing capabilities, etc. In this work, a hexacopter UAV is employed for near real time terrain mapping. The main challenge addressed is to retain a low cost flying platform with real time processing capabilities. The UAV weight limitation affecting the overall flight time, makes the selection of the on-board processing components particularly critical. On the other hand, surface reconstruction, as a computational demanding task, calls for a highly demanding processing unit on board. To merge these two contradicting aspects along with customized development, a System on a Chip (SoC) integrated circuit is proposed as a low-power, low-cost processor, which natively supports camera sensors and positioning and navigation systems. Modern SoCs, such as Omap3530 or Zynq, are classified as heterogeneous devices and provide a versatile platform, allowing access to both general purpose processors, such as the ARM11, as well as specialized processors, such as a digital signal processor and floating field-programmable gate array. A UAV equipped with the proposed embedded processors, allows on-board terrain reconstruction using stereo vision in near real time. Furthermore, according to the frame rate required, additional image processing may concurrently take place, such as image rectification andobject detection. Lastly, the onboard positioning and navigation (e.g., GNSS) chip may further improve the quality of the generated map. The resulting terrain maps are compared to ground truth geodetic measurements in order to access the accuracy limitations of the overall process. It is shown that with our proposed novel system,there is much potential in computational efficiency on board and in optimized time constraints.
Generalization and Parallelization of Messy Genetic Algorithms and Communication in Parallel Genetic Algorithms.

DTIC Science & Technology

1992-12-01

Dynamics and Free Energy Perturbation Methods." Reviews in Computational Chem- istry edited by Kenny B. Lipkowitz and Donald B. Boyd, chapter 8, 295-320...atomic motions during annealing, allows the search to probabilistically move in a locally non-optimal direction. The probability of doing so is...Network processors communicate via communication links. This type of communication is generally very slow relative to other processor activities
Characterization of Stationary Distributions of Reflected Diffusions

DTIC Science & Technology

2014-01-01

Reiman , M. I. (2003). Fluid and heavy traffic limits for a generalized processor sharing model. Ann. Appl. Probab., 13, 100-139. [37] Ramanan, K. and... Reiman , M. I. (2008). The heavy traffic limit of an unbalanced generalized processor sharing model. Ann. Appl. Probab., 18, 22-58. [38] Reed, J. and...Control and Computing. [39] Reiman , M. I. and Williams, R. J. (1988). A boundary property of semimartingale reflecting Brownian motions. Probab. Theor
Enhancing instruction scheduling with a block-structured ISA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Melvin, S.; Patt, Y.

It is now generally recognized that not enough parallelism exists within the small basic blocks of most general purpose programs to satisfy high performance processors. Thus, a wide variety of techniques have been developed to exploit instruction level parallelism across basic block boundaries. In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring. This new paradigm is presented, its hardware and software requirements are discussed and the results from a simulation study are presented. We show that a block-structured ISA utilizes bothmore » dynamic and compile-time mechanisms for exploiting instruction level parallelism and has significant performance advantages over a conventional ISA.« less
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

A special purpose silicon compiler for designing supercomputing VLSI systems

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.

1991-01-01

Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should get integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate get reduced over the silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.
An implementation of a reference symbol approach to generic modulation in fading channels

NASA Technical Reports Server (NTRS)

Young, R. J.; Lodge, J. H.; Pacola, L. C.

1990-01-01

As mobile satellite communications systems evolve over the next decade, they will have to adapt to a changing tradeoff between bandwidth and power. This paper presents a flexible approach to digital modulation and coding that will accommodate both wideband and narrowband schemes. This architecture could be the basis for a family of modems, each satisfying a specific power and bandwidth constraint, yet all having a large number of common signal processing blocks. The implementation of this generic approach, with general purpose digital processors for transmission of 4.8 kilobits per sec. digitally encoded speech, is described.
Optimization of Particle-in-Cell Codes on RISC Processors

NASA Technical Reports Server (NTRS)

Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.

1996-01-01

General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors

PubMed Central

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.

PubMed

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities.
Input data requirements for special processors in the computation system containing the VENTURE neutronics code. [LMFBR

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

1979-07-01

User input data requirements are presented for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated.
Lenslet array processors.

PubMed

Glaser, I

1982-04-01

By combining a lenslet array with masks it is possible to obtain a noncoherent optical processor capable of computing in parallel generalized 2-D discrete linear transformations. We present here an analysis of such lenslet array processors (LAP). The effect of several errors, including optical aberrations, diffraction, vignetting, and geometrical and mask errors, are calculated, and guidelines to optical design of LAP are derived. Using these results, both ultimate and practical performances of LAP are compared with those of competing techniques.
The Engineer Topographic Laboratories /ETL/ hybrid optical/digital image processor

NASA Astrophysics Data System (ADS)

Benton, J. R.; Corbett, F.; Tuft, R.

1980-01-01

An optical-digital processor for generalized image enhancement and filtering is described. The optical subsystem is a two-PROM Fourier filter processor. Input imagery is isolated, scaled, and imaged onto the first PROM; this input plane acts like a liquid gate and serves as an incoherent-to-coherent converter. The image is transformed onto a second PROM which also serves as a filter medium; filters are written onto the second PROM with a laser scanner in real time. A solid state CCTV camera records the filtered image, which is then digitized and stored in a digital image processor. The operator can then manipulate the filtered image using the gray scale and color remapping capabilities of the video processor as well as the digital processing capabilities of the minicomputer.
A Linked List-Based Algorithm for Blob Detection on Embedded Vision-Based Sensors.

PubMed

Acevedo-Avila, Ricardo; Gonzalez-Mendoza, Miguel; Garcia-Garcia, Andres

2016-05-28

Blob detection is a common task in vision-based applications. Most existing algorithms are aimed at execution on general purpose computers; while very few can be adapted to the computing restrictions present in embedded platforms. This paper focuses on the design of an algorithm capable of real-time blob detection that minimizes system memory consumption. The proposed algorithm detects objects in one image scan; it is based on a linked-list data structure tree used to label blobs depending on their shape and node information. An example application showing the results of a blob detection co-processor has been built on a low-powered field programmable gate array hardware as a step towards developing a smart video surveillance system. The detection method is intended for general purpose application. As such, several test cases focused on character recognition are also examined. The results obtained present a fair trade-off between accuracy and memory requirements; and prove the validity of the proposed approach for real-time implementation on resource-constrained computing platforms.
XAPiir: A recursive digital filtering package

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harris, D.

1990-09-21

XAPiir is a basic recursive digital filtering package, containing both design and implementation subroutines. XAPiir was developed for the experimental array processor (XAP) software package, and is written in FORTRAN. However, it is intended to be incorporated into any general- or special-purpose signal analysis program. It replaces the older package RECFIL, offering several enhancements. RECFIL is used in several large analysis programs developed at LLNL, including the seismic analysis package SAC, several expert systems (NORSEA and NETSEA), and two general purpose signal analysis packages (SIG and VIEW). This report is divided into two sections: the first describes the use ofmore » the subroutine package, and the second, its internal organization. In the first section, the filter design problem is briefly reviewed, along with the definitions of the filter design parameters and their relationship to the subroutine input parameters. In the second section, the internal organization is documented to simplify maintenance and extensions to the package. 5 refs., 9 figs.« less
Use of GPUs in Trigger Systems

NASA Astrophysics Data System (ADS)

Lamanna, Gianluca

In recent years the interest for using graphics processor (GPU) in general purpose high performance computing is constantly rising. In this paper we discuss the possible use of GPUs to construct a fast and effective real time trigger system, both in software and hardware levels. In particular, we study the integration of such a system in the NA62 trigger. The first application of GPUs for rings pattern recognition in the RICH will be presented. The results obtained show that there are not showstoppers in trigger systems with relatively low latency. Thanks to the use of off-the-shelf technology, in continous development for purposes related to video game and image processing market, the architecture described would be easily exported to other experiments, to build a versatile and fully customizable online selection.
Asynchronous broadcast for ordered delivery between compute nodes in a parallel computing system where packet header space is limited

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kumar, Sameer

Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and to distinguish data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leafs of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the rootmore » processor to each processor in a communicator but instead is replicated on several intermediate nodes which then replicated the message to nodes in lower leafs. Hence, the intermediate compute nodes become "virtual root compute nodes" for the purpose of replicating the broadcast message to lower levels of a tree.« less
A unified approach to VLSI layout automation and algorithm mapping on processor arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.

1993-01-01

Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.
50 CFR 648.6 - Dealer/processor permits.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 50 Wildlife and Fisheries 12 2013-10-01 2013-10-01 false Dealer/processor permits. 648.6 Section 648.6 Wildlife and Fisheries FISHERY CONSERVATION AND MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE FISHERIES OF THE NORTHEASTERN UNITED STATES General Provisions § 648.6...
Hot Chips and Hot Interconnects for High End Computing Systems

NASA Technical Reports Server (NTRS)

Saini, Subhash

2005-01-01

I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in an IBM SP 3 and IBM SP 4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. IBM System-on-a-Chip used in IBM BlueGene/L; 5. HP Alpha EV68 processor used in DOE ASCI Q cluster; 6. SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in NEC SX-6/7; 8. Power 4+ processor, which is used in Hitachi SR11000; 9. NEC proprietary processor, which is used in Earth Simulator. The IBM POWER5 and Red Storm Computing Systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on latest NAS Parallel Benchmark results (MPI, OpenMP/HPF and hybrid (MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing, (quantum computing, DNA computing, cellular engineering, and neural networks).
Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

NASA Astrophysics Data System (ADS)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Generic HTML Form Processor: A versatile PHP script to save web-collected data into a MySQL database.

PubMed

Göritz, Anja S; Birnbaum, Michael H

2005-11-01

The customizable PHP script Generic HTML Form Processor is intended to assist researchers and students in quickly setting up surveys and experiments that can be administered via the Web. This script relieves researchers from the burdens of writing new CGI scripts and building databases for each Web study. Generic HTML Form Processor processes any syntactically correct HTML forminput and saves it into a dynamically created open-source database. We describe five modes for usage of the script that allow increasing functionality but require increasing levels of knowledge of PHP and Web servers: The first two modes require no previous knowledge, and the fifth requires PHP programming expertise. Use of Generic HTML Form Processor is free for academic purposes, and its Web address is www.goeritz.net/brmic.
Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor

NASA Technical Reports Server (NTRS)

Moore, J. Strother

1992-01-01

Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions

DTIC Science & Technology

2014-05-01

processor developed by IBM and other companies , incorpo- rates the verb—POWER5— processor as the Power Processor Element (PPE), one of the early general...deliver an power efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel.27 The vast growth of digital video content has been a
Noise limitations in optical linear algebra processors.

PubMed

Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

1990-05-10

A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

49 CFR 236.901 - Purpose and scope.

Code of Federal Regulations, 2010 CFR

2010-10-01

... control systems, subsystems, and components that are safety-critical products, as defined in § 236.903..., MAINTENANCE, AND REPAIR OF SIGNAL AND TRAIN CONTROL SYSTEMS, DEVICES, AND APPLIANCES Standards for Processor-Based Signal and Train Control Systems § 236.901 Purpose and scope. (a) What is the purpose of this...
40 CFR 63.680 - Applicability and designation of affected sources.

Code of Federal Regulations, 2012 CFR

2012-07-01

... subpart F—Standards for Used Oil Processors and Refiners. (b) For the purpose of implementing this subpart... of this subpart. (2) For the purpose of implementing this subpart, the following materials are not... service by government agencies, businesses, or other organizations for the purpose of promoting the proper...
40 CFR 63.680 - Applicability and designation of affected sources.

Code of Federal Regulations, 2010 CFR

2010-07-01

... subpart F—Standards for Used Oil Processors and Refiners. (b) For the purpose of implementing this subpart... of this subpart. (2) For the purpose of implementing this subpart, the following materials are not... service by government agencies, businesses, or other organizations for the purpose of promoting the proper...
40 CFR 63.680 - Applicability and designation of affected sources.

Code of Federal Regulations, 2013 CFR

2013-07-01

... subpart F—Standards for Used Oil Processors and Refiners. (b) For the purpose of implementing this subpart... of this subpart. (2) For the purpose of implementing this subpart, the following materials are not... service by government agencies, businesses, or other organizations for the purpose of promoting the proper...
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Space Telecommunications Radio System Software Architecture Concepts and Analysis

NASA Technical Reports Server (NTRS)

Handler, Louis M.; Hall, Charles S.; Briones, Janette C.; Blaser, Tammy M.

2008-01-01

The Space Telecommunications Radio System (STRS) project investigated various Software Defined Radio (SDR) architectures for Space. An STRS architecture has been selected that separates the STRS operating environment from its various waveforms and also abstracts any specialized hardware to limit its effect on the operating environment. The design supports software evolution where new functionality is incorporated into the radio. Radio hardware functionality has been moving from hardware based ASICs into firmware and software based processors such as FPGAs, DSPs and General Purpose Processors (GPPs). Use cases capture the requirements of a system by describing how the system should interact with the users or other systems (the actors) to achieve a specific goal. The Unified Modeling Language (UML) is used to illustrate the Use Cases in a variety of ways. The Top Level Use Case diagram shows groupings of the use cases and how the actors are involved. The state diagrams depict the various states that a system or object may be in and the transitions between those states. The sequence diagrams show the main flow of activity as described in the use cases.
Autonomic Closure for Turbulent Flows Using Approximate Bayesian Computation

NASA Astrophysics Data System (ADS)

Doronina, Olga; Christopher, Jason; Hamlington, Peter; Dahm, Werner

2017-11-01

Autonomic closure is a new technique for achieving fully adaptive and physically accurate closure of coarse-grained turbulent flow governing equations, such as those solved in large eddy simulations (LES). Although autonomic closure has been shown in recent a priori tests to more accurately represent unclosed terms than do dynamic versions of traditional LES models, the computational cost of the approach makes it challenging to implement for simulations of practical turbulent flows at realistically high Reynolds numbers. The optimization step used in the approach introduces large matrices that must be inverted and is highly memory intensive. In order to reduce memory requirements, here we propose to use approximate Bayesian computation (ABC) in place of the optimization step, thereby yielding a computationally-efficient implementation of autonomic closure that trades memory-intensive for processor-intensive computations. The latter challenge can be overcome as co-processors such as general purpose graphical processing units become increasingly available on current generation petascale and exascale supercomputers. In this work, we outline the formulation of ABC-enabled autonomic closure and present initial results demonstrating the accuracy and computational cost of the approach.
A FPGA embedded web server for remote monitoring and control of smart sensors networks.

PubMed

Magdaleno, Eduardo; Rodríguez, Manuel; Pérez, Fernando; Hernández, David; García, Enrique

2013-12-27

This article describes the implementation of a web server using an embedded Altera NIOS II IP core, a general purpose and configurable RISC processor which is embedded in a Cyclone FPGA. The processor uses the μCLinux operating system to support a Boa web server of dynamic pages using Common Gateway Interface (CGI). The FPGA is configured to act like the master node of a network, and also to control and monitor a network of smart sensors or instruments. In order to develop a totally functional system, the FPGA also includes an implementation of the time-triggered protocol (TTP/A). Thus, the implemented master node has two interfaces, the webserver that acts as an Internet interface and the other to control the network. This protocol is widely used to connecting smart sensors and actuators and microsystems in embedded real-time systems in different application domains, e.g., industrial, automotive, domotic, etc., although this protocol can be easily replaced by any other because of the inherent characteristics of the FPGA-based technology.
A FPGA Embedded Web Server for Remote Monitoring and Control of Smart Sensors Networks

PubMed Central

Magdaleno, Eduardo; Rodríguez, Manuel; Pérez, Fernando; Hernández, David; García, Enrique

2014-01-01

This article describes the implementation of a web server using an embedded Altera NIOS II IP core, a general purpose and configurable RISC processor which is embedded in a Cyclone FPGA. The processor uses the μCLinux operating system to support a Boa web server of dynamic pages using Common Gateway Interface (CGI). The FPGA is configured to act like the master node of a network, and also to control and monitor a network of smart sensors or instruments. In order to develop a totally functional system, the FPGA also includes an implementation of the time-triggered protocol (TTP/A). Thus, the implemented master node has two interfaces, the webserver that acts as an Internet interface and the other to control the network. This protocol is widely used to connecting smart sensors and actuators and microsystems in embedded real-time systems in different application domains, e.g., industrial, automotive, domotic, etc., although this protocol can be easily replaced by any other because of the inherent characteristics of the FPGA-based technology. PMID:24379047
Development of an embedded atmospheric turbulence mitigation engine

NASA Astrophysics Data System (ADS)

Paolini, Aaron; Bonnett, James; Kozacik, Stephen; Kelmelis, Eric

2017-05-01

Methods to reconstruct pictures from imagery degraded by atmospheric turbulence have been under development for decades. The techniques were initially developed for observing astronomical phenomena from the Earth's surface, but have more recently been modified for ground and air surveillance scenarios. Such applications can impose significant constraints on deployment options because they both increase the computational complexity of the algorithms themselves and often dictate a requirement for low size, weight, and power (SWaP) form factors. Consequently, embedded implementations must be developed that can perform the necessary computations on low-SWaP platforms. Fortunately, there is an emerging class of embedded processors driven by the mobile and ubiquitous computing industries. We have leveraged these processors to develop embedded versions of the core atmospheric correction engine found in our ATCOM software. In this paper, we will present our experience adapting our algorithms for embedded systems on a chip (SoCs), namely the NVIDIA Tegra that couples general-purpose ARM cores with their graphics processing unit (GPU) technology and the Xilinx Zynq which pairs similar ARM cores with their field-programmable gate array (FPGA) fabric.
Optimal partitioning of random programs across two processors

NASA Technical Reports Server (NTRS)

Nicol, D. M.

1986-01-01

The optimal partitioning of random distributed programs is discussed. It is concluded that the optimal partitioning of a homogeneous random program over a homogeneous distributed system either assigns all modules to a single processor, or distributes the modules as evenly as possible among all processors. The analysis rests heavily on the approximation which equates the expected maximum of a set of independent random variables with the set's maximum expectation. The results are strengthened by providing an approximation-free proof of this result for two processors under general conditions on the module execution time distribution. It is also shown that use of this approximation causes two of the previous central results to be false.
Optical linear algebra processors: noise and error-source modeling.

PubMed

Casasent, D; Ghosh, A

1985-06-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Optical linear algebra processors - Noise and error-source modeling

NASA Technical Reports Server (NTRS)

Casasent, D.; Ghosh, A.

1985-01-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Evaluation of pH monitoring as a method of processor control.

PubMed

Stears, J G; Gray, J E; Winkler, N T

1979-01-01

Sensitometry and pH values of the developer solution were compared in controlled over-replenishment, developer depletion, fixer contamination experiments, and on a daily quality control basis. The purpose of these comparisons was to evaluate the potential of pH monitoring as a method of processor control, or a supplement to sensitometry as a method of quality control. Reasonable correlation was found between pH values and film density in two of the three experiments but little or no correlation was found in the third experiment and on a day-to-day basis. The conclusion drawn from these comparisons is that pH monitoring has several limitations which render it unsuitable as a method of daily processor quality control as either a primary or supplementary technique. Sensitometry takes into account all the variables encountered in film processing and is the clear method of choice for processor quality control.
Control structures for high speed processors

NASA Technical Reports Server (NTRS)

Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

1982-01-01

A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the Mhz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high level language register transfer language. The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.
An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors

NASA Technical Reports Server (NTRS)

Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.

2015-01-01

This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.
Class network routing

DOEpatents

Bhanot, Gyan [Princeton, NJ; Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-09-08

Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
Real-Time, General-Purpose, High-Speed Signal Processing Systems for Underwater Research. Proceedings of a Working Level Conference held at Supreme Allied Commander, Atlantic, Anti-Submarine Warfare Research Center (SACLANTCEN) on 18-21 September 1979. Part 1. Sessions I to III.

DTIC Science & Technology

1979-12-01

intelligent graphics terminals in real-tim processing S (e) 5-1 to 5-9 MIel ita|ger The application of high-speed processors to propagation e.piriamnts...interface SACLANTCEN CP-25 5-2 M IM M STEIGER: Intelligent graphics terminals The less desirable features of the terminal are listed below. reiatively small...hours. Dismantling of the equipment is normally performed in less than one-half hour and often while waiting to clear customs. Transportation of the
Sequence invariant state machines

NASA Technical Reports Server (NTRS)

Whitaker, S.; Manjunath, S.

1990-01-01

A synthesis method and new VLSI architecture are introduced to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. A design method is proposed that utilizes BTS logic to implement regular and dense circuits. A given state sequence can be programmed with power supply connections or dynamically reallocated if stored in a register. Arbitrary flow table sequences can be modified or programmed to dynamically alter the function of the machine. This allows VLSI controllers to be designed with the programmability of a general purpose processor but with the compact size and performance of dedicated logic.
Programs for analysis and resizing of complex structures. [computerized minimum weight design

NASA Technical Reports Server (NTRS)

Haftka, R. T.; Prasad, B.

1978-01-01

The paper describes the PARS (Programs for Analysis and Resizing of Structures) system. PARS is a user oriented system of programs for the minimum weight design of structures modeled by finite elements and subject to stress, displacement, flutter and thermal constraints. The system is built around SPAR - an efficient and modular general purpose finite element program, and consists of a series of processors that communicate through the use of a data base. An efficient optimizer based on the Sequence of Unconstrained Minimization Technique (SUMT) with an extended interior penalty function and Newton's method is used. Several problems are presented for demonstration of the system capabilities.

Multiprocessor and memory architecture of the neurocomputer SYNAPSE-1.

PubMed

Ramacher, U; Raab, W; Anlauf, J; Hachmann, U; Beichter, J; Brüls, N; Wesseling, M; Sicheneder, E; Männer, R; Glass, J

1993-12-01

A general purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude--including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors. Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C(+2) has been defined for the user to cope with the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.
A Full Mesh ATCA-based General Purpose Data Processing Board (Pulsar II)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ajuha, S.

The Pulsar II is a custom ATCA full mesh enabled FPGA-based processor board which has been designed with the goal of creating a scalable architecture abundant in flexible, non-blocking, high bandwidth interconnections. The design has been motivated by silicon-based tracking trigger needs for LHC experiments. In this technical memo we describe the Pulsar II hardware and its performance, such as the performance test results with full mesh backplanes from different vendors, how the backplane is used for the development of low-latency time-multiplexed data transfer schemes and how the inter-shelf and intra-shelf synchronization works.
Literal algebra for satellite dynamics. [perturbation analysis

NASA Technical Reports Server (NTRS)

Gaposchkin, E. M.

1975-01-01

A description of the rather general class of operations available is given and the operations are related to problems in satellite dynamics. The implementation of an algebra processor is discussed. The four main categories of symbol processors are related to list processing, string manipulation, symbol manipulation, and formula manipulation. Fundamental required operations for an algebra processor are considered. It is pointed out that algebra programs have been used for a number of problems in celestial mechanics with great success. The advantage of computer algebra is its accuracy and speed.
Sentient networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chapline, G.

1998-03-01

The engineering problems of constructing autonomous networks of sensors and data processors that can provide alerts for dangerous situations provide a new context for debating the question whether man-made systems can emulate the cognitive capabilities of the mammalian brain. In this paper we consider the question whether a distributed network of sensors and data processors can form ``perceptions`` based on sensory data. Because sensory data can have exponentially many explanations, the use of a central data processor to analyze the outputs from a large ensemble of sensors will in general introduce unacceptable latencies for responding to dangerous situations. A bettermore » idea is to use a distributed ``Helmholtz machine`` architecture in which the sensors are connected to a network of simple processors, and the collective state of the network as a whole provides an explanation for the sensory data. In general communication within such a network will require time division multiplexing, which opens the door to the possibility that with certain refinements to the Helmholtz machine architecture it may be possible to build sensor networks that exhibit a form of artificial consciousness.« less
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.

1990-01-01

A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
GRAPE-4: A special-purpose computer for gravitational N-body problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Makino, Junichiro; Taiji, Makoto; Ebisuzaki, Toshikazu

1995-12-01

We describe GRAPE-4, a special-purpose computer for gravitational N-body simulations. In gravitational N-body simulations, almost all computing time is spent for the calculation of interaction between particles. GRAPE-4 is a specialized hardware to calculate the interaction between particles. It is used with a general-purpose host computer that performs all calculations other than the force calculation. With this architecture, it is relatively easy to realize a massively parallel system. In 1991, we developed the GRAPE-3 system with the peak speed equivalent to 14.4 Gflops. It consists of 48 custom pipelined processors. In 1992 we started the development of GRAPE-4. The GRAPE-4more » system will consist of 1920 custom pipeline chips. Each chip has the speed of 600 Mflops, when operated on 30 MHz clock. A prototype system with two custom LSIs has been completed July 1994, and the full system is now under manufacturing.« less
40 CFR 725.1 - Scope and purpose.

Code of Federal Regulations, 2010 CFR

2010-07-01

... research and development for commercial purposes. New microorganisms for which manufacturers and importers... any microorganism that EPA determines by rule is being manufactured, imported, or processed for a significant new use. (b) Any manufacturer, importer, or processor required to report under section 5 of TSCA...
Kellogg Library and Archive Retrieval System (KLARS) Document Capture Manual. Draft Version.

ERIC Educational Resources Information Center

Hugo, Jane

This manual is designed to supply background information for Kellogg Library and Archive Retrieval System (KLARS) processors and others who might work with the system, outline detailed policies and procedures for processors who prepare and enter data into the adult education database on KLARS, and inform general readers about the system. KLARS is…
Dual-scale topology optoelectronic processor.

PubMed

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
Applications of artificial intelligence to space station: General purpose intelligent sensor interface

NASA Technical Reports Server (NTRS)

Mckee, James W.

1988-01-01

This final report describes the accomplishments of the General Purpose Intelligent Sensor Interface task of the Applications of Artificial Intelligence to Space Station grant for the period from October 1, 1987 through September 30, 1988. Portions of the First Biannual Report not revised will not be included but only referenced. The goal is to develop an intelligent sensor system that will simplify the design and development of expert systems using sensors of the physical phenomena as a source of data. This research will concentrate on the integration of image processing sensors and voice processing sensors with a computer designed for expert system development. The result of this research will be the design and documentation of a system in which the user will not need to be an expert in such areas as image processing algorithms, local area networks, image processor hardware selection or interfacing, television camera selection, voice recognition hardware selection, or analog signal processing. The user will be able to access data from video or voice sensors through standard LISP statements without any need to know about the sensor hardware or software.
A Linked List-Based Algorithm for Blob Detection on Embedded Vision-Based Sensors

PubMed Central

Acevedo-Avila, Ricardo; Gonzalez-Mendoza, Miguel; Garcia-Garcia, Andres

2016-01-01

Blob detection is a common task in vision-based applications. Most existing algorithms are aimed at execution on general purpose computers; while very few can be adapted to the computing restrictions present in embedded platforms. This paper focuses on the design of an algorithm capable of real-time blob detection that minimizes system memory consumption. The proposed algorithm detects objects in one image scan; it is based on a linked-list data structure tree used to label blobs depending on their shape and node information. An example application showing the results of a blob detection co-processor has been built on a low-powered field programmable gate array hardware as a step towards developing a smart video surveillance system. The detection method is intended for general purpose application. As such, several test cases focused on character recognition are also examined. The results obtained present a fair trade-off between accuracy and memory requirements; and prove the validity of the proposed approach for real-time implementation on resource-constrained computing platforms. PMID:27240382
Cobalt: A GPU-based correlator and beamformer for LOFAR

NASA Astrophysics Data System (ADS)

Broekema, P. Chris; Mol, J. Jan David; Nijboer, R.; van Amesfoort, A. S.; Brentjens, M. A.; Loose, G. Marcel; Klijn, W. F. A.; Romein, J. W.

2018-04-01

For low-frequency radio astronomy, software correlation and beamforming on general purpose hardware is a viable alternative to custom designed hardware. LOFAR, a new-generation radio telescope centered in the Netherlands with international stations in Germany, France, Ireland, Poland, Sweden and the UK, has successfully used software real-time processors based on IBM Blue Gene technology since 2004. Since then, developments in technology have allowed us to build a system based on commercial off-the-shelf components that combines the same capabilities with lower operational cost. In this paper, we describe the design and implementation of a GPU-based correlator and beamformer with the same capabilities as the Blue Gene based systems. We focus on the design approach taken, and show the challenges faced in selecting an appropriate system. The design, implementation and verification of the software system show the value of a modern test-driven development approach. Operational experience, based on three years of operations, demonstrates that a general purpose system is a good alternative to the previous supercomputer-based system or custom-designed hardware.
The precision-processing subsystem for the Earth Resources Technology Satellite.

NASA Technical Reports Server (NTRS)

Chapelle, W. E.; Bybee, J. E.; Bedross, G. M.

1972-01-01

Description of the precision processor, a subsystem in the image-processing system for the Earth Resources Technology Satellite (ERTS). This processor is a special-purpose image-measurement and printing system, designed to process user-selected bulk images to produce 1:1,000,000-scale film outputs and digital image data, presented in a Universal-Transverse-Mercator (UTM) projection. The system will remove geometric and radiometric errors introduced by the ERTS multispectral sensors and by the bulk-processor electron-beam recorder. The geometric transformations required for each input scene are determined by resection computations based on reseau measurements and image comparisons with a special ground-control base contained within the system; the images are then printed and digitized by electronic image-transfer techniques.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems.

PubMed

Downie, J D; Goodman, J W

1989-10-15

A ground-based adaptive optics imaging telescope system attempts to improve image quality by measuring and correcting for atmospherically induced wavefront aberrations. The necessary control computations during each cycle will take a finite amount of time, which adds to the residual error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper investigates this possibility by studying the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for adaptive optics use.
Configurable Multi-Purpose Processor

NASA Technical Reports Server (NTRS)

Valencia, J. Emilio; Forney, Chirstopher; Morrison, Robert; Birr, Richard

2010-01-01

Advancements in technology have allowed the miniaturization of systems used in aerospace vehicles. This technology is driven by the need for next-generation systems that provide reliable, responsive, and cost-effective range operations while providing increased capabilities such as simultaneous mission support, increased launch trajectories, improved launch, and landing opportunities, etc. Leveraging the newest technologies, the command and telemetry processor (CTP) concept provides for a compact, flexible, and integrated solution for flight command and telemetry systems and range systems. The CTP is a relatively small circuit board that serves as a processing platform for high dynamic, high vibration environments. The CTP can be reconfigured and reprogrammed, allowing it to be adapted for many different applications. The design is centered around a configurable field-programmable gate array (FPGA) device that contains numerous logic cells that can be used to implement traditional integrated circuits. The FPGA contains two PowerPC processors running the Vx-Works real-time operating system and are used to execute software programs specific to each application. The CTP was designed and developed specifically to provide telemetry functions; namely, the command processing, telemetry processing, and GPS metric tracking of a flight vehicle. However, it can be used as a general-purpose processor board to perform numerous functions implemented in either hardware or software using the FPGA s processors and/or logic cells. Functionally, the CTP was designed for range safety applications where it would ultimately become part of a vehicle s flight termination system. Consequently, the major functions of the CTP are to perform the forward link command processing, GPS metric tracking, return link telemetry data processing, error detection and correction, data encryption/ decryption, and initiate flight termination action commands. Also, the CTP had to be designed to survive and operate in a launch environment. Additionally, the CTP was designed to interface with the WFF (Wallops Flight Facility) custom-designed transceiver board which is used in the Low Cost TDRSS Transceiver (LCT2) also developed by WFF. The LCT2 s transceiver board demodulates commands received from the ground via the forward link and sends them to the CTP, where they are processed. The CTP inputs and processes data from the inertial measurement unit (IMU) and the GPS receiver board, generates status data, and then sends the data to the transceiver board where it is modulated and sent to the ground via the return link. Overall, the CTP has combined processing with the ability to interface to a GPS receiver, an IMU, and a pulse code modulation (PCM) communication link, while providing the capability to support common interfaces including Ethernet and serial interfaces boarding a relatively small-sized, lightweight package.
Development of a Dynamic Time Sharing Scheduled Environment Final Report CRADA No. TC-824-94E

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jette, M.; Caliga, D.

Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely with space sharing. In that method, multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than could be achieved with space-sharing alone. CRI and LLNL worked together on the design, testing, and review aspects of this project. There were separate software deliverables. CFU implemented a general purpose scheduling system as per the design specifications. LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processorsmore » are allocated simultaneously to aU components of a parallel program (in a “gang”). Program execution is preempted as needed to provide for interactivity. Programs are also reIocated to different processors as needed to efficiently pack the computer’s torus of processors. In phase one, CRI developed an interface specification after discussions with LLNL for systemlevel software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools, system administration tools) and applications programs. CRI assumed responsibility for the writing and implementation of all the necessary system software in this phase. In phase two, CRI implemented job-rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D utilizing the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler was assessed to provide input to CRI time- and space-sharing, efforts. CRI will utilize this information in the development of general schedulers suitable for other sites and future architectures.« less
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
High-Level Data-Abstraction System

NASA Technical Reports Server (NTRS)

Fishwick, P. A.

1986-01-01

Communication with data-base processor flexible and efficient. High Level Data Abstraction (HILDA) system is three-layer system supporting data-abstraction features of Intel data-base processor (DBP). Purpose of HILDA establishment of flexible method of efficiently communicating with DBP. Power of HILDA lies in its extensibility with regard to syntax and semantic changes. HILDA's high-level query language readily modified. Offers powerful potential to computer sites where DBP attached to DEC VAX-series computer. HILDA system written in Pascal and FORTRAN 77 for interactive execution.

Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
The design of an adaptive predictive coder using a single-chip digital signal processor

NASA Astrophysics Data System (ADS)

Randolph, M. A.

1985-01-01

A speech coding processor architecture design study has been performed in which Texas Instruments TMS32010 has been selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 has been compared with AR&T Bell Laboratories DSP I and Nippon Electric Co. PD7720 and was found to be most suitable for a single chip implementation of APC. A preliminary design system based on TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based of the TSM32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

DTIC Science & Technology

2013-01-14

AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T

2013-01-01

Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting themore » I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.« less
40 CFR 725.1 - Scope and purpose.

Code of Federal Regulations, 2011 CFR

2011-07-01

... research and development for commercial purposes. New microorganisms for which manufacturers and importers... significant new use. (b) Any manufacturer, importer, or processor required to report under section 5 of TSCA (see § 725.100 for new microorganisms and § 725.900 for significant new uses) must file a Microbial...
Robust media processing on programmable power-constrained systems

NASA Astrophysics Data System (ADS)

McVeigh, Jeff

2005-03-01

To achieve consumer-level quality, media systems must process continuous streams of audio and video data while maintaining exacting tolerances on sampling rate, jitter, synchronization, and latency. While it is relatively straightforward to design fixed-function hardware implementations to satisfy worst-case conditions, there is a growing trend to utilize programmable multi-tasking solutions for media applications. The flexibility of these systems enables support for multiple current and future media formats, which can reduce design costs and time-to-market. This paper provides practical engineering solutions to achieve robust media processing on such systems, with specific attention given to power-constrained platforms. The techniques covered in this article utilize the fundamental concepts of algorithm and software optimization, software/hardware partitioning, stream buffering, hierarchical prioritization, and system resource and power management. A novel enhancement to dynamically adjust processor voltage and frequency based on buffer fullness to reduce system power consumption is examined in detail. The application of these techniques is provided in a case study of a portable video player implementation based on a general-purpose processor running a non real-time operating system that achieves robust playback of synchronized H.264 video and MP3 audio from local storage and streaming over 802.11.
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

NASA Astrophysics Data System (ADS)

Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

2017-06-01

Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
Utility of coupling nonlinear optimization methods with numerical modeling software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murphy, M.J.

1996-08-05

Results of using GLO (Global Local Optimizer), a general purpose nonlinear optimization software package for investigating multi-parameter problems in science and engineering is discussed. The package consists of the modular optimization control system (GLO), a graphical user interface (GLO-GUI), a pre-processor (GLO-PUT), a post-processor (GLO-GET), and nonlinear optimization software modules, GLOBAL & LOCAL. GLO is designed for controlling and easy coupling to any scientific software application. GLO runs the optimization module and scientific software application in an iterative loop. At each iteration, the optimization module defines new values for the set of parameters being optimized. GLO-PUT inserts the new parametermore » values into the input file of the scientific application. GLO runs the application with the new parameter values. GLO-GET determines the value of the objective function by extracting the results of the analysis and comparing to the desired result. GLO continues to run the scientific application over and over until it finds the ``best`` set of parameters by minimizing (or maximizing) the objective function. An example problem showing the optimization of material model is presented (Taylor cylinder impact test).« less
Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

NASA Astrophysics Data System (ADS)

Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

2012-06-01

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
LDPC decoder with a limited-precision FPGA-based floating-point multiplication coprocessor

NASA Astrophysics Data System (ADS)

Moberly, Raymond; O'Sullivan, Michael; Waheed, Khurram

2007-09-01

Implementing the sum-product algorithm, in an FPGA with an embedded processor, invites us to consider a tradeoff between computational precision and computational speed. The algorithm, known outside of the signal processing community as Pearl's belief propagation, is used for iterative soft-decision decoding of LDPC codes. We determined the feasibility of a coprocessor that will perform product computations. Our FPGA-based coprocessor (design) performs computer algebra with significantly less precision than the standard (e.g. integer, floating-point) operations of general purpose processors. Using synthesis, targeting a 3,168 LUT Xilinx FPGA, we show that key components of a decoder are feasible and that the full single-precision decoder could be constructed using a larger part. Soft-decision decoding by the iterative belief propagation algorithm is impacted both positively and negatively by a reduction in the precision of the computation. Reducing precision reduces the coding gain, but the limited-precision computation can operate faster. A proposed solution offers custom logic to perform computations with less precision, yet uses the floating-point format to interface with the software. Simulation results show the achievable coding gain. Synthesis results help theorize the the full capacity and performance of an FPGA-based coprocessor.
Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

PubMed

Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

2012-06-13

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Importance of balanced architectures in the design of high-performance imaging systems

NASA Astrophysics Data System (ADS)

Sgro, Joseph A.; Stanton, Paul C.

1999-03-01

Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

NASA Astrophysics Data System (ADS)

Grzeszczuk, A.; Kowalski, S.

2015-04-01

Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia for increase speed of graphics by usage of parallel mode for processes calculation. The success of this solution has opened technology General-Purpose Graphic Processor Units (GPGPUs) for applications not coupled with graphics. The GPGPUs system can be applying as effective tool for reducing huge number of data for pulse shape analysis measures, by on-line recalculation or by very quick system of compression. The simplified structure of CUDA system and model of programming based on example Nvidia GForce GTX580 card are presented by our poster contribution in stand-alone version and as ROOT application.
Level 1 Daq System for Kloe

NASA Astrophysics Data System (ADS)

Aloisio, A.; Cavaliere, S.; Cevenini, F.; Della Volpe, D.; Merola, L.; Anastasio, A.; Fiore, D. J.

KLOE is a general purpose detector optimized to observe CP violation in K0 decays. This detector will be installed at the DAΦNE Φ-factory, in Frascati (Italy) and it is expected to run at the end of 1997. The KLOE DAQ system can be divided mainly into the front-end fast readout section (the Level 1 DAQ), the FDDI Switch and the processor farm. The total bandwidth requirement is estimated to be of the order of 50 Mbyte/s. In this paper, we describe the Level 1 DAQ section, which is based on custom protocols and hardware controllers, developed to achieve high data transfer rates and event building capabilities without software overhead.
Category-theoretic models of algebraic computer systems

NASA Astrophysics Data System (ADS)

Kovalyov, S. P.

2016-01-01

A computer system is said to be algebraic if it contains nodes that implement unconventional computation paradigms based on universal algebra. A category-based approach to modeling such systems that provides a theoretical basis for mapping tasks to these systems' architecture is proposed. The construction of algebraic models of general-purpose computations involving conditional statements and overflow control is formally described by a reflector in an appropriate category of algebras. It is proved that this reflector takes the modulo ring whose operations are implemented in the conventional arithmetic processors to the Łukasiewicz logic matrix. Enrichments of the set of ring operations that form bases in the Łukasiewicz logic matrix are found.
A fast ultrasonic simulation tool based on massively parallel implementations

NASA Astrophysics Data System (ADS)

Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

2014-02-01

This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
An overview of software design languages. [for Galileo spacecraft Command and Data Subsystems

NASA Technical Reports Server (NTRS)

Callender, E. D.

1980-01-01

The nature and use of design languages and associated processors that are used in software development are reviewed with reference to development work on the Galileo spacecraft project, a Jupiter orbiter scheduled for launch in 1984. The major design steps are identified (functional design, architectural design, detailed design, coding, and testing), and the purpose, functions and the range of applications of design languages are examined. Then the general character of any design language is analyzed in terms of syntax and semantics. Finally, the differences and similarities between design languages are illustrated by examining two specific design languages: Software Design and Documentation language and Problem Statement Language/Problem Statement Analyzer.
A pipeline VLSI design of fast singular value decomposition processor for real-time EEG system based on on-line recursive independent component analysis.

PubMed

Huang, Kuan-Ju; Shih, Wei-Yeh; Chang, Jui Chung; Feng, Chih Wei; Fang, Wai-Chi

2013-01-01

This paper presents a pipeline VLSI design of fast singular value decomposition (SVD) processor for real-time electroencephalography (EEG) system based on on-line recursive independent component analysis (ORICA). Since SVD is used frequently in computations of the real-time EEG system, a low-latency and high-accuracy SVD processor is essential. During the EEG system process, the proposed SVD processor aims to solve the diagonal, inverse and inverse square root matrices of the target matrices in real time. Generally, SVD requires a huge amount of computation in hardware implementation. Therefore, this work proposes a novel design concept for data flow updating to assist the pipeline VLSI implementation. The SVD processor can greatly improve the feasibility of real-time EEG system applications such as brain computer interfaces (BCIs). The proposed architecture is implemented using TSMC 90 nm CMOS technology. The sample rate of EEG raw data adopts 128 Hz. The core size of the SVD processor is 580×580 um(2), and the speed of operation frequency is 20MHz. It consumes 0.774mW of power during the 8-channel EEG system per execution time.
Fast generation of computer-generated hologram by graphics processing unit

NASA Astrophysics Data System (ADS)

Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi

2009-02-01

A cylindrical hologram is well known to be viewable in 360 deg. This hologram depends high pixel resolution.Therefore, Computer-Generated Cylindrical Hologram (CGCH) requires huge calculation amount.In our previous research, we used look-up table method for fast calculation with Intel Pentium4 2.8 GHz.It took 480 hours to calculate high resolution CGCH (504,000 x 63,000 pixels and the average number of object points are 27,000).To improve quality of CGCH reconstructed image, fringe pattern requires higher spatial frequency and resolution.Therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of CGCH (912,000 x 108,000 pixels), we employ Graphics Processing Unit (GPU).It took 4,406 hours to calculate high resolution CGCH on Xeon 3.4 GHz.Since GPU has many streaming processors and a parallel processing structure, GPU works as the high performance parallel processor.In addition, GPU gives max performance to 2 dimensional data and streaming data.Recently, GPU can be utilized for the general purpose (GPGPU).For example, NVIDIA's GeForce7 series became a programmable processor with Cg programming language.Next GeForce8 series have CUDA as software development kit made by NVIDIA.Theoretically, calculation ability of GPU is announced as 500 GFLOPS. From the experimental result, we have achieved that 47 times faster calculation compared with our previous work which used CPU.Therefore, CGCH can be generated in 95 hours.So, total time is 110 hours to calculate and print the CGCH.
Database interfaces on NASA's heterogeneous distributed database system

NASA Technical Reports Server (NTRS)

Huang, Shou-Hsuan Stephen

1987-01-01

The purpose of Distributed Access View Integrated Database (DAVID) interface module (Module 9: Resident Primitive Processing Package) is to provide data transfer between local DAVID systems and resident Data Base Management Systems (DBMSs). The result of current research is summarized. A detailed description of the interface module is provided. Several Pascal templates were constructed. The Resident Processor program was also developed. Even though it is designed for the Pascal templates, it can be modified for templates in other languages, such as C, without much difficulty. The Resident Processor itself can be written in any programming language. Since Module 5 routines are not ready yet, there is no way to test the interface module. However, simulation shows that the data base access programs produced by the Resident Processor do work according to the specifications.

The development of a power spectral density processor for C and L band airborne radar scatterometer sensor systems

NASA Technical Reports Server (NTRS)

Harrison, D. A., III; Chladek, J. T.

1983-01-01

A real-time signal processor was developed for the NASA/JSC L-and C-band airborne radar scatterometer sensor systems. The purpose of the effort was to reduce ground data processing costs. Conversion of two quadrature channels of data (like and cross polarized) was made to obtain Power Spectral Density (PSD) values. A chirp-z transform (CZT) approach was used to filter the Doppler return signal and improved high frequency and angular resolution was realized. The processors have been tested with record signals and excellent results were obtained. CZT filtering can be readily applied to scatterometers operating at other wavelengths by altering the sample frequency. The design of the hardware and software and the results of the performance tests are described in detail.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Method of implementation of optoelectronic multiparametric signal processing systems based on multivalued-logic principles

NASA Astrophysics Data System (ADS)

Arestova, M. L.; Bykovskii, A. Yu

1995-10-01

An architecture is proposed for a specialised optoelectronic multivalued logic processor based on the Allen—Givone algebra. The processor is intended for multiparametric processing of data arriving from a large number of sensors or for tackling spectral analysis tasks. The processor architecture makes it possible to obtain an approximate general estimate of the state of an object being diagnosed on a p-level scale. Optoelectronic systems are proposed for MAXIMUM, MINIMUM, and LITERAL logic gates, based on optical-frequency encoding of logic levels. Corresponding logic gates form a complete set of logic functions in the Allen—Givone algebra.
Full-Authority Fault-Tolerant Electronic Engine Control System for Variable Cycle Engines.

DTIC Science & Technology

1982-04-01

single internally self-checked VLSI micro - processor . The selected configuration is an externally checked pair of com- mercially available...Electronic Engine Control FPMH Failures per Million Hours FTMP Fault Tolerant Multi- Processor FTSC Fault Tolerant Spaceborn Computer GRAMP Generalized...Removal * MTBR Mean Time Between Repair MTTF Mean Time to Failure xiii List of Abbreviations (continued) - NH High Pressure Rotor Speed O&S Operating
Multi-Threaded Algorithms for GPGPU in the ATLAS High Level Trigger

NASA Astrophysics Data System (ADS)

Conde Muíño, P.; ATLAS Collaboration

2017-10-01

General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with Level-1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz Level-1 acceptance rate to 1.5 kHz for recording, requiring an average per-event processing time of ∼ 250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant challenge that will increase significantly with future LHC upgrades. During the LHC data taking period starting in 2021, luminosity will reach up to three times the original design value. Luminosity will increase further to 7.5 times the design value in 2026 following LHC and ATLAS upgrades. Corresponding improvements in the speed of the reconstruction code will be needed to provide the required trigger selection power within affordable computing resources. Key factors determining the potential benefit of including GPGPU as part of the HLT processor farm are: the relative speed of the CPU and GPGPU algorithm implementations; the relative execution times of the GPGPU algorithms and serial code remaining on the CPU; the number of GPGPU required, and the relative financial cost of the selected GPGPU. We give a brief overview of the algorithms implemented and present new measurements that compare the performance of various configurations exploiting GPGPU cards.
NSTAR Ion Thrusters and Power Processors

NASA Technical Reports Server (NTRS)

Bond, T. A.; Christensen, J. A.

1999-01-01

The purpose of the NASA Solar Electric Propulsion Technology Applications Readiness (NSTAR) project is to validate ion propulsion technology for use on future NASA deep space missions. This program, which was initiated in September 1995, focused on the development of two sets of flight quality ion thrusters, power processors, and controllers that provided the same performance as engineering model hardware and also met the dynamic and environmental requirements of the Deep Space 1 Project. One of the flight sets was used for primary propulsion for the Deep Space 1 spacecraft which was launched in October 1998.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Taylor, Arthur C., III

1994-01-01

This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
Phase coherence adaptive processor for automatic signal detection and identification

NASA Astrophysics Data System (ADS)

Wagstaff, Ronald A.

2006-05-01

A continuously adapting acoustic signal processor with an automatic detection/decision aid is presented. Its purpose is to preserve the signals of tactical interest, and filter out other signals and noise. It utilizes single sensor or beamformed spectral data and transforms the signal and noise phase angles into "aligned phase angles" (APA). The APA increase the phase temporal coherence of signals and leave the noise incoherent. Coherence thresholds are set, which are representative of the type of source "threat vehicle" and the geographic area or volume in which it is operating. These thresholds separate signals, based on the "quality" of their APA coherence. An example is presented in which signals from a submerged source in the ocean are preserved, while clutter signals from ships and noise are entirely eliminated. Furthermore, the "signals of interest" were identified by the processor's automatic detection aid. Similar performance is expected for air and ground vehicles. The processor's equations are formulated in such a manner that they can be tuned to eliminate noise and exploit signal, based on the "quality" of their APA temporal coherence. The mathematical formulation for this processor is presented, including the method by which the processor continuously self-adapts. Results show nearly complete elimination of noise, with only the selected category of signals remaining, and accompanying enhancements in spectral and spatial resolution. In most cases, the concept of signal-to-noise ratio looses significance, and "adaptive automated /decision aid" is more relevant.
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
27 CFR 19.5 - Manufacturing products unfit for beverage use.

Code of Federal Regulations, 2012 CFR

2012-04-01

... qualify as a distilled spirits plant (processor): (1) Medicines, medicinal preparations, food products... for beverage purposes; (4) Laboratory reagents, stains, and dyes that are unfit for use for beverage...
27 CFR 19.5 - Manufacturing products unfit for beverage use.

Code of Federal Regulations, 2013 CFR

2013-04-01

... qualify as a distilled spirits plant (processor): (1) Medicines, medicinal preparations, food products... for beverage purposes; (4) Laboratory reagents, stains, and dyes that are unfit for use for beverage...
27 CFR 19.5 - Manufacturing products unfit for beverage use.

Code of Federal Regulations, 2014 CFR

2014-04-01

... qualify as a distilled spirits plant (processor): (1) Medicines, medicinal preparations, food products... for beverage purposes; (4) Laboratory reagents, stains, and dyes that are unfit for use for beverage...
Performance Models for Split-execution Computing Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humble, Travis S; McCaskey, Alex; Schrock, Jonathan

Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track resource usage. We focus on asymmetric processing models built using conventional CPUs and a family of special-purpose QPUs that employ quantum computing principles. Our performance models account for the translation of a classical optimization problem into the physical representation required by the quantum processor while also accounting for hardwaremore » limitations and conventional processor speed and memory. We conclude that the bottleneck in this split-execution computing system lies at the quantum-classical interface and that the primary time cost is independent of quantum processor behavior.« less
Reproducibility of Mammography Units, Film Processing and Quality Imaging

NASA Astrophysics Data System (ADS)

Gaona, Enrique

2003-09-01

The purpose of this study was to carry out an exploratory survey of the problems of quality control in mammography and processors units as a diagnosis of the current situation of mammography facilities. Measurements of reproducibility, optical density, optical difference and gamma index are included. Breast cancer is the most frequently diagnosed cancer and is the second leading cause of cancer death among women in the Mexican Republic. Mammography is a radiographic examination specially designed for detecting breast pathology. We found that the problems of reproducibility of AEC are smaller than the problems of processors units because almost all processors fall outside of the acceptable variation limits and they can affect the mammography quality image and the dose to breast. Only four mammography units agree with the minimum score established by ACR and FDA for the phantom image.
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Integrating a Hypernymic Proposition Interpreter into a Semantic Processor for Biomedical Texts

PubMed Central

Fiszman, Marcelo; Rindflesch, Thomas C.; Kilicoglu, Halil

2003-01-01

Semantic processing provides the potential for producing high quality results in natural language processing (NLP) applications in the biomedical domain. In this paper, we address a specific semantic phenomenon, the hypernymic proposition, and concentrate on integrating the interpretation of such predications into a more general semantic processor in order to improve overall accuracy. A preliminary evaluation assesses the contribution of hypernymic propositions in providing more specific semantic predications and thus improving effectiveness in retrieving treatment propositions in MEDLINE abstracts. Finally, we discuss the generalization of this methodology to additional semantic propositions as well as other types of biomedical texts. PMID:14728170
Parallel algorithms for boundary value problems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
Parallel solution of high-order numerical schemes for solving incompressible flows

NASA Technical Reports Server (NTRS)

Milner, Edward J.; Lin, Avi; Liou, May-Fun; Blech, Richard A.

1993-01-01

A new parallel numerical scheme for solving incompressible steady-state flows is presented. The algorithm uses a finite-difference approach to solving the Navier-Stokes equations. The algorithms are scalable and expandable. They may be used with only two processors or with as many processors as are available. The code is general and expandable. Any size grid may be used. Four processors of the NASA LeRC Hypercluster were used to solve for steady-state flow in a driven square cavity. The Hypercluster was configured in a distributed-memory, hypercube-like architecture. By using a 50-by-50 finite-difference solution grid, an efficiency of 74 percent (a speedup of 2.96) was obtained.
Large calculation of the flow over a hypersonic vehicle using a GPU

NASA Astrophysics Data System (ADS)

Elsen, Erich; LeGresley, Patrick; Darve, Eric

2008-12-01

Graphics processing units are capable of impressive computing performance up to 518 Gflops peak performance. Various groups have been using these processors for general purpose computing; most efforts have focussed on demonstrating relatively basic calculations, e.g. numerical linear algebra, or physical simulations for visualization purposes with limited accuracy. This paper describes the simulation of a hypersonic vehicle configuration with detailed geometry and accurate boundary conditions using the compressible Euler equations. To the authors' knowledge, this is the most sophisticated calculation of this kind in terms of complexity of the geometry, the physical model, the numerical methods employed, and the accuracy of the solution. The Navier-Stokes Stanford University Solver (NSSUS) was used for this purpose. NSSUS is a multi-block structured code with a provably stable and accurate numerical discretization which uses a vertex-based finite-difference method. A multi-grid scheme is used to accelerate the solution of the system. Based on a comparison of the Intel Core 2 Duo and NVIDIA 8800GTX, speed-ups of over 40× were demonstrated for simple test geometries and 20× for complex geometries.
Verification of a Proposed Clinical Electroacoustic Test Protocol for Personal Digital Modulation Receivers Coupled to Cochlear Implant Sound Processors.

PubMed

Nair, Erika L; Sousa, Rhonda; Wannagot, Shannon

Guidelines established by the AAA currently recommend behavioral testing when fitting frequency modulated (FM) systems to individuals with cochlear implants (CIs). A protocol for completing electroacoustic measures has not yet been validated for personal FM systems or digital modulation (DM) systems coupled to CI sound processors. In response, some professionals have used or altered the AAA electroacoustic verification steps for fitting FM systems to hearing aids when fitting FM systems to CI sound processors. More recently steps were outlined in a proposed protocol. The purpose of this research is to review and compare the electroacoustic test measures outlined in a 2013 article by Schafer and colleagues in the Journal of the American Academy of Audiology titled "A Proposed Electroacoustic Test Protocol for Personal FM Receivers Coupled to Cochlear Implant Sound Processors" to the AAA electroacoustic verification steps for fitting FM systems to hearing aids when fitting DM systems to CI users. Electroacoustic measures were conducted on 71 CI sound processors and Phonak Roger DM systems using a proposed protocol and an adapted AAA protocol. Phonak's recommended default receiver gain setting was used for each CI sound processor manufacturer and adjusted if necessary to achieve transparency. Electroacoustic measures were conducted on Cochlear and Advanced Bionics (AB) sound processors. In this study, 28 Cochlear Nucleus 5/CP810 sound processors, 26 Cochlear Nucleus 6/CP910 sound processors, and 17 AB Naida CI Q70 sound processors were coupled in various combinations to Phonak Roger DM dedicated receivers (25 Phonak Roger 14 receivers-Cochlear dedicated receiver-and 9 Phonak Roger 17 receivers-AB dedicated receiver) and 20 Phonak Roger Inspiro transmitters. Employing both the AAA and the Schafer et al protocols, electroacoustic measurements were conducted with the Audioscan Verifit in a clinical setting on 71 CI sound processors and Phonak Roger DM systems to determine transparency and verify FM advantage, comparing speech inputs (65 dB SPL) in an effort to achieve equal outputs. If transparency was not achieved at Phonak's recommended default receiver gain, adjustments were made to the receiver gain. The integrity of the signal was monitored with the appropriate manufacturer's monitor earphones. Using the AAA hearing aid protocol, 50 of the 71 CI sound processors achieved transparency, and 59 of the 71 CI sound processors achieved transparency when using the proposed protocol at Phonak's recommended default receiver gain. After the receiver gain was adjusted, 3 of 21 CI sound processors still did not meet transparency using the AAA protocol, and 2 of 12 CI sound processors still did not meet transparency using the Schafer et al proposed protocol. Both protocols were shown to be effective in taking reliable electroacoustic measurements and demonstrate transparency. Both protocols are felt to be clinically feasible and to address the needs of populations that are unable to reliably report regarding the integrity of their personal DM systems. American Academy of Audiology
Monitoring complex detectors: the uSOP approach in the Belle II experiment

NASA Astrophysics Data System (ADS)

Di Capua, F.; Aloisio, A.; Ameli, F.; Anastasio, A.; Branchini, P.; Giordano, R.; Izzo, V.; Tortone, G.

2017-08-01

uSOP is a general purpose single board computer designed for deep embedded applications in control and monitoring of detectors, sensors and complex laboratory equipments. It is based on the AM3358 (1 GHz ARM Cortex A8 processor), equipped with USB and Ethernet interfaces. On-board RAM and solid state storage allows hosting a full LINUX distribution. In this paper we discuss the main aspects of the hardware and software design and the expandable peripheral architecture built around field busses. We report on several applications of uSOP system in the Belle II experiment, presently under construction at KEK (Tsukuba, Japan). In particular we will report the deployment of uSOP in the monitoring system framework of the endcap electromagnetic calorimeter.

Hardware accelerator design for tracking in smart camera

NASA Astrophysics Data System (ADS)

Singh, Sanjay; Dunga, Srinivasa Murali; Saini, Ravi; Mandal, A. S.; Shekhar, Chandra; Vohra, Anil

2011-10-01

Smart Cameras are important components in video analysis. For video analysis, smart cameras needs to detect interesting moving objects, track such objects from frame to frame, and perform analysis of object track in real time. Therefore, the use of real-time tracking is prominent in smart cameras. The software implementation of tracking algorithm on a general purpose processor (like PowerPC) could achieve low frame rate far from real-time requirements. This paper presents the SIMD approach based hardware accelerator designed for real-time tracking of objects in a scene. The system is designed and simulated using VHDL and implemented on Xilinx XUP Virtex-IIPro FPGA. Resulted frame rate is 30 frames per second for 250x200 resolution video in gray scale.
Application of a distributed network in computational fluid dynamic simulations

NASA Technical Reports Server (NTRS)

Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish

1994-01-01

A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CDF), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.
Implementing Shared Memory Parallelism in MCBEND

NASA Astrophysics Data System (ADS)

Bird, Adam; Long, David; Dobson, Geoff

2017-09-01

MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Splash 2

NASA Technical Reports Server (NTRS)

Arnold, Jeffrey M.; Buell, Duncan A.; Kleinfelder, Walter J.

1993-01-01

Splash 2 is an attached processor system for Sun SPARC 2 workstations that uses Xilinx 4010 Field Programmable Gate Arrays (FPGA's) as its processing elements. The purpose of this paper is to describe Splash 2. The predecessor system, Splash 1, was designed to be used as a systolic processing system. Although it was very successful in that mode, there were many other applications that were not systolic, but which were successful, nonetheless, on Splash 1, or that were not implemented successfully due to one or more architectural limitations, most notably I/O bandwidth and interprocessor communication. Although other uses to increase computational performance have been found for the Xilinx FPGA's that are Splash's processing elements. Splash is unique in its goal to be programmable in a general sense.
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Online data monitoring in the LHCb experiment

NASA Astrophysics Data System (ADS)

Callot, O.; Cherukuwada, S.; Frank, M.; Gaspar, C.; Graziani, G.; Herwijnen, E. v.; Jost, B.; Neufeld, N.; P-Altarelli, M.; Somogyi, P.; Stoica, R.

2008-07-01

The High Level Trigger and Data Acquisition system selects about 2 kHz of events out of the 40 MHz of beam crossings. The selected events are sent to permanent storage for subsequent analysis. In order to ensure the quality of the collected data, identify possible malfunctions of the detector and perform calibration and alignment checks, a small fraction of the accepted events is sent to a monitoring farm, which consists of a few tens of general purpose processors. This contribution introduces the architecture of the data stream splitting mechanism from the storage system to the monitoring farm, where the raw data are analyzed by dedicated tasks. It describes the collaborating software components that are all based on the Gaudi event processing framework.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.

1991-01-01

A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
Bin-Hash Indexing: A Parallel Method for Fast Query Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng

2008-06-27

This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less
A new implementation of the programming system for structural synthesis (PROSSS-2)

NASA Technical Reports Server (NTRS)

Rogers, James L., Jr.

1984-01-01

This new implementation of the PROgramming System for Structural Synthesis (PROSSS-2) combines a general-purpose finite element computer program for structural analysis, a state-of-the-art optimization program, and several user-supplied, problem-dependent computer programs. The results are flexibility of the optimization procedure, organization, and versatility of the formulation of constraints and design variables. The analysis-optimization process results in a minimized objective function, typically the mass. The analysis and optimization programs are executed repeatedly by looping through the system until the process is stopped by a user-defined termination criterion. However, some of the analysis, such as model definition, need only be one time and the results are saved for future use. The user must write some small, simple FORTRAN programs to interface between the analysis and optimization programs. One of these programs, the front processor, converts the design variables output from the optimizer into the suitable format for input into the analyzer. Another, the end processor, retrieves the behavior variables and, optionally, their gradients from the analysis program and evaluates the objective function and constraints and optionally their gradients. These quantities are output in a format suitable for input into the optimizer. These user-supplied programs are problem-dependent because they depend primarily upon which finite elements are being used in the model. PROSSS-2 differs from the original PROSSS in that the optimizer and front and end processors have been integrated into the finite element computer program. This was done to reduce the complexity and increase portability of the system, and to take advantage of the data handling features found in the finite element program.
77 FR 27797 - Request for Certification of Compliance-Rural Industrialization Loan and Grant Program

Federal Register 2010, 2011, 2012, 2013, 2014

2012-05-11

... 4279-2) for the following: Applicant/Location: Samoa Tuna Processors, Inc. Principal Product/Purpose... improvements, replace machinery and equipment and utilities repairs and other upgrades. The plant will be...
27 CFR 19.331 - General.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false General. 19.331 Section 19.331 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU, DEPARTMENT OF THE TREASURY LIQUORS DISTILLED SPIRITS PLANTS Redistillation § 19.331 General. Distillers or processors may...
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

PubMed

Ly, Daniel Le; Chow, Paul

2010-11-01

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
LIBS data analysis using a predictor-corrector based digital signal processor algorithm

NASA Astrophysics Data System (ADS)

Sanders, Alex; Griffin, Steven T.; Robinson, Aaron

2012-06-01

There are many accepted sensor technologies for generating spectra for material classification. Once the spectra are generated, communication bandwidth limitations favor local material classification with its attendant reduction in data transfer rates and power consumption. Transferring sensor technologies such as Cavity Ring-Down Spectroscopy (CRDS) and Laser Induced Breakdown Spectroscopy (LIBS) require effective material classifiers. A result of recent efforts has been emphasis on Partial Least Squares - Discriminant Analysis (PLS-DA) and Principle Component Analysis (PCA). Implementation of these via general purpose computers is difficult in small portable sensor configurations. This paper addresses the creation of a low mass, low power, robust hardware spectra classifier for a limited set of predetermined materials in an atmospheric matrix. Crucial to this is the incorporation of PCA or PLS-DA classifiers into a predictor-corrector style implementation. The system configuration guarantees rapid convergence. Software running on multi-core Digital Signal Processor (DSPs) simulates a stream-lined plasma physics model estimator, reducing Analog-to-Digital (ADC) power requirements. This paper presents the results of a predictorcorrector model implemented on a low power multi-core DSP to perform substance classification. This configuration emphasizes the hardware system and software design via a predictor corrector model that simultaneously decreases the sample rate while performing the classification.
Multimedia architectures: from desktop systems to portable appliances

NASA Astrophysics Data System (ADS)

Bhaskaran, Vasudev; Konstantinides, Konstantinos; Natarajan, Balas R.

1997-01-01

Future desktop and portable computing systems will have as their core an integrated multimedia system. Such a system will seamlessly combine digital video, digital audio, computer animation, text, and graphics. Furthermore, such a system will allow for mixed-media creation, dissemination, and interactive access in real time. Multimedia architectures that need to support these functions have traditionally required special display and processing units for the different media types. This approach tends to be expensive and is inefficient in its use of silicon. Furthermore, such media-specific processing units are unable to cope with the fluid nature of the multimedia market wherein the needs and standards are changing and system manufacturers may demand a single component media engine across a range of products. This constraint has led to a shift towards providing a single-component multimedia specific computing engine that can be integrated easily within desktop systems, tethered consumer appliances, or portable appliances. In this paper, we review some of the recent architectural efforts in developing integrated media systems. We primarily focus on two efforts, namely the evolution of multimedia-capable general purpose processors and a more recent effort in developing single component mixed media co-processors. Design considerations that could facilitate the migration of these technologies to a portable integrated media system also are presented.
Potential medical applications of TAE

NASA Technical Reports Server (NTRS)

Fahy, J. Ben; Kaucic, Robert; Kim, Yongmin

1986-01-01

In cooperation with scientists in the University of Washington Medical School, a microcomputer-based image processing system for quantitative microscopy, called DMD1 (Digital Microdensitometer 1) was constructed. In order to make DMD1 transportable to different hosts and image processors, we have been investigating the possibility of rewriting the lower level portions of DMD1 software using Transportable Applications Executive (TAE) libraries and subsystems. If successful, we hope to produce a newer version of DMD1, called DMD2, running on an IBM PC/AT under the SCO XENIX System 5 operating system, using any of seven target image processors available in our laboratory. Following this implementation, copies of the system will be transferred to other laboratories with biomedical imaging applications. By integrating those applications into DMD2, we hope to eventually expand our system into a low-cost general purpose biomedical imaging workstation. This workstation will be useful not only as a self-contained instrument for clinical or research applications, but also as part of a large scale Digital Imaging Network and Picture Archiving and Communication System, (DIN/PACS). Widespread application of these TAE-based image processing and analysis systems should facilitate software exchange and scientific cooperation not only within the medical community, but between the medical and remote sensing communities as well.
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less
Multinode reconfigurable pipeline computer

NASA Technical Reports Server (NTRS)

Nosenchuck, Daniel M. (Inventor); Littman, Michael G. (Inventor)

1989-01-01

A multinode parallel-processing computer is made up of a plurality of innerconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. the nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.
An optical processor for object recognition and tracking

NASA Technical Reports Server (NTRS)

Sloan, J.; Udomkesmalee, S.

1987-01-01

The design and development of a miniaturized optical processor that performs real time image correlation are described. The optical correlator utilizes the Vander Lugt matched spatial filter technique. The correlation output, a focused beam of light, is imaged onto a CMOS photodetector array. In addition to performing target recognition, the device also tracks the target. The hardware, composed of optical and electro-optical components, occupies only 590 cu cm of volume. A complete correlator system would also include an input imaging lens. This optical processing system is compact, rugged, requires only 3.5 watts of operating power, and weighs less than 3 kg. It represents a major achievement in miniaturizing optical processors. When considered as a special-purpose processing unit, it is an attractive alternative to conventional digital image recognition processing. It is conceivable that the combined technology of both optical and ditital processing could result in a very advanced robot vision system.
Experimental system for computer network via satellite /CS/. III - Network control processor

NASA Astrophysics Data System (ADS)

Kakinuma, Y.; Ito, A.; Takahashi, H.; Uchida, K.; Matsumoto, K.; Mitsudome, H.

1982-03-01

A network control processor (NCP) has the functions of generating traffics, the control of links and the control of transmitting bursts. The NCP executes protocols, monitors of experiments, gathering and compiling data of measurements, of which programs are loaded on a minicomputer (MELCOM 70/40) with 512KB of memories. The NCP acts as traffic generators, instead of a host computer, in the experiment. For this purpose, 15 fake stations are realized by the software in each user station. This paper describes the configuration of the NCP and the implementation of the protocols for the experimental system.
Superelement model based parallel algorithm for vehicle dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agrawal, O.P.; Danhof, K.J.; Kumar, R.

1994-05-01

This paper presents a superelement model based parallel algorithm for a planar vehicle dynamics. The vehicle model is made up of a chassis and two suspension systems each of which consists of an axle-wheel assembly and two trailing arms. In this model, the chassis is treated as a Cartesian element and each suspension system is treated as a superelement. The parameters associated with the superelements are computed using an inverse dynamics technique. Suspension shock absorbers and the tires are modeled by nonlinear springs and dampers. The Euler-Lagrange approach is used to develop the system equations of motion. This leads tomore » a system of differential and algebraic equations in which the constraints internal to superelements appear only explicitly. The above formulation is implemented on a multiprocessor machine. The numerical flow chart is divided into modules and the computation of several modules is performed in parallel to gain computational efficiency. In this implementation, the master (parent processor) creates a pool of slaves (child processors) at the beginning of the program. The slaves remain in the pool until they are needed to perform certain tasks. Upon completion of a particular task, a slave returns to the pool. This improves the overall response time of the algorithm. The formulation presented is general which makes it attractive for a general purpose code development. Speedups obtained in the different modules of the dynamic analysis computation are also presented. Results show that the superelement model based parallel algorithm can significantly reduce the vehicle dynamics simulation time. 52 refs.« less

Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2011-01-11

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-02-21

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Data processing techniques used with MST radars: A review

NASA Technical Reports Server (NTRS)

Rastogi, P. K.

1983-01-01

The data processing methods used in high power radar probing of the middle atmosphere are examined. The radar acts as a spatial filter on the small scale refractivity fluctuations in the medium. The characteristics of the received signals are related to the statistical properties of these fluctuations. A functional outline of the components of a radar system is given. Most computation intensive tasks are carried out by the processor. The processor computes a statistical function of the received signals, simultaneously for a large number of ranges. The slow fading of atmospheric signals is used to reduce the data input rate to the processor by coherent integration. The inherent range resolution of the radar experiments can be improved significant with the use of pseudonoise phase codes to modulate the transmitted pulses and a corresponding decoding operation on the received signals. Commutability of the decoding and coherent integration operations is used to obtain a significant reduction in computations. The limitations of the processors are outlined. At the next level of data reduction, the measured function is parameterized by a few spectral moments that can be related to physical processes in the medium. The problems encountered in estimating the spectral moments in the presence of strong ground clutter, external interference, and noise are discussed. The graphical and statistical analysis of the inferred parameters are outlined. The requirements for special purpose processors for MST radars are discussed.
General linear codes for fault-tolerant matrix operations on processor arrays

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Abraham, J. A.

1988-01-01

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
Geospace simulations on the Cell BE processor

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D.

2008-12-01

OpenGGCM (Open Geospace General circulation Model) is an established numerical code that simulates the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is limited by computational constraints on grid resolution. We investigate porting of the MHD solver to the Cell BE architecture, a novel inhomogeneous multicore architecture capable of up to 230 GFlops per processor. Realizing this high performance on the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallel approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the vector/SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We obtained excellent performance numbers, a speed-up of a factor of 25 compared to just using the main processor, while still keeping the numerical implementation details of the code maintainable.
Move Over, Word Processors--Here Come the Databases.

ERIC Educational Resources Information Center

Olds, Henry F., Jr.; Dickenson, Anne

1985-01-01

Discusses the use of beginning, intermediate, and advanced databases for instructional purposes. A table listing seven databases with information on ease of use, smoothness of operation, data capacity, speed, source, and program features is included. (JN)
Implementing An Image Understanding System Architecture Using Pipe

NASA Astrophysics Data System (ADS)

Luck, Randall L.

1988-03-01

This paper will describe PIPE and how it can be used to implement an image understanding system. Image understanding is the process of developing a description of an image in order to make decisions about its contents. The tasks of image understanding are generally split into low level vision and high level vision. Low level vision is performed by PIPE -a high performance parallel processor with an architecture specifically designed for processing video images at up to 60 fields per second. High level vision is performed by one of several types of serial or parallel computers - depending on the application. An additional processor called ISMAP performs the conversion from iconic image space to symbolic feature space. ISMAP plugs into one of PIPE's slots and is memory mapped into the high level processor. Thus it forms the high speed link between the low and high level vision processors. The mechanisms for bottom-up, data driven processing and top-down, model driven processing are discussed.
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
VENTURE/PC manual: A multidimensional multigroup neutron diffusion code system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shapiro, A.; Huria, H.C.; Cho, K.W.

1991-12-01

VENTURE/PC is a recompilation of part of the Oak Ridge BOLD VENTURE code system, which will operate on an IBM PC or compatible computer. Neutron diffusion theory solutions are obtained for multidimensional, multigroup problems. This manual contains information associated with operating the code system. The purpose of the various modules used in the code system, and the input for these modules are discussed. The PC code structure is also given. Version 2 included several enhancements not given in the original version of the code. In particular, flux iterations can be done in core rather than by reading and writing tomore » disk, for problems which allow sufficient memory for such in-core iterations. This speeds up the iteration process. Version 3 does not include any of the special processors used in the previous versions. These special processors utilized formatted input for various elements of the code system. All such input data is now entered through the Input Processor, which produces standard interface files for the various modules in the code system. In addition, a Standard Interface File Handbook is included in the documentation which is distributed with the code, to assist in developing the input for the Input Processor.« less
A Electro-Optical Image Algebra Processing System for Automatic Target Recognition

NASA Astrophysics Data System (ADS)

Coffield, Patrick Cyrus

The proposed electro-optical image algebra processing system is designed specifically for image processing and other related computations. The design is a hybridization of an optical correlator and a massively paralleled, single instruction multiple data processor. The architecture of the design consists of three tightly coupled components: a spatial configuration processor (the optical analog portion), a weighting processor (digital), and an accumulation processor (digital). The systolic flow of data and image processing operations are directed by a control buffer and pipelined to each of the three processing components. The image processing operations are defined in terms of basic operations of an image algebra developed by the University of Florida. The algebra is capable of describing all common image-to-image transformations. The merit of this architectural design is how it implements the natural decomposition of algebraic functions into spatially distributed, point use operations. The effect of this particular decomposition allows convolution type operations to be computed strictly as a function of the number of elements in the template (mask, filter, etc.) instead of the number of picture elements in the image. Thus, a substantial increase in throughput is realized. The implementation of the proposed design may be accomplished in many ways. While a hybrid electro-optical implementation is of primary interest, the benefits and design issues of an all digital implementation are also discussed. The potential utility of this architectural design lies in its ability to control a large variety of the arithmetic and logic operations of the image algebra's generalized matrix product. The generalized matrix product is the most powerful fundamental operation in the algebra, thus allowing a wide range of applications. No other known device or design has made this claim of processing speed and general implementation of a heterogeneous image algebra.
Communications and Information: Compendium of Communications and Information Terminology

DTIC Science & Technology

2002-02-01

Basic Access Module BASIC— Beginners All-Purpose Symbolic Instruction Code BBP—Baseband Processor BBS—Bulletin Board Service (System) BBTC—Broadband...media, formats and labels, programming language, computer documentation, flowcharts and terminology, character codes, data communications and input
28 CFR 25.56 - Responsibilities of junk yards and salvage yards and auto recyclers.

Code of Federal Regulations, 2011 CFR

2011-07-01

... of whether the automobile was crushed or disposed of, for sale or other purposes, to whom it was.... (h) Scrap metal processors and shredders that receive automobiles for recycling where the condition...
28 CFR 25.56 - Responsibilities of junk yards and salvage yards and auto recyclers.

Code of Federal Regulations, 2012 CFR

2012-07-01

... of whether the automobile was crushed or disposed of, for sale or other purposes, to whom it was.... (h) Scrap metal processors and shredders that receive automobiles for recycling where the condition...
28 CFR 25.56 - Responsibilities of junk yards and salvage yards and auto recyclers.

Code of Federal Regulations, 2014 CFR

2014-07-01

... of whether the automobile was crushed or disposed of, for sale or other purposes, to whom it was.... (h) Scrap metal processors and shredders that receive automobiles for recycling where the condition...
28 CFR 25.56 - Responsibilities of junk yards and salvage yards and auto recyclers.

Code of Federal Regulations, 2013 CFR

2013-07-01

... of whether the automobile was crushed or disposed of, for sale or other purposes, to whom it was.... (h) Scrap metal processors and shredders that receive automobiles for recycling where the condition...
28 CFR 25.56 - Responsibilities of junk yards and salvage yards and auto recyclers.

Code of Federal Regulations, 2010 CFR

2010-07-01

... of whether the automobile was crushed or disposed of, for sale or other purposes, to whom it was.... (h) Scrap metal processors and shredders that receive automobiles for recycling where the condition...
Baseband processor development/test performance for 30/20 GHz SS-TDMA communication system

NASA Technical Reports Server (NTRS)

Brown, L.; Sabourin, D.; Attwood, S.

1984-01-01

The baseband processor (BBP) development for the 30/20 GHz Satellite Communication System is described. The SS-TDMA concept for future satellite communications is reviewed, describing the overall system, the satellite payload, and the frequency plan. A brief general description of the BBP is given, and the proof-of-concept model of the BBP is summarized. Key technologies and custom LSI developed for the BBP are listed. Finally, key technology developments and test data are reported for the BBP.
Design and initial application of the extended aircraft interrogation and display system: Multiprocessing ground support equipment for digital flight systems

NASA Technical Reports Server (NTRS)

Glover, Richard D.

1987-01-01

A pipelined, multiprocessor, general-purpose ground support equipment for digital flight systems has been developed and placed in service at the NASA Ames Research Center's Dryden Flight Research Facility. The design is an outgrowth of the earlier aircraft interrogation and display system (AIDS) used in support of several research projects to provide engineering-units display of internal control system parameters during development and qualification testing activities. The new system, incorporating multiple 16-bit processors, is called extended AIDS (XAIDS) and is now supporting the X-29A forward-swept-wing aircraft project. This report describes the design and mechanization of XAIDS and shows the steps whereby a typical user may take advantage of its high throughput and flexible features.
Hardware accelerator design for change detection in smart camera

NASA Astrophysics Data System (ADS)

Singh, Sanjay; Dunga, Srinivasa Murali; Saini, Ravi; Mandal, A. S.; Shekhar, Chandra; Chaudhury, Santanu; Vohra, Anil

2011-10-01

Smart Cameras are important components in Human Computer Interaction. In any remote surveillance scenario, smart cameras have to take intelligent decisions to select frames of significant changes to minimize communication and processing overhead. Among many of the algorithms for change detection, one based on clustering based scheme was proposed for smart camera systems. However, such an algorithm could achieve low frame rate far from real-time requirements on a general purpose processors (like PowerPC) available on FPGAs. This paper proposes the hardware accelerator capable of detecting real time changes in a scene, which uses clustering based change detection scheme. The system is designed and simulated using VHDL and implemented on Xilinx XUP Virtex-IIPro FPGA board. Resulted frame rate is 30 frames per second for QVGA resolution in gray scale.
Web-based multi-channel analyzer

DOEpatents

Gritzo, Russ E.

2003-12-23

The present invention provides an improved multi-channel analyzer designed to conveniently gather, process, and distribute spectrographic pulse data. The multi-channel analyzer may operate on a computer system having memory, a processor, and the capability to connect to a network and to receive digitized spectrographic pulses. The multi-channel analyzer may have a software module integrated with a general-purpose operating system that may receive digitized spectrographic pulses for at least 10,000 pulses per second. The multi-channel analyzer may further have a user-level software module that may receive user-specified controls dictating the operation of the multi-channel analyzer, making the multi-channel analyzer customizable by the end-user. The user-level software may further categorize and conveniently distribute spectrographic pulse data employing non-proprietary, standard communication protocols and formats.

An application of software design and documentation language. [Galileo spacecraft command and data subsystem

NASA Technical Reports Server (NTRS)

Callender, E. D.; Clarkson, T. B.; Frasier, C. E.

1980-01-01

The software design and documentation language (SDDL) is a general purpose processor to support a lanugage for the description of any system, structure, concept, or procedure that may be presented from the viewpoint of a collection of hierarchical entities linked together by means of binary connections. The language comprises a set of rules of syntax, primitive construct classes (module, block, and module invocation), and language control directives. The result is a language with a fixed grammar, variable alphabet and punctuation, and an extendable vocabulary. The application of SDDL to the detailed software design of the Command Data Subsystem for the Galileo Spacecraft is discussed. A set of constructs was developed and applied. These constructs are evaluated and examples of their application are considered.
14- by 22-Foot Subsonic Tunnel Laser Velocimeter Upgrade

NASA Technical Reports Server (NTRS)

Meyers, James F.; Lee, Joseph W.; Cavone, Angelo A.; Fletcher, Mark T.

2012-01-01

A long-focal length laser velocimeter constructed in the early 1980's was upgraded using current technology to improve usability, reliability and future serviceability. The original, free-space optics were replaced with a state-of-the-art fiber-optic subsystem which allowed most of the optics, including the laser, to be remote from the harsh tunnel environment. General purpose high-speed digitizers were incorporated in a standard modular data acquisition system, along with custom signal processing software executed on a desktop computer, served as the replacement for the signal processors. The resulting system increased optical sensitivity with real-time signal/data processing that produced measurement precisions exceeding those of the original system. Monte Carlo simulations, along with laboratory and wind tunnel investigations were used to determine system characteristics and measurement precision.
The Modernization of a Long-Focal Length Fringe-Type Laser Velocimeter

NASA Technical Reports Server (NTRS)

Meyers, James F.; Lee, Joseph W.; Cavone, Angelo A.; Fletcher, Mark T.

2012-01-01

A long-focal length laser velocimeter constructed in the early 1980's was upgraded using current technology to improve usability, reliability and future serviceability. The original, free-space optics were replaced with a state-of-the-art fiber-optic subsystem which allowed most of the optics, including the laser, to be remote from the harsh tunnel environment. General purpose high-speed digitizers were incorporated in a standard modular data acquisition system, along with custom signal processing software executed on a desktop computer, served as the replacement for the signal processors. The resulting system increased optical sensitivity with real-time signal/data processing that produced measurement precisions exceeding those of the original system. Monte Carlo simulations, along with laboratory and wind tunnel investigations were used to determine system characteristics and measurement precision.
JTRS/SCA and Custom/SDR Waveform Comparison

NASA Technical Reports Server (NTRS)

Oldham, Daniel R.; Scardelletti, Maximilian C.

2007-01-01

This paper compares two waveform implementations generating the same RF signal using the same SDR development system. Both waveforms implement a satellite modem using QPSK modulation at 1M BPS data rates with one half rate convolutional encoding. Both waveforms are partitioned the same across the general purpose processor (GPP) and the field programmable gate array (FPGA). Both waveforms implement the same equivalent set of radio functions on the GPP and FPGA. The GPP implements the majority of the radio functions and the FPGA implements the final digital RF modulator stage. One waveform is implemented directly on the SDR development system and the second waveform is implemented using the JTRS/SCA model. This paper contrasts the amount of resources to implement both waveforms and demonstrates the importance of waveform partitioning across the SDR development system.
QCE: A Simulator for Quantum Computer Hardware

NASA Astrophysics Data System (ADS)

Michielsen, Kristel; de Raedt, Hans

2003-09-01

The Quantum Computer Emulator (QCE) described in this paper consists of a simulator of a generic, general purpose quantum computer and a graphical user interface. The latter is used to control the simulator, to define the hardware of the quantum computer and to debug and execute quantum algorithms. QCE runs in a Windows 98/NT/2000/ME/XP environment. It can be used to validate designs of physically realizable quantum processors and as an interactive educational tool to learn about quantum computers and quantum algorithms. A detailed exposition is given of the implementation of the CNOT and the Toffoli gate, the quantum Fourier transform, Grover's database search algorithm, an order finding algorithm, Shor's algorithm, a three-input adder and a number partitioning algorithm. We also review the results of simulations of an NMR-like quantum computer.
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
Multi-level Hierarchical Poly Tree computer architectures

NASA Technical Reports Server (NTRS)

Padovan, Joe; Gute, Doug

1990-01-01

Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Optical systolic solutions of linear algebraic equations

NASA Technical Reports Server (NTRS)

Neuman, C. P.; Casasent, D.

1984-01-01

The philosophy and data encoding possible in systolic array optical processor (SAOP) were reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Optimistic barrier synchronization

NASA Technical Reports Server (NTRS)

Nicol, David M.

1992-01-01

Barrier synchronization is fundamental operation in parallel computation. In many contexts, at the point a processor enters a barrier it knows that it has already processed all the work required of it prior to synchronization. The alternative case, when a processor cannot enter a barrier with the assurance that it has already performed all the necessary pre-synchronization computation, is treated. The problem arises when the number of pre-sychronization messages to be received by a processor is unkown, for example, in a parallel discrete simulation or any other computation that is largely driven by an unpredictable exchange of messages. We describe an optimistic O(log sup 2 P) barrier algorithm for such problems, study its performance on a large-scale parallel system, and consider extensions to general associative reductions as well as associative parallel prefix computations.
Simplifying and speeding the management of intra-node cache coherence

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-04-17

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A; Chen, Dong; Coteus, Paul W

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less
Special-purpose computing for dense stellar systems

NASA Astrophysics Data System (ADS)

Makino, Junichiro

2007-08-01

I'll describe the current status of the GRAPE-DR project. The GRAPE-DR is the next-generation hardware for N-body simulation. Unlike the previous GRAPE hardwares, it is programmable SIMD machine with a large number of simple processors integrated into a single chip. The GRAPE-DR chip consists of 512 simple processors and operates at the clock speed of 500 MHz, delivering the theoretical peak speed of 512/226 Gflops (single/double precision). As of August 2006, the first prototype board with the sample chip successfully passed the test we prepared. The full GRAPE-DR system will consist of 4096 chips, reaching the theoretical peak speed of 2 Pflops.
Complete all-optical processing polarization-based binary logic gates and optical processors.

PubMed

Zaghloul, Y A; Zaghloul, A R M

2006-10-16

We present a complete all-optical-processing polarization-based binary-logic system, by which any logic gate or processor can be implemented. Following the new polarization-based logic presented in [Opt. Express 14, 7253 (2006)], we develop a new parallel processing technique that allows for the creation of all-optical-processing gates that produce a unique output either logic 1 or 0 only once in a truth table, and those that do not. This representation allows for the implementation of simple unforced OR, AND, XOR, XNOR, inverter, and more importantly NAND and NOR gates that can be used independently to represent any Boolean expression or function. In addition, the concept of a generalized gate is presented which opens the door for reconfigurable optical processors and programmable optical logic gates. Furthermore, the new design is completely compatible with the old one presented in [Opt. Express 14, 7253 (2006)], and with current semiconductor based devices. The gates can be cascaded, where the information is always on the laser beam. The polarization of the beam, and not its intensity, carries the information. The new methodology allows for the creation of multiple-input-multiple-output processors that implement, by itself, any Boolean function, such as specialized or non-specialized microprocessors. Three all-optical architectures are presented: orthoparallel optical logic architecture for all known and unknown binary gates, singlebranch architecture for only XOR and XNOR gates, and the railroad (RR) architecture for polarization optical processors (POP). All the control inputs are applied simultaneously leading to a single time lag which leads to a very-fast and glitch-immune POP. A simple and easy-to-follow step-by-step algorithm is provided for the POP, and design reduction methodologies are briefly discussed. The algorithm lends itself systematically to software programming and computer-assisted design. As examples, designs of all binary gates, multiple-input gates, and sequential and non-sequential Boolean expressions are presented and discussed. The operation of each design is simply understood by a bullet train traveling at the speed of light on a railroad system preconditioned by the crossover states predetermined by the control inputs. The presented designs allow for optical processing of the information eliminating the need to convert it, back and forth, to an electronic signal for processing purposes. All gates with a truth table, including for example Fredkin, Toffoli, testable reversible logic, and threshold logic gates, can be designed and implemented using the railroad architecture. That includes any future gates not known today. Those designs and the quantum gates are not discussed in this paper.
Satellite on-board real-time SAR processor prototype

NASA Astrophysics Data System (ADS)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and size are reviewed.
Parallel eigenanalysis of finite element models in a completely connected architecture

NASA Technical Reports Server (NTRS)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
A Parallel Pipelined Renderer for the Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Chiueh, Tzi-Cker; Ma, Kwan-Liu

1997-01-01

This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are re-source utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% saving in overall rendering time.
Generalized Nanosatellite Avionics Testbed Lab

NASA Technical Reports Server (NTRS)

Frost, Chad R.; Sorgenfrei, Matthew C.; Nehrenz, Matt

2015-01-01

The Generalized Nanosatellite Avionics Testbed (G-NAT) lab at NASA Ames Research Center provides a flexible, easily accessible platform for developing hardware and software for advanced small spacecraft. A collaboration between the Mission Design Division and the Intelligent Systems Division, the objective of the lab is to provide testing data and general test protocols for advanced sensors, actuators, and processors for CubeSat-class spacecraft. By developing test schemes for advanced components outside of the standard mission lifecycle, the lab is able to help reduce the risk carried by advanced nanosatellite or CubeSat missions. Such missions are often allocated very little time for testing, and too often the test facilities must be custom-built for the needs of the mission at hand. The G-NAT lab helps to eliminate these problems by providing an existing suite of testbeds that combines easily accessible, commercial-offthe- shelf (COTS) processors with a collection of existing sensors and actuators.
Special-purpose computer for holography HORN-4 with recurrence algorithm

NASA Astrophysics Data System (ADS)

Shimobaba, Tomoyoshi; Hishinuma, Sinsuke; Ito, Tomoyoshi

2002-10-01

We designed and built a special-purpose computer for holography, HORN-4 (HOlographic ReconstructioN) using PLD (Programmable Logic Device) technology. HORN computers have a pipeline architecture. We use HORN-4 as an attached processor to enhance the performance of a general-purpose computer when it is used to generate holograms using a "recurrence formulas" algorithm developed by our previous paper. In the HORN-4 system, we designed the pipeline by adopting our "recurrence formulas" algorithm which can calculate the phase on a hologram. As the result, we could integrate the pipeline composed of 21 units into one PLD chip. The units in the pipeline consists of one BPU (Basic Phase Unit) unit and twenty CU (Cascade Unit) units. These CU units can compute twenty light intensities on a hologram plane at one time. By mounting two of the PLD chips on a PCI (Peripheral Component Interconnect) universal board, HORN-4 can calculate holograms at high speed of about 42 Gflops equivalent. The cost of HORN-4 board is about 1700 US dollar. We could obtain 800×600 grids hologram from a 3D-image composed of 415 points in about 0.45 sec with the HORN-4 system.
Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

NASA Astrophysics Data System (ADS)

Sewell, Stephen

This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
40 CFR 766.1 - Scope and purpose.

Code of Federal Regulations, 2010 CFR

2010-07-01

... ascertain whether certain specified chemical substances may be contaminated with halogenated dibenzodioxins... TSCA, 15 U.S.C. 2607. (b) Section 766.35(b) requires manufacturers and processors of chemical... chemical substances for concentrations of HDDs/HDFs, applicable protocols, and the results of the analysis...

Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Wallack, A.S.; Canny, J.

1996-08-13

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 27 figs.
Control of automated behavior: insights from the discrete sequence production task

PubMed Central

Abrahamse, Elger L.; Ruitenberg, Marit F. L.; de Kleine, Elian; Verwey, Willem B.

2013-01-01

Work with the discrete sequence production (DSP) task has provided a substantial literature on discrete sequencing skill over the last decades. The purpose of the current article is to provide a comprehensive overview of this literature and of the theoretical progress that it has prompted. We start with a description of the DSP task and the phenomena that are typically observed with it. Then we propose a cognitive model, the dual processor model (DPM), which explains performance of (skilled) discrete key-press sequences. Key features of this model are the distinction between a cognitive processor and a motor system (i.e., motor buffer and motor processor), the interplay between these two processing systems, and the possibility to execute familiar sequences in two different execution modes. We further discuss how this model relates to several related sequence skill research paradigms and models, and we outline outstanding questions for future research throughout the paper. We conclude by sketching a tentative neural implementation of the DPM. PMID:23515430
Carbon Dioxide Reduction Post-Processing Sub-System Development

NASA Technical Reports Server (NTRS)

Abney, Morgan B.; Miller, Lee A.; Greenwood, Zachary; Barton, Katherine

2012-01-01

The state-of-the-art Carbon Dioxide (CO2) Reduction Assembly (CRA) on the International Space Station (ISS) facilitates the recovery of oxygen from metabolic CO2. The CRA utilizes the Sabatier process to produce water with methane as a byproduct. The methane is currently vented overboard as a waste product. Because the CRA relies on hydrogen for oxygen recovery, the loss of methane ultimately results in a loss of oxygen. For missions beyond low earth orbit, it will prove essential to maximize oxygen recovery. For this purpose, NASA is exploring an integrated post-processor system to recover hydrogen from CRA methane. The post-processor, called a Plasma Pyrolysis Assembly (PPA) partially pyrolyzes methane to recover hydrogen with acetylene as a byproduct. In-flight operation of post-processor will require a Methane Purification Assembly (MePA) and an Acetylene Separation Assembly (ASepA). Recent efforts have focused on the design, fabrication, and testing of these components. The results and conclusions of these efforts will be discussed as well as future plans.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Canny, John; Wallack, Aaron S.

1999-01-01

Methods and apparatus are provided for developing a complete set of all admissible Type I and Type II fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Wallack, Aaron S.; Canny, John

1996-01-01

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Canny, J.; Wallack, A.S.

1999-01-05

Methods and apparatus are provided for developing a complete set of all admissible Type 1 and Type 2 fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 44 figs.
The Alaska SAR processor - Operations and control

NASA Technical Reports Server (NTRS)

Carande, Richard E.

1989-01-01

The Alaska SAR (synthetic-aperture radar) Facility (ASF) will be capable of receiving, processing, archiving, and producing a variety of SAR image products from three satellite-borne SARs: E-ERS-1 (ESA), J-ERS-1 (NASDA) and Radarsat (Canada). Crucial to the success of the ASF is the Alaska SAR processor (ASP), which will be capable of processing over 200 100-km x 100-km (Seasat-like) frames per day from the raw SAR data, at a ground resolution of about 30 m x 30 m. The processed imagery is of high geometric and radiometric accuracy, and is geolocated to within 500 m. Special-purpose hardware has been designed to execute a SAR processing algorithm to achieve this performance. This hardware is currently undergoing acceptance testing for delivery to the University of Alaska. Particular attention has been devoted to making the operations semi-automated and to providing a friendly operator interface via a computer workstation. The operations and control of the Alaska SAR processor are described.
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology

NASA Astrophysics Data System (ADS)

Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.

2009-04-01

Currently, the basic problem of tsunami modeling is low speed of calculations which is unacceptable for services of the operative notification. Existing algorithms of numerical modeling of hydrodynamic processes of tsunami waves are developed without taking the opportunities of modern computer facilities. There is an opportunity to have considerable acceleration of process of calculations by using parallel algorithms. We discuss here new approach to parallelization tsunami modeling code using OpenMP Technology (for multiprocessing systems with the general memory). Nowadays, multiprocessing systems are easily accessible for everyone. The cost of the use of such systems becomes much lower comparing to the costs of clusters. This opportunity also benefits all programmers to apply multithreading algorithms on desktop computers of researchers. Other important advantage of the given approach is the mechanism of the general memory - there is no necessity to send data on slow networks (for example Ethernet). All memory is the common for all computing processes; it causes almost linear scalability of the program and processes. In the new version of NAMI DANCE using OpenMP technology and multi-threading algorithm provide 80% gain in speed in comparison with the one-thread version for dual-processor unit. The speed increased and 320% gain was attained for four core processor unit of PCs. Thus, it was possible to reduce considerably time of performance of calculations on the scientific workstations (desktops) without complete change of the program and user interfaces. The further modernization of algorithms of preparation of initial data and processing of results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real time Tsunami Warning Systems.
Processor core for real time background identification of HD video based on OpenCV Gaussian mixture model algorithm

NASA Astrophysics Data System (ADS)

Genovese, Mariangela; Napoli, Ettore

2013-05-01

The identification of moving objects is a fundamental step in computer vision processing chains. The development of low cost and lightweight smart cameras steadily increases the request of efficient and high performance circuits able to process high definition video in real time. The paper proposes two processor cores aimed to perform the real time background identification on High Definition (HD, 1920 1080 pixel) video streams. The implemented algorithm is the OpenCV version of the Gaussian Mixture Model (GMM), an high performance probabilistic algorithm for the segmentation of the background that is however computationally intensive and impossible to implement on general purpose CPU with the constraint of real time processing. In the proposed paper, the equations of the OpenCV GMM algorithm are optimized in such a way that a lightweight and low power implementation of the algorithm is obtained. The reported performances are also the result of the use of state of the art truncated binary multipliers and ROM compression techniques for the implementation of the non-linear functions. The first circuit has commercial FPGA devices as a target and provides speed and logic resource occupation that overcome previously proposed implementations. The second circuit is oriented to an ASIC (UMC-90nm) standard cell implementation. Both implementations are able to process more than 60 frames per second in 1080p format, a frame rate compatible with HD television.
Using Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

NASA Astrophysics Data System (ADS)

O'Connor, A. S.; Justice, B.; Harris, A. T.

2013-12-01

Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speeds for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Components Analsyis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x - 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA based GPU processing frame work for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView1 35,0000 x 35,000 pixel image. With GPU orthorecification, the same orthorectification process takes three minutes. By speeding up image processing, imagery can successfully be used by first responders, scientists making rapid discoveries with near real time data, and provides an operational component to data centers needing to quickly process and disseminate data.
Performance of Distributed CFAR Processors in Pearson Distributed Clutter

NASA Astrophysics Data System (ADS)

Messali, Zoubeida; Soltani, Faouzi

2006-12-01

This paper deals with the distributed constant false alarm rate (CFAR) radar detection of targets embedded in heavy-tailed Pearson distributed clutter. In particular, we extend the results obtained for the cell averaging (CA), order statistics (OS), and censored mean level CMLD CFAR processors operating in positive alpha-stable (P&S) random variables to more general situations, specifically to the presence of interfering targets and distributed CFAR detectors. The receiver operating characteristics of the greatest of (GO) and the smallest of (SO) CFAR processors are also determined. The performance characteristics of distributed systems are presented and compared in both homogeneous and in presence of interfering targets. We demonstrate, via simulation results, that the distributed systems when the clutter is modelled as positive alpha-stable distribution offer robustness properties against multiple target situations especially when using the "OR" fusion rule.
VENTURE/PC manual: A multidimensional multigroup neutron diffusion code system. Version 3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shapiro, A.; Huria, H.C.; Cho, K.W.

1991-12-01

VENTURE/PC is a recompilation of part of the Oak Ridge BOLD VENTURE code system, which will operate on an IBM PC or compatible computer. Neutron diffusion theory solutions are obtained for multidimensional, multigroup problems. This manual contains information associated with operating the code system. The purpose of the various modules used in the code system, and the input for these modules are discussed. The PC code structure is also given. Version 2 included several enhancements not given in the original version of the code. In particular, flux iterations can be done in core rather than by reading and writing tomore » disk, for problems which allow sufficient memory for such in-core iterations. This speeds up the iteration process. Version 3 does not include any of the special processors used in the previous versions. These special processors utilized formatted input for various elements of the code system. All such input data is now entered through the Input Processor, which produces standard interface files for the various modules in the code system. In addition, a Standard Interface File Handbook is included in the documentation which is distributed with the code, to assist in developing the input for the Input Processor.« less
An executable specification for the message processor in a simple combining network

NASA Technical Reports Server (NTRS)

Middleton, David

1995-01-01

While the primary function of the network in a parallel computer is to communicate data between processors, it is often useful if the network can also perform rudimentary calculations. That is, some simple processing ability in the network itself, particularly for performing parallel prefix computations, can reduce both the volume of data being communicated and the computational load on the processors proper. Unfortunately, typical implementations of such networks require a large fraction of the hardware budget, and so combining networks are viewed as being impractical. The FFP Machine has such a combining network, and various characteristics of the machine allow a good deal of simplification in the network design. Despite being simple in construction however, the network relies on many subtle details to work correctly. This paper describes an executable model of the network which will serve several purposes. It provides a complete and detailed description of the network which can substantiate its ability to support necessary functions. It provides an environment in which algorithms to be run on the network can be designed and debugged more easily than they would on physical hardware. Finally, it provides the foundation for exploring the design of the message receiving facility which connects the network to the individual processors.
Scalable Unix tools on parallel processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, W.; Lusk, E.

1994-12-31

The introduction of parallel processors that run a separate copy of Unix on each process has introduced new problems in managing the user`s environment. This paper discusses some generalizations of common Unix commands for managing files (e.g. 1s) and processes (e.g. ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
Ways to estimate speeds for the purposes of air quality conformity analyses.

DOT National Transportation Integrated Search

2002-01-01

A speed post-processor refers to equations or lookup tables that can determine vehicle speeds on a particular roadway link using only the limited information available in a long-range planning model. An estimated link speed is usually based on volume...
Cochlear Implant Microphone Location Affects Speech Recognition in Diffuse Noise

PubMed Central

Kolberg, Elizabeth R.; Sheffield, Sterling W.; Davis, Timothy J.; Sunderhaus, Linsey W.; Gifford, René H.

2015-01-01

Background Despite improvements in cochlear implants (CIs), CI recipients continue to experience significant communicative difficulty in background noise. Many potential solutions have been proposed to help increase signal-to-noise ratio in noisy environments, including signal processing and external accessories. To date, however, the effect of microphone location on speech recognition in noise has focused primarily on hearing aid users. Purpose The purpose of this study was to (1) measure physical output for the T-Mic as compared with the integrated behind-the-ear(BTE) processor mic for various source azimuths, and (2) to investigate the effect of CI processor mic location for speech recognition in semi-diffuse noise with speech originating from various source azimuths as encountered in everyday communicative environments. Research Design A repeated-measures, within-participant design was used to compare performance across listening conditions. Study Sample A total of 11 adults with Advanced Bionics CIs were recruited for this study. Data Collection and Analysis Physical acoustic output was measured on a Knowles Experimental Mannequin for Acoustic Research (KEMAR) for the T-Mic and BTE mic, with broadband noise presented at 0 and 90° (directed toward the implant processor). In addition to physical acoustic measurements, we also assessed recognition of sentences constructed by researchers at Texas Instruments, the Massachusetts Institute of Technology, and the Stanford Research Institute (TIMIT sentences) at 60 dBA for speech source azimuths of 0, 90, and 270°. Sentences were presented in a semi-diffuse restaurant noise originating from the R-SPACE 8-loudspeaker array. Signal-to-noise ratio was determined individually to achieve approximately 50% correct in the unilateral implanted listening condition with speech at 0°. Performance was compared across the T-Mic, 50/50, and the integrated BTE processor mic. Results The integrated BTE mic provided approximately 5 dB attenuation from 1500–4500 Hz for signals presented at 0° as compared with 90° (directed toward the processor). The T-Mic output was essentially equivalent for sources originating from 0 and 90°. Mic location also significantly affected sentence recognition as a function of source azimuth, with the T-Mic yielding the highest performance for speech originating from 0°. Conclusions These results have clinical implications for (1) future implant processor design with respect to mic location, (2) mic settings for implant recipients, and (3) execution of advanced speech testing in the clinic. PMID:25597460
Processor, Solid Propellant (chem.) 6-52.773--Technical Report on Standardization of the General Aptitude Test Battery.

ERIC Educational Resources Information Center

Manpower Administration (DOL), Washington, DC. U.S. Training and Employment Service.

The United States Training and Employment Service General Aptitude Test Battery (GATB), first published in 1947, has been included in a continuing program of research to validate the tests against success in many different occupations. The GATB consists of 12 tests which measure nine aptitudes: General Learning Ability; Verbal Aptitude; Numerical…
Automatic generation of Web mining environments

NASA Astrophysics Data System (ADS)

Cibelli, Maurizio; Costagliola, Gennaro

1999-02-01

The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
A microcontroller-based lock-in amplifier for sub-milliohm resistance measurements.

PubMed

Bengtsson, Lars E

2012-07-01

This paper presents a novel approach to the design of a digital ohmmeter with a resolution of <60 μΩ based on a general-purpose microcontroller and a high-impedance instrumentation amplifier only. The design uses two digital I/O-pins to alternate the current through the sample resistor and combined with a proper firmware routine, the design is a lock-in detector that discriminates any signal that is out of phase/frequency with the reference signal. This makes it possible to selectively detect the μV drop across sample resistors down to 55.6 μΩ using only the current that can be supplied by the digital output pins of a microcontroller. This is achieved without the need for an external reference signal generator and does not rely on the computing processing power of a digital signal processor.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

Solving the corner-turning problem for large interferometers

NASA Astrophysics Data System (ADS)

Lutomirski, Andrew; Tegmark, Max; Sanchez, Nevada J.; Stein, Leo C.; Urry, W. Lynn; Zaldarriaga, Matias

2011-01-01

The so-called corner-turning problem is a major bottleneck for radio telescopes with large numbers of antennas. The problem is essentially that of rapidly transposing a matrix that is too large to store on one single device; in radio interferometry, it occurs because data from each antenna need to be routed to an array of processors each of which will handle a limited portion of the data (say, a frequency range) but requires input from each antenna. We present a low-cost solution allowing the correlator to transpose its data in real time, without contending for bandwidth, via a butterfly network requiring neither additional RAM memory nor expensive general-purpose switching hardware. We discuss possible implementations of this using FPGA, CMOS, analog logic and optical technology, and conclude that the corner-turner cost can be small even for upcoming massive radio arrays.
Space Shuttle avionics upgrade - Issues and opportunities

NASA Astrophysics Data System (ADS)

Swaim, Richard A.; Wingert, William B.

An overview is conducted of existing Space Shuttle avionics and the possibilities for upgrading the cockpit to reduce costs and increase functionability. The current avionics include five general-purpose computers fitted with multifunction displays, dedicated switches and indicators, and dedicated flight instruments. The operational needs of the Shuttle are reviewed in the light of the avionics and potential upgrades in the form of microprocessors and display systems. The use of better processors can provide hardware support for multitasking and memory management and can reduce the life-cycle cost for software. Some limitations of the current technology are acknowledged including the Shuttle's power budget and structural configuration. A phased infusion of upgraded avionics is proposed that provides a functionally transparent replacement of crew-interface equipment as well as the addition of interface enhancements and the migration of selected functions.
RISC-type microprocessors may revolutionize aerospace simulation

NASA Astrophysics Data System (ADS)

Jackson, Albert S.

The author explores the application of RISC (reduced instruction set computer) processors in massively parallel computer (MPC) designs for aerospace simulation. The MPC approach is shown to be well adapted to the needs of aerospace simulation. It is shown that any of the three common types of interconnection schemes used with MPCs are effective for general-purpose simulation, although the bus-or switch-oriented machines are somewhat easier to use. For partial differential equation models, the hypercube approach at first glance appears more efficient because the nearest-neighbor connections required for three-dimensional models are hardwired in a hypercube machine. However, the data broadcast ability of a bus system, combined with the fact that data can be transmitted over a bus as soon as it has been updated, makes the bus approach very competitive with the hypercube approach even for these types of models.
MCNP4A: Features and philosophy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendricks, J.S.

This paper describes MCNP, states its philosophy, introduces a number of new features becoming available with version MCNP4A, and answers a number of questions asked by participants in the workshop. MCNP is a general-purpose three-dimensional neutron, photon and electron transport code. Its philosophy is ``Quality, Value and New Features.`` Quality is exemplified by new software quality assurance practices and a program of benchmarking against experiments. Value includes a strong emphasis on documentation and code portability. New features are the third priority. MCNP4A is now available at Los Alamos. New features in MCNP4A include enhanced statistical analysis, distributed processor multitasking, newmore » photon libraries, ENDF/B-VI capabilities, X-Windows graphics, dynamic memory allocation, expanded criticality output, periodic boundaries, plotting of particle tracks via SABRINA, and many other improvements. 23 refs.« less
Application of a system modification technique to dynamic tuning of a spinning rotor blade

NASA Technical Reports Server (NTRS)

Spain, C. V.

1987-01-01

An important consideration in the development of modern helicopters is the vibratory response of the main rotor blade. One way to minimize vibration levels is to ensure that natural frequencies of the spinning main rotor blade are well removed from integer multiples of the rotor speed. A technique for dynamically tuning a finite-element model of a rotor blade to accomplish that end is demonstrated. A brief overview is given of the general purpose finite element system known as Engineering Analysis Language (EAL) which was used in this work. A description of the EAL System Modification (SM) processor is then given along with an explanation of special algorithms developed to be used in conjunction with SM. Finally, this technique is demonstrated by dynamically tuning a model of an advanced composite rotor blade.
A multi-satellite orbit determination problem in a parallel processing environment

NASA Technical Reports Server (NTRS)

Deakyne, M. S.; Anderle, R. J.

1988-01-01

The Engineering Orbit Analysis Unit at GE Valley Forge used an Intel Hypercube Parallel Processor to investigate the performance and gain experience of parallel processors with a multi-satellite orbit determination problem. A general study was selected in which major blocks of computation for the multi-satellite orbit computations were used as units to be assigned to the various processors on the Hypercube. Problems encountered or successes achieved in addressing the orbit determination problem would be more likely to be transferable to other parallel processors. The prime objective was to study the algorithm to allow processing of observations later in time than those employed in the state update. Expertise in ephemeris determination was exploited in addressing these problems and the facility used to bring a realism to the study which would highlight the problems which may not otherwise be anticipated. Secondary objectives were to gain experience of a non-trivial problem in a parallel processor environment, to explore the necessary interplay of serial and parallel sections of the algorithm in terms of timing studies, to explore the granularity (coarse vs. fine grain) to discover the granularity limit above which there would be a risk of starvation where the majority of nodes would be idle or under the limit where the overhead associated with splitting the problem may require more work and communication time than is useful.
75 FR 9087 - Trade Adjustment Assistance for Farmers

Federal Register 2010, 2011, 2012, 2013, 2014

2010-03-01

... procedures by which producers of raw agricultural commodities can petition for certification, apply for... processors are eligible for program benefits. The purpose of TAA for Farmers is to assist producers of raw... specifically limits program benefits to producers of raw agricultural commodities. Length of Intensive Training...
17 CFR 31.4 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-04-01

... Definitions. For the purposes of this part: (a)-(b) [Reserved] (c) Promotional material includes: (1) Any text... books and records of an individual, a partnership, corporation or other type association (1) for one of...) Commercial leverage account means an account of a commercial enterprise, such as a producer, processor...
21 CFR 123.9 - Records.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 21 Food and Drugs 2 2013-04-01 2013-04-01 false Records. 123.9 Section 123.9 Food and Drugs FOOD... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.9 Records. (a) General requirements. All records required by this part shall include: (1) The name and location of the processor or importer; (2...
21 CFR 123.9 - Records.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 21 Food and Drugs 2 2011-04-01 2011-04-01 false Records. 123.9 Section 123.9 Food and Drugs FOOD... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.9 Records. (a) General requirements. All records required by this part shall include: (1) The name and location of the processor or importer; (2...
21 CFR 123.9 - Records.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 21 Food and Drugs 2 2012-04-01 2012-04-01 false Records. 123.9 Section 123.9 Food and Drugs FOOD... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.9 Records. (a) General requirements. All records required by this part shall include: (1) The name and location of the processor or importer; (2...
21 CFR 123.9 - Records.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 21 Food and Drugs 2 2014-04-01 2014-04-01 false Records. 123.9 Section 123.9 Food and Drugs FOOD... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.9 Records. (a) General requirements. All records required by this part shall include: (1) The name and location of the processor or importer; (2...
21 CFR 123.9 - Records.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 21 Food and Drugs 2 2010-04-01 2010-04-01 false Records. 123.9 Section 123.9 Food and Drugs FOOD... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.9 Records. (a) General requirements. All records required by this part shall include: (1) The name and location of the processor or importer; (2...
27 CFR 19.5 - Manufacturing products unfit for beverage use.

Code of Federal Regulations, 2011 CFR

2011-04-01

... qualify as a distilled spirits plant (processor): (1) Medicines, medicinal preparations, food products... beverage purposes; (3) Toilet, medicinal, and antiseptic preparations and solutions that are unfit for use... unfit for beverage use. 19.5 Section 19.5 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX...
General RMP Guidance - Appendix E: Supplemental Risk Management Program Guidance for Ammonia Refrigeration Facilities

EPA Pesticide Factsheets

Additional information for food processors, food distributors, refrigerated warehouses, and any other facility with ammonia refrigeration system. Includes guidance on exemptions, threshold quantity, offsite consequence analysis.
27 CFR 19.151 - General requirements for registration.

Code of Federal Regulations, 2010 CFR

2010-04-01

... law, operations as a distiller, warehouseman, or processor may be conducted only on the bonded... conduct at such plant operations as a distiller, as a warehouseman, or as both. (c) Registration. Each...
Call Admission Control on Single Node Networks under Output Rate-Controlled Generalized Processor Sharing (ORC-GPS) Scheduler

NASA Astrophysics Data System (ADS)

Hanada, Masaki; Nakazato, Hidenori; Watanabe, Hitoshi

Multimedia applications such as music or video streaming, video teleconferencing and IP telephony are flourishing in packet-switched networks. Applications that generate such real-time data can have very diverse quality-of-service (QoS) requirements. In order to guarantee diverse QoS requirements, the combined use of a packet scheduling algorithm based on Generalized Processor Sharing (GPS) and leaky bucket traffic regulator is the most successful QoS mechanism. GPS can provide a minimum guaranteed service rate for each session and tight delay bounds for leaky bucket constrained sessions. However, the delay bounds for leaky bucket constrained sessions under GPS are unnecessarily large because each session is served according to its associated constant weight until the session buffer is empty. In order to solve this problem, a scheduling policy called Output Rate-Controlled Generalized Processor Sharing (ORC-GPS) was proposed in [17]. ORC-GPS is a rate-based scheduling like GPS, and controls the service rate in order to lower the delay bounds for leaky bucket constrained sessions. In this paper, we propose a call admission control (CAC) algorithm for ORC-GPS, for leaky-bucket constrained sessions with deterministic delay requirements. This CAC algorithm for ORC-GPS determines the optimal values of parameters of ORC-GPS from the deterministic delay requirements of the sessions. In numerical experiments, we compare the CAC algorithm for ORC-GPS with one for GPS in terms of schedulable region and computational complexity.
Development of a 30-cm ion thruster thermal-vacuum power processor

NASA Technical Reports Server (NTRS)

Herron, B. G.

1976-01-01

The 30-cm Hg electron-bombardment ion thruster presently under development has reached engineering model status and is generally accepted as the prime propulsion thruster module to be used on the earliest solar electric propulsion missions. This paper presents the results of a related program to develop a transistorized 3-kW Thermal-Vacuum Breadboard (TVBB) Power Processor for this thruster. Emphasized in the paper are the implemented electrical and mechanical designs as well as the resultant system performance achieved over a range of test conditions. In addition, design modifications affording improved performance are identified and discussed.
Negative base encoding in optical linear algebra processors

NASA Technical Reports Server (NTRS)

Perlee, C.; Casasent, D.

1986-01-01

In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
Geospace simulations using modern accelerator processor technology

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D. J.

2009-12-01

OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.

Noise reduction technologies implemented in head-worn preprocessors for improving cochlear implant performance in reverberant noise fields.

PubMed

Chung, King; Nelson, Lance; Teske, Melissa

2012-09-01

The purpose of this study was to investigate whether a multichannel adaptive directional microphone and a modulation-based noise reduction algorithm could enhance cochlear implant performance in reverberant noise fields. A hearing aid was modified to output electrical signals (ePreprocessor) and a cochlear implant speech processor was modified to receive electrical signals (eProcessor). The ePreprocessor was programmed to flat frequency response and linear amplification. Cochlear implant listeners wore the ePreprocessor-eProcessor system in three reverberant noise fields: 1) one noise source with variable locations; 2) three noise sources with variable locations; and 3) eight evenly spaced noise sources from 0° to 360°. Listeners' speech recognition scores were tested when the ePreprocessor was programmed to omnidirectional microphone (OMNI), omnidirectional microphone plus noise reduction algorithm (OMNI + NR), and adaptive directional microphone plus noise reduction algorithm (ADM + NR). They were also tested with their own cochlear implant speech processor (CI_OMNI) in the three noise fields. Additionally, listeners rated overall sound quality preferences on recordings made in the noise fields. Results indicated that ADM+NR produced the highest speech recognition scores and the most preferable rating in all noise fields. Factors requiring attention in the hearing aid-cochlear implant integration process are discussed. Copyright © 2012 Elsevier B.V. All rights reserved.
Distributed processor allocation for launching applications in a massively connected processors complex

DOEpatents

Pedretti, Kevin

2008-11-18

A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
Multiphase complete exchange on Paragon, SP2 and CS-2

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.

1995-01-01

The overhead of interprocessor communication is a major factor in limiting the performance of parallel computer systems. The complete exchange is the severest communication pattern in that it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. On hypercubes, multiphase complete exchange has been developed and shown to provide optimal performance over varying message sizes. Most commercial multicomputer systems do not have a hypercube interconnect. However, they use special purpose hardware and dedicated communication processors to achieve very high performance communication and can be made to emulate the hypercube quite well. Multiphase complete exchange has been implemented on three contemporary parallel architectures: the Intel Paragon, IBM SP2 and Meiko CS-2. The essential features of these machines are described and their basic interprocessor communication overheads are discussed. The performance of multiphase complete exchange is evaluated on each machine. It is shown that the theoretical ideas developed for hypercubes are also applicable in practice to these machines and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
SETI prototype system for NASA's Sky Survey microwave observing project - A progress report

NASA Technical Reports Server (NTRS)

Klein, M. J.; Gulkis, S.; Wilck, H. C.

1990-01-01

Two complementary search strategies, a Targeted Search and a Sky Survey, are part of NASA's SETI microwave observing project scheduled to begin in October of 1992. The current progress in the development of hardware and software elements of the JPL Sky Survey data processing system are presented. While the Targeted Search stresses sensitivity allowing the detection of either continuous or pulsed signals over the 1-3 GHz frequency range, the Sky Survey gives up sensitivity to survey the 99 percent of the sky that is not covered by the Targeted Search. The Sky Survey spans a larger frequency range from 1-10 GHz. The two searches will deploy special-purpose digital signal processing equipment designed and built to automate the observing and data processing activities. A two-million channel digital wideband spectrum analyzer and a signal processor system will serve as a prototype for the SETI Sky Survey processor. The design will permit future expansion to meet the SETI requirement that the processor concurrently search for left and right circularly polarized signals.
High-speed real-time animated displays on the ADAGE (trademark) RDS 3000 raster graphics system

NASA Technical Reports Server (NTRS)

Kahlbaum, William M., Jr.; Ownbey, Katrina L.

1989-01-01

Techniques which may be used to increase the animation update rate of real-time computer raster graphic displays are discussed. They were developed on the ADAGE RDS 3000 graphic system in support of the Advanced Concepts Simulator at the NASA Langley Research Center. These techniques involve the use of a special purpose parallel processor, for high-speed character generation. The description of the parallel processor includes the Barrel Shifter which is part of the hardware and is the key to the high-speed character rendition. The final result of this total effort was a fourfold increase in the update rate of an existing primary flight display from 4 to 16 frames per second.
Techniques for the rapid display and manipulation of 3-D biomedical data.

PubMed

Goldwasser, S M; Reynolds, R A; Talton, D A; Walsh, E S

1988-01-01

The use of fully interactive 3-D workstations with true real-time performance will become increasingly common as technology matures and economical commercial systems become available. This paper provides a comprehensive introduction to high speed approaches to the display and manipulation of 3-D medical objects obtained from tomographic data acquisition systems such as CT, MR, and PET. A variety of techniques are outlined including the use of software on conventional minicomputers, hardware assist devices such as array processors and programmable frame buffers, and special purpose computer architecture for dedicated high performance systems. While both algorithms and architectures are addressed, the major theme centers around the utilization of hardware-based approaches including parallel processors for the implementation of true real-time systems.
Managing seafood processing wastewater on the Oregon coast: A time of transition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, M.D.; Miner, J.R.

1997-12-01

Seafood processors along the Oregon coast practice a wastewater management plan that is unique within the state. Most of these operations discharge wastewater under a General Permit issued by the Oregon Department of Environmental Quality (DEQ) that requires only that they screen the wastewater to remove particles that will not pass through a 40 mesh screen. The General Permit was issued in February of 1992 and was scheduled to expire at the end of December, 1996. It has been extended until a replacement is adopted. Alternatives are currently under consideration by the DEQ. A second issue is the increasing competitionmore » for water within the coastal communities that are experiencing a growing tourist industry and a static water supply. Tourism and seafood processing both have their peak water demands during the summer months when fresh water supplies are most limited. Disposal of solid wastes has been simplified for many of the processors along the Lower Columbia River by a Fisheries Enhancement Program which allows processors to grind the solid waste then to discharge it into the stream under appropriate tidal conditions. There is no data which indicates water quality damage from this practice nor is there clear evidence of enhanced fishery productivity.« less
Heterogeneous real-time computing in radio astronomy

NASA Astrophysics Data System (ADS)

Ford, John M.; Demorest, Paul; Ransom, Scott

2010-07-01

Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.
Monolithic ceramic analysis using the SCARE program

NASA Technical Reports Server (NTRS)

Manderscheid, Jane M.

1988-01-01

The Structural Ceramics Analysis and Reliability Evaluation (SCARE) computer program calculates the fast fracture reliability of monolithic ceramic components. The code is a post-processor to the MSC/NASTRAN general purpose finite element program. The SCARE program automatically accepts the MSC/NASTRAN output necessary to compute reliability. This includes element stresses, temperatures, volumes, and areas. The SCARE program computes two-parameter Weibull strength distributions from input fracture data for both volume and surface flaws. The distributions can then be used to calculate the reliability of geometrically complex components subjected to multiaxial stress states. Several fracture criteria and flaw types are available for selection by the user, including out-of-plane crack extension theories. The theoretical basis for the reliability calculations was proposed by Batdorf. These models combine linear elastic fracture mechanics (LEFM) with Weibull statistics to provide a mechanistic failure criterion. Other fracture theories included in SCARE are the normal stress averaging technique and the principle of independent action. The objective of this presentation is to summarize these theories, including their limitations and advantages, and to provide a general description of the SCARE program, along with example problems.
Atmosphere, Magnetosphere and Plasmas in Space (AMPS). Spacelab payload definition study. Volume 7, book 3: Supporting Research and Technology (SR and T) report

NASA Technical Reports Server (NTRS)

1976-01-01

The items identified as required to support the AMPS mission and requiring SR and T support and further work are: (1) a general purpose Experiment Pointing Mount; (2) a technique for measuring the attitude of the pallet-mounted or deployed experiments; (3) the development of a common optics cryogenically cooled interferometer spectrometer; (4) the development of a differential absorption lidar system for the measurement of ozone densitites in the earth's atmosphere; (5) the development of dc to dc power processors which are capable of converting energy stored in a capacitor system at 500 V to energy supplied to equipment operating at 40 kV and at 20 kW (eventually up to 100 kW); and (6) the development of a magnetic or possibly electrostatic deflection system capable of bending the beam of an electron accelerator. A data sheet is included for each item, briefly describing the background and need for each item, and the general objectives of the required development, and identifying the schedule requirements in support of the AMPS program.
Laser velocimetry: A state-of-the-art overview

NASA Technical Reports Server (NTRS)

Stevenson, W. H.

1982-01-01

General systems design and optical and signal processing requirements for laser velocimetric measurement of flows are reviewed. Bias errors which occur in measurements using burst (counter) processors are discussed and particle seeding requirements are suggested.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
A Spacecraft Housekeeping System-on-Chip in a Radiation Hardened Structured ASIC

NASA Technical Reports Server (NTRS)

Suarez, George; DuMonthier, Jeffrey J.; Sheikh, Salman S.; Powell, Wesley A.; King, Robyn L.

2012-01-01

Housekeeping systems are essential to health monitoring of spacecraft and instruments. Typically, sensors are distributed across various sub-systems and data is collected using components such as analog-to-digital converters, analog multiplexers and amplifiers. In most cases programmable devices are used to implement the data acquisition control and storage, and the interface to higher level systems. Such discrete implementations require additional size, weight, power and interconnect complexity versus an integrated circuit solution, as well as the qualification of multiple parts. Although commercial devices are readily available, they are not suitable for space applications due the radiation tolerance and qualification requirements. The Housekeeping System-o n-A-Chip (HKSOC) is a low power, radiation hardened integrated solution suitable for spacecraft and instrument control and data collection. A prototype has been designed and includes a wide variety of functions including a 16-channel analog front-end for driving and reading sensors, analog-to-digital and digital-to-analog converters, on-chip temperature sensor, power supply current sense circuits, general purpose comparators and amplifiers, a 32-bit processor, digital I/O, pulse-width modulation (PWM) generators, timers and I2C master and slave serial interfaces. In addition, the device can operate in a bypass mode where the processor is disabled and external logic is used to control the analog and mixed signal functions. The device is suitable for stand-alone or distributed systems where multiple chips can be deployed across different sub-systems as intelligent nodes with computing and processing capabilities.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

PubMed Central

Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

2014-01-01

The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
Long sequence correlation coprocessor

NASA Astrophysics Data System (ADS)

Gage, Douglas W.

1994-09-01

A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
Wireless augmented reality communication system

NASA Technical Reports Server (NTRS)

Devereaux, Ann (Inventor); Agan, Martin (Inventor); Jedrey, Thomas (Inventor)

2006-01-01

The system of the present invention is a highly integrated radio communication system with a multimedia co-processor which allows true two-way multimedia (video, audio, data) access as well as real-time biomedical monitoring in a pager-sized portable access unit. The system is integrated in a network structure including one or more general purpose nodes for providing a wireless-to-wired interface. The network architecture allows video, audio and data (including biomedical data) streams to be connected directly to external users and devices. The portable access units may also be mated to various non-personal devices such as cameras or environmental sensors for providing a method for setting up wireless sensor nets from which reported data may be accessed through the portable access unit. The reported data may alternatively be automatically logged at a remote computer for access and viewing through a portable access unit, including the user's own.
A fully reconfigurable waveguide Bragg grating for programmable photonic signal processing.

PubMed

Zhang, Weifeng; Yao, Jianping

2018-04-11

Since the discovery of the Bragg's law in 1913, Bragg gratings have become important optical devices and have been extensively used in various systems. In particular, the successful inscription of a Bragg grating in a fiber core has significantly boosted its engineering applications. However, a conventional grating device is usually designed for a particular use, which limits general-purpose applications since its index modulation profile is fixed after fabrication. In this article, we propose to implement a fully reconfigurable grating, which is fast and electrically reconfigurable by field programming. The concept is verified by fabricating an integrated grating on a silicon-on-insulator platform, which is employed as a programmable signal processor to perform multiple signal processing functions including temporal differentiation, microwave time delay, and frequency identification. The availability of ultrafast and reconfigurable gratings opens new avenues for programmable optical signal processing at the speed of light.
Wireless Augmented Reality Communication System

NASA Technical Reports Server (NTRS)

Jedrey, Thomas (Inventor); Agan, Martin (Inventor); Devereaux, Ann (Inventor)

2014-01-01

The system of the present invention is a highly integrated radio communication system with a multimedia co-processor which allows true two-way multimedia (video, audio, data) access as well as real-time biomedical monitoring in a pager-sized portable access unit. The system is integrated in a network structure including one or more general purpose nodes for providing a wireless-to-wired interface. The network architecture allows video, audio and data (including biomedical data) streams to be connected directly to external users and devices. The portable access units may also be mated to various non-personal devices such as cameras or environmental sensors for providing a method for setting up wireless sensor nets from which reported data may be accessed through the portable access unit. The reported data may alternatively be automatically logged at a remote computer for access and viewing through a portable access unit, including the user's own.
Wireless Augmented Reality Communication System

NASA Technical Reports Server (NTRS)

Agan, Martin (Inventor); Devereaux, Ann (Inventor); Jedrey, Thomas (Inventor)

2016-01-01

The system of the present invention is a highly integrated radio communication system with a multimedia co-processor which allows true two-way multimedia (video, audio, data) access as well as real-time biomedical monitoring in a pager-sized portable access unit. The system is integrated in a network structure including one or more general purpose nodes for providing a wireless-to-wired interface. The network architecture allows video, audio and data (including biomedical data) streams to be connected directly to external users and devices. The portable access units may also be mated to various non-personal devices such as cameras or environmental sensors for providing a method for setting up wireless sensor nets from which reported data may be accessed through the portable access unit. The reported data may alternatively be automatically logged at a remote computer for access and viewing through a portable access unit, including the user's own.
Implementing real-time robotic systems using CHIMERA II

NASA Technical Reports Server (NTRS)

Stewart, David B.; Schmitz, Donald E.; Khosla, Pradeep K.

1990-01-01

A description is given of the CHIMERA II programming environment and operating system, which was developed for implementing real-time robotic systems. Sensor-based robotic systems contain both general- and special-purpose hardware, and thus the development of applications tends to be a time-consuming task. The CHIMERA II environment is designed to reduce the development time by providing a convenient software interface between the hardware and the user. CHIMERA II supports flexible hardware configurations which are based on one or more VME-backplanes. All communication across multiple processors is transparent to the user through an extensive set of interprocessor communication primitives. CHIMERA II also provides a high-performance real-time kernel which supports both deadline and highest-priority-first scheduling. The flexibility of CHIMERA II allows hierarchical models for robot control, such as NASREM, to be implemented with minimal programming time and effort.

A Conceptual Design for a Reliable Optical Bus (ROBUS)

NASA Technical Reports Server (NTRS)

Miner, Paul S.; Malekpour, Mahyar; Torres, Wilfredo

2002-01-01

The Scalable Processor-Independent Design for Electromagnetic Resilience (SPIDER) is a new family of fault-tolerant architectures under development at NASA Langley Research Center (LaRC). The SPIDER is a general-purpose computational platform suitable for use in ultra-reliable embedded control applications. The design scales from a small configuration supporting a single aircraft function to a large distributed configuration capable of supporting several functions simultaneously. SPIDER consists of a collection of simplex processing elements communicating via a Reliable Optical Bus (ROBUS). The ROBUS is an ultra-reliable, time-division multiple access broadcast bus with strictly enforced write access (no babbling idiots) providing basic fault-tolerant services using formally verified fault-tolerance protocols including Interactive Consistency (Byzantine Agreement), Internal Clock Synchronization, and Distributed Diagnosis. The conceptual design of the ROBUS is presented in this paper including requirements, topology, protocols, and the block-level design. Verification activities, including the use of formal methods, are also discussed.
Raster Scan Computer Image Generation (CIG) System Based On Refresh Memory

NASA Astrophysics Data System (ADS)

Dichter, W.; Doris, K.; Conkling, C.

1982-06-01

A full color, Computer Image Generation (CIG) raster visual system has been developed which provides a high level of training sophistication by utilizing advanced semiconductor technology and innovative hardware and firmware techniques. Double buffered refresh memory and efficient algorithms eliminate the problem of conventional raster line ordering by allowing the generated image to be stored in a random fashion. Modular design techniques and simplified architecture provide significant advantages in reduced system cost, standardization of parts, and high reliability. The major system components are a general purpose computer to perform interfacing and data base functions; a geometric processor to define the instantaneous scene image; a display generator to convert the image to a video signal; an illumination control unit which provides final image processing; and a CRT monitor for display of the completed image. Additional optional enhancements include texture generators, increased edge and occultation capability, curved surface shading, and data base extensions.
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

1999-01-01

The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.
On-orbit experience with the HEAO attitude control subsystem

NASA Technical Reports Server (NTRS)

Hoffman, D. P.; Berkery, E. A.

1978-01-01

The first satellite (HEAO-1) in the High Energy Astronomy Observatory Program series was launched successfully on Aug. 12, 1977. To date it has completed over nine months of orbital operation in a science data gathering mode. During this period all attitude control modes have been exercised and all primary mission objectives have been achieved. This paper highlights the characteristics of the attitude control subsystem design and compares the predicted performance with the actual flight operations experience. Environmental disturbance modeling, component hardware/software characteristics, and overall attitude control performance are reviewed and are found to compare very well with the prelaunch analytical predictions. Brief comments are also included regarding the operations aspects of the attitude control subsystem. The experience in this regard demonstrates the effectiveness of the design flexibility afforded by the presence of a general purpose digital processor in the subsystem flight hardware implementation.
Parallel Processing of Adaptive Meshes with Load Balancing

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.
VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. I. DESCRIPTION OF THE PHYSICS AND THE NUMERICAL METHODS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wetzstein, M.; Nelson, Andrew F.; Naab, T.

2009-10-01

We present a numerical code for simulating the evolution of astrophysical systems using particles to represent the underlying fluid flow. The code is written in Fortran 95 and is designed to be versatile, flexible, and extensible, with modular options that can be selected either at the time the code is compiled or at run time through a text input file. We include a number of general purpose modules describing a variety of physical processes commonly required in the astrophysical community and we expect that the effort required to integrate additional or alternate modules into the code will be small. Inmore » its simplest form the code can evolve the dynamical trajectories of a set of particles in two or three dimensions using a module which implements either a Leapfrog or Runge-Kutta-Fehlberg integrator, selected by the user at compile time. The user may choose to allow the integrator to evolve the system using individual time steps for each particle or with a single, global time step for all. Particles may interact gravitationally as N-body particles, and all or any subset may also interact hydrodynamically, using the smoothed particle hydrodynamic (SPH) method by selecting the SPH module. A third particle species can be included with a module to model massive point particles which may accrete nearby SPH or N-body particles. Such particles may be used to model, e.g., stars in a molecular cloud. Free boundary conditions are implemented by default, and a module may be selected to include periodic boundary conditions. We use a binary 'Press' tree to organize particles for rapid access in gravity and SPH calculations. Modules implementing an interface with special purpose 'GRAPE' hardware may also be selected to accelerate the gravity calculations. If available, forces obtained from the GRAPE coprocessors may be transparently substituted for those obtained from the tree, or both tree and GRAPE may be used as a combination GRAPE/tree code. The code may be run without modification on single processors or in parallel using OpenMP compiler directives on large-scale, shared memory parallel machines. We present simulations of several test problems, including a merger simulation of two elliptical galaxies with 800,000 particles. In comparison to the Gadget-2 code of Springel, the gravitational force calculation, which is the most costly part of any simulation including self-gravity, is {approx}4.6-4.9 times faster with VINE when tested on different snapshots of the elliptical galaxy merger simulation when run on an Itanium 2 processor in an SGI Altix. A full simulation of the same setup with eight processors is a factor of 2.91 faster with VINE. The code is available to the public under the terms of the Gnu General Public License.« less
Vine—A Numerical Code for Simulating Astrophysical Systems Using Particles. I. Description of the Physics and the Numerical Methods

NASA Astrophysics Data System (ADS)

Wetzstein, M.; Nelson, Andrew F.; Naab, T.; Burkert, A.

2009-10-01

We present a numerical code for simulating the evolution of astrophysical systems using particles to represent the underlying fluid flow. The code is written in Fortran 95 and is designed to be versatile, flexible, and extensible, with modular options that can be selected either at the time the code is compiled or at run time through a text input file. We include a number of general purpose modules describing a variety of physical processes commonly required in the astrophysical community and we expect that the effort required to integrate additional or alternate modules into the code will be small. In its simplest form the code can evolve the dynamical trajectories of a set of particles in two or three dimensions using a module which implements either a Leapfrog or Runge-Kutta-Fehlberg integrator, selected by the user at compile time. The user may choose to allow the integrator to evolve the system using individual time steps for each particle or with a single, global time step for all. Particles may interact gravitationally as N-body particles, and all or any subset may also interact hydrodynamically, using the smoothed particle hydrodynamic (SPH) method by selecting the SPH module. A third particle species can be included with a module to model massive point particles which may accrete nearby SPH or N-body particles. Such particles may be used to model, e.g., stars in a molecular cloud. Free boundary conditions are implemented by default, and a module may be selected to include periodic boundary conditions. We use a binary "Press" tree to organize particles for rapid access in gravity and SPH calculations. Modules implementing an interface with special purpose "GRAPE" hardware may also be selected to accelerate the gravity calculations. If available, forces obtained from the GRAPE coprocessors may be transparently substituted for those obtained from the tree, or both tree and GRAPE may be used as a combination GRAPE/tree code. The code may be run without modification on single processors or in parallel using OpenMP compiler directives on large-scale, shared memory parallel machines. We present simulations of several test problems, including a merger simulation of two elliptical galaxies with 800,000 particles. In comparison to the Gadget-2 code of Springel, the gravitational force calculation, which is the most costly part of any simulation including self-gravity, is ~4.6-4.9 times faster with VINE when tested on different snapshots of the elliptical galaxy merger simulation when run on an Itanium 2 processor in an SGI Altix. A full simulation of the same setup with eight processors is a factor of 2.91 faster with VINE. The code is available to the public under the terms of the Gnu General Public License.
Video signal processing system uses gated current mode switches to perform high speed multiplication and digital-to-analog conversion

NASA Technical Reports Server (NTRS)

Gilliland, M. G.; Rougelot, R. S.; Schumaker, R. A.

1966-01-01

Video signal processor uses special-purpose integrated circuits with nonsaturating current mode switching to accept texture and color information from a digital computer in a visual spaceflight simulator and to combine these, for display on color CRT with analog information concerning fading.
Evaluation of Algorithms for Compressing Hyperspectral Data

NASA Technical Reports Server (NTRS)

Cook, Sid; Harsanyi, Joseph; Faber, Vance

2003-01-01

With EO-1 Hyperion in orbit NASA is showing their continued commitment to hyperspectral imaging (HSI). As HSI sensor technology continues to mature, the ever-increasing amounts of sensor data generated will result in a need for more cost effective communication and data handling systems. Lockheed Martin, with considerable experience in spacecraft design and developing special purpose onboard processors, has teamed with Applied Signal & Image Technology (ASIT), who has an extensive heritage in HSI spectral compression and Mapping Science (MSI) for JPEG 2000 spatial compression expertise, to develop a real-time and intelligent onboard processing (OBP) system to reduce HSI sensor downlink requirements. Our goal is to reduce the downlink requirement by a factor > 100, while retaining the necessary spectral and spatial fidelity of the sensor data needed to satisfy the many science, military, and intelligence goals of these systems. Our compression algorithms leverage commercial-off-the-shelf (COTS) spectral and spatial exploitation algorithms. We are currently in the process of evaluating these compression algorithms using statistical analysis and NASA scientists. We are also developing special purpose processors for executing these algorithms onboard a spacecraft.
An Early Quantum Computing Proposal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Stephen Russell; Alexander, Francis Joseph; Barros, Kipton Marcos

The D-Wave 2X is the third generation of quantum processing created by D-Wave. NASA (with Google and USRA) and Lockheed Martin (with USC), both own D-Wave systems. Los Alamos National Laboratory (LANL) purchased a D-Wave 2X in November 2015. The D-Wave 2X processor contains (nominally) 1152 quantum bits (or qubits) and is designed to specifically perform quantum annealing, which is a well-known method for finding a global minimum of an optimization problem. This methodology is based on direct execution of a quantum evolution in experimental quantum hardware. While this can be a powerful method for solving particular kinds of problems,more » it also means that the D-Wave 2X processor is not a general computing processor and cannot be programmed to perform a wide variety of tasks. It is a highly specialized processor, well beyond what NNSA currently thinks of as an “advanced architecture.”A D-Wave is best described as a quantum optimizer. That is, it uses quantum superposition to find the lowest energy state of a system by repeated doses of power and settling stages. The D-Wave produces multiple solutions to any suitably formulated problem, one of which is the lowest energy state solution (global minimum). Mapping problems onto the D-Wave requires defining an objective function to be minimized and then encoding that function in the Hamiltonian of the D-Wave system. The quantum annealing method is then used to find the lowest energy configuration of the Hamiltonian using the current D-Wave Two, two-level, quantum processor. This is not always an easy thing to do, and the D-Wave Two has significant limitations that restrict problem sizes that can be run and algorithmic choices that can be made. Furthermore, as more people are exploring this technology, it has become clear that it is very difficult to come up with general approaches to optimization that can both utilize the D-Wave and that can do better than highly developed algorithms on conventional computers for specific applications. These are all fundamental challenges that must be overcome for the D-Wave, or similar, quantum computing technology to be broadly applicable.« less
Bluetooth telemedicine processor for multichannel biomedical signal transmission via mobile cellular networks.

PubMed

Rasid, Mohd Fadlee A; Woodward, Bryan

2005-03-01

One of the emerging issues in m-Health is how best to exploit the mobile communications technologies that are now almost globally available. The challenge is to produce a system to transmit a patient's biomedical signals directly to a hospital for monitoring or diagnosis, using an unmodified mobile telephone. The paper focuses on the design of a processor, which samples signals from sensors on the patient. It then transmits digital data over a Bluetooth link to a mobile telephone that uses the General Packet Radio Service. The modular design adopted is intended to provide a "future-proofed" system, whose functionality may be upgraded by modifying the software.
Design and implementation of a medium speed communications interface and protocol for a low cost, refreshed display computer

NASA Technical Reports Server (NTRS)

Phyne, J. R.; Nelson, M. D.

1975-01-01

The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
Certified organic vegetable production for market

USDA-ARS?s Scientific Manuscript database

Federal guidelines for organic certification in 2002 provided structure for producers and processors to market certified organic foods. The guidelines provide general provisions and processes for obtaining and maintaining organic certification, but did not specify best management practices for crop...
The design of infrared information collection circuit based on embedded technology

NASA Astrophysics Data System (ADS)

Liu, Haoting; Zhang, Yicong

2013-07-01

S3C2410 processor is a 16/32 bit RISC embedded processor which based on ARM920T core and AMNA bus, and mainly for handheld devices, and high cost, low-power applications. This design introduces a design plan of the PIR sensor system, circuit and its assembling, debugging. The Application Circuit of the passive PIR alarm uses the invisibility of the infrared radiation well into the alarm system, and in order to achieve the anti-theft alarm and security purposes. When the body goes into the range of PIR sensor detection, sensors will detect heat sources and then the sensor will output a weak signal. The Signal should be amplified, compared and delayed; finally light emitting diodes emit light, playing the role of a police alarm.
Data systems and computer science programs: Overview

NASA Technical Reports Server (NTRS)

Smith, Paul H.; Hunter, Paul

1991-01-01

An external review of the Integrated Technology Plan for the Civil Space Program is presented. The topics are presented in viewgraph form and include the following: onboard memory and storage technology; advanced flight computers; special purpose flight processors; onboard networking and testbeds; information archive, access, and retrieval; visualization; neural networks; software engineering; and flight control and operations.
Measuring Sound-Processor Threshold Levels for Pediatric Cochlear Implant Recipients Using Conditioned Play Audiometry via Telepractice

ERIC Educational Resources Information Center

Goehring, Jenny L.; Hughes, Michelle L.

2017-01-01

Purpose: This study evaluated the use of telepractice for measuring cochlear implant (CI) behavioral threshold (T) levels in children using conditioned play audiometry (CPA). The goals were to determine whether (a) T levels measured via telepractice were not significantly different from those obtained in person, (b) response probability differed…
Multiple degree of freedom optical pattern recognition

NASA Technical Reports Server (NTRS)

Casasent, D.

1987-01-01

Three general optical approaches to multiple degree of freedom object pattern recognition (where no stable object rest position exists) are advanced. These techniques include: feature extraction, correlation, and artificial intelligence. The details of the various processors are advanced together with initial results.
Corrosion Prediction with Parallel Finite Element Modeling for Coupled Hygro-Chemo Transport into Concrete under Chloride-Rich Environment

PubMed Central

Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping

2017-01-01

The prediction of the chloride-induced corrosion is very important because of the durable life of concrete structure. To simulate more realistic durability performance of concrete structures, complex scientific methods and more accurate material models are needed. In order to predict the robust results of corrosion initiation time and to describe the thin layer from concrete surface to reinforcement, a large number of fine meshes are also used. The purpose of this study is to suggest more realistic physical model regarding coupled hygro-chemo transport and to implement the model with parallel finite element algorithm. Furthermore, microclimate model with environmental humidity and seasonal temperature is adopted. As a result, the prediction model of chloride diffusion under unsaturated condition was developed with parallel algorithms and was applied to the existing bridge to validate the model with multi-boundary condition. As the number of processors increased, the computational time decreased until the number of processors became optimized. Then, the computational time increased because the communication time between the processors increased. The framework of present model can be extended to simulate the multi-species de-icing salts ingress into non-saturated concrete structures in future work. PMID:28772714
The CMS Level-1 Calorimeter Trigger for LHC Run II

NASA Astrophysics Data System (ADS)

Sinthuprasith, Tutanon

2017-01-01

The phase-1 upgrades of the CMS Level-1 calorimeter trigger have been completed. The Level-1 trigger has been fully commissioned and it will be used by CMS to collect data starting from the 2016 data run. The new trigger has been designed to improve the performance at high luminosity and large number of simultaneous inelastic collisions per crossing (pile-up). For this purpose it uses a novel design, the Time Multiplexed Design, which enables the data from an event to be processed by a single trigger processor at full granularity over several bunch crossings. The TMT design is a modular design based on the uTCA standard. The architecture is flexible and the number of trigger processors can be expanded according to the physics needs of CMS. Intelligent, more complex, and innovative algorithms are now the core of the first decision layer of CMS: the upgraded trigger system implements pattern recognition and MVA (Boosted Decision Tree) regression techniques in the trigger processors for pT assignment, pile up subtraction, and isolation requirements for electrons, and taus. The performance of the TMT design and the latency measurements and the algorithm performance which has been measured using data is also presented here.
Efficiency of static core turn-off in a system-on-a-chip with variation

DOEpatents

Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong

2013-10-29

A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.

Custom large scale integrated circuits for spaceborne SAR processors

NASA Technical Reports Server (NTRS)

Tyree, V. C.

1978-01-01

The application of modern LSI technology to the development of a time-domain azimuth correlator for SAR processing is discussed. General design requirements for azimuth correlators for missions such as SEASAT-A, Venus orbital imaging radar (VOIR), and shuttle imaging radar (SIR) are summarized. Several azimuth correlator architectures that are suitable for implementation using custom LSI devices are described. Technical factors pertaining to selection of appropriate LSI technologies are discussed, and the maturity of alternative technologies for spacecraft applications are reported in the context of expected space mission launch dates. The preliminary design of a custom LSI time-domain azimuth correlator device (ACD) being developed for use in future SAR processors is detailed.
Calibrating thermal behavior of electronics

DOEpatents

Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

2017-07-11

A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Calibrating thermal behavior of electronics

DOEpatents

Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

2016-05-31

A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Calibrating thermal behavior of electronics

DOEpatents

Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

2017-01-03

A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Tomographic image reconstruction using the cell broadband engine (CBE) general purpose hardware

NASA Astrophysics Data System (ADS)

Knaup, Michael; Steckmann, Sven; Bockenbach, Olivier; Kachelrieß, Marc

2007-02-01

Tomographic image reconstruction, such as the reconstruction of CT projection values, of tomosynthesis data, PET or SPECT events, is computational very demanding. In filtered backprojection as well as in iterative reconstruction schemes, the most time-consuming steps are forward- and backprojection which are often limited by the memory bandwidth. Recently, a novel general purpose architecture optimized for distributed computing became available: the Cell Broadband Engine (CBE). Its eight synergistic processing elements (SPEs) currently allow for a theoretical performance of 192 GFlops (3 GHz, 8 units, 4 floats per vector, 2 instructions, multiply and add, per clock). To maximize image reconstruction speed we modified our parallel-beam and perspective backprojection algorithms which are highly optimized for standard PCs, and optimized the code for the CBE processor. 1-3 In addition, we implemented an optimized perspective forwardprojection on the CBE which allows us to perform statistical image reconstructions like the ordered subset convex (OSC) algorithm. 4 Performance was measured using simulated data with 512 projections per rotation and 5122 detector elements. The data were backprojected into an image of 512 3 voxels using our PC-based approaches and the new CBE- based algorithms. Both the PC and the CBE timings were scaled to a 3 GHz clock frequency. On the CBE, we obtain total reconstruction times of 4.04 s for the parallel backprojection, 13.6 s for the perspective backprojection and 192 s for a complete OSC reconstruction, consisting of one initial Feldkamp reconstruction, followed by 4 OSC iterations.
7 CFR 250.16 - Maintenance of records.

Code of Federal Regulations, 2010 CFR

2010-01-01

... processor, food service management company, warehouse, or other entity which contracts with a distributing... Agriculture Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE GENERAL REGULATIONS AND POLICIES-FOOD DISTRIBUTION DONATION OF FOODS FOR USE IN THE UNITED STATES...
Nondestructive detection of infested chestnuts based on NIR spectroscopy

USDA-ARS?s Scientific Manuscript database

Insect feeding is a significant postharvest problem for processors of Chestnuts (Castanea sativa, Miller). In most cases, damage from insects is 'hidden', i.e. not visually detectable on the fruit surface. Consequently, traditional sorting techniques, including manual sorting, are generally inadequa...
A novel parallel architecture for local histogram equalization

NASA Astrophysics Data System (ADS)

Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan

2005-07-01

Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
Analysis and simulation tools for solar array power systems

NASA Astrophysics Data System (ADS)

Pongratananukul, Nattorn

This dissertation presents simulation tools developed specifically for the design of solar array power systems. Contributions are made in several aspects of the system design phases, including solar source modeling, system simulation, and controller verification. A tool to automate the study of solar array configurations using general purpose circuit simulators has been developed based on the modeling of individual solar cells. Hierarchical structure of solar cell elements, including semiconductor properties, allows simulation of electrical properties as well as the evaluation of the impact of environmental conditions. A second developed tool provides a co-simulation platform with the capability to verify the performance of an actual digital controller implemented in programmable hardware such as a DSP processor, while the entire solar array including the DC-DC power converter is modeled in software algorithms running on a computer. This "virtual plant" allows developing and debugging code for the digital controller, and also to improve the control algorithm. One important task in solar arrays is to track the maximum power point on the array in order to maximize the power that can be delivered. Digital controllers implemented with programmable processors are particularly attractive for this task because sophisticated tracking algorithms can be implemented and revised when needed to optimize their performance. The proposed co-simulation tools are thus very valuable in developing and optimizing the control algorithm, before the system is built. Examples that demonstrate the effectiveness of the proposed methodologies are presented. The proposed simulation tools are also valuable in the design of multi-channel arrays. In the specific system that we have designed and tested, the control algorithm is implemented on a single digital signal processor. In each of the channels the maximum power point is tracked individually. In the prototype we built, off-the-shelf commercial DC-DC converters were utilized. At the end, the overall performance of the entire system was evaluated using solar array simulators capable of simulating various I-V characteristics, and also by using an electronic load. Experimental results are presented.
AAO2: a general purpose CCD controller for the AAT

NASA Astrophysics Data System (ADS)

Waller, Lew; Barton, John; Mayfield, Don; Griesbach, Jason

2004-09-01

The Anglo-Australian Observatory has developed a 2nd generation optical CCD controller to replace an earlier controller used now for almost twenty years. The new AAO2 controller builds on the considerable experience gained with the first controller, the new technologies now available and the techniques developed and successfully implemented in AAO's IRIS2 detector controller. The AAO2 controller has been designed to operate a wide variety of detectors and to achieve as near to detector limited performance as possible. It is capable of reading out CCDs with one, two or four output amplifiers, each output having its own video processor and high speed 16-bit ADC. The video processor is a correlated double sampler that may be switched between low noise dual slope integration or high speed clamp and sample modes. Programmable features include low noise DAC biases, horizontal clocks with DAC controllable levels and slopes and vertical clocks with DAC controllable arbitrary waveshapes. The controller uses two DSPs; one for overall control and the other for clock signal generation, which is highly programmable, with downloadable sequences of waveform patterns. The controller incorporates a precision detector temperature controller and provides accurate exposure time control. Telemetry is provided of all DAC generated voltages, many derived voltages, power supply voltages, detector temperature and detector identification. A high speed, full duplex fibre optic interface connects the controller to a host computer. The modular design uses six to ten circuit boards, plugged in to common backplanes. Two backplanes separate noisy digital signals from low noise analog signals.
FPGA Coprocessor for Accelerated Classification of Images

NASA Technical Reports Server (NTRS)

Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.

2008-01-01

An effort related to that described in the preceding article focuses on developing a spaceborne processing platform for fast and accurate onboard classification of image data, a critical part of modern satellite image processing. The approach again has been to exploit the versatility of recently developed hybrid Virtex-4FX field-programmable gate array (FPGA) to run diverse science applications on embedded processors while taking advantage of the reconfigurable hardware resources of the FPGAs. In this case, the FPGA serves as a coprocessor that implements legacy C-language support-vector-machine (SVM) image-classification algorithms to detect and identify natural phenomena such as flooding, volcanic eruptions, and sea-ice break-up. The FPGA provides hardware acceleration for increased onboard processing capability than previously demonstrated in software. The original C-language program demonstrated on an imaging instrument aboard the Earth Observing-1 (EO-1) satellite implements a linear-kernel SVM algorithm for classifying parts of the images as snow, water, ice, land, or cloud or unclassified. Current onboard processors, such as on EO-1, have limited computing power, extremely limited active storage capability and are no longer considered state-of-the-art. Using commercially available software that translates C-language programs into hardware description language (HDL) files, the legacy C-language program, and two newly formulated programs for a more capable expanded-linear-kernel and a more accurate polynomial-kernel SVM algorithm, have been implemented in the Virtex-4FX FPGA. In tests, the FPGA implementations have exhibited significant speedups over conventional software implementations running on general-purpose hardware.
Definition of an XML markup language for clinical laboratory procedures and comparison with generic XML markup.

PubMed

Saadawi, Gilan M; Harrison, James H

2006-10-01

Clinical laboratory procedure manuals are typically maintained as word processor files and are inefficient to store and search, require substantial effort for review and updating, and integrate poorly with other laboratory information. Electronic document management systems could improve procedure management and utility. As a first step toward building such systems, we have developed a prototype electronic format for laboratory procedures using Extensible Markup Language (XML). Representative laboratory procedures were analyzed to identify document structure and data elements. This information was used to create a markup vocabulary, CLP-ML, expressed as an XML Document Type Definition (DTD). To determine whether this markup provided advantages over generic markup, we compared procedures structured with CLP-ML or with the vocabulary of the Health Level Seven, Inc. (HL7) Clinical Document Architecture (CDA) narrative block. CLP-ML includes 124 XML tags and supports a variety of procedure types across different laboratory sections. When compared with a general-purpose markup vocabulary (CDA narrative block), CLP-ML documents were easier to edit and read, less complex structurally, and simpler to traverse for searching and retrieval. In combination with appropriate software, CLP-ML is designed to support electronic authoring, reviewing, distributing, and searching of clinical laboratory procedures from a central repository, decreasing procedure maintenance effort and increasing the utility of procedure information. A standard electronic procedure format could also allow laboratories and vendors to share procedures and procedure layouts, minimizing duplicative word processor editing. Our results suggest that laboratory-specific markup such as CLP-ML will provide greater benefit for such systems than generic markup.
Tracking contamination through ground beef production and identifying points of recontamination using a novel green fluorescent protein (GFP) expressing, E. coli O103, non-pathogenic surrogate

USDA-ARS?s Scientific Manuscript database

Introduction: Commonly, ground beef processors conduct studies to model contaminant flow through their production systems using surrogate organisms. Typical surrogate organisms may not behave as Escherichia coli O157:H7 during grinding and are not easy to detect at very low levels. Purpose: Develop...
Auditory Environment across the Life Span of Cochlear Implant Users: Insights from Data Logging

ERIC Educational Resources Information Center

Busch, Tobias; Vanpoucke, Filiep; van Wieringen, Astrid

2017-01-01

Purpose: We describe the natural auditory environment of people with cochlear implants (CIs), how it changes across the life span, and how it varies between individuals. Method: We performed a retrospective cross-sectional analysis of Cochlear Nucleus 6 CI sound-processor data logs. The logs were obtained from 1,501 people with CIs (ages 0-96…
40 CFR 704.33 - P-tert-butylbenzoic acid (P-TBBA), p-tert-butyltoluene (P-TBT) and p-tert-butylbenzaldehyde (P-TBB).

Code of Federal Regulations, 2012 CFR

2012-07-01

... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
40 CFR 704.33 - P-tert-butylbenzoic acid (P-TBBA), p-tert-butyltoluene (P-TBT) and p-tert-butylbenzaldehyde (P-TBB).

Code of Federal Regulations, 2011 CFR

2011-07-01

... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
40 CFR 704.33 - P-tert-butylbenzoic acid (P-TBBA), p-tert-butyltoluene (P-TBT) and p-tert-butylbenzaldehyde (P-TBB).

Code of Federal Regulations, 2014 CFR

2014-07-01

... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
40 CFR 704.33 - P-tert-butylbenzoic acid (P-TBBA), p-tert-butyltoluene (P-TBT) and p-tert-butylbenzaldehyde (P-TBB).

Code of Federal Regulations, 2013 CFR

2013-07-01

... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
Missile Systems Maintenance, AFSC 411XOB/C.

DTIC Science & Technology

1988-04-01

technician’s rating. A statistical measurement of their agreement, known as the interrater reliability (as assessed through components of variance of...senior technician’s ratings. A statistical measurement of their agreement, known as the interrater reliability (as assessed through components of...FABRICATION TRANSITORS *INPUT/OUTPUT (PERIPHERAL) DEVICES SOLID-STATE SPECIAL PURPOSE DEVICES COMPUTER MICRO PROCESSORS AND PROGRAMS POWER SUPPLIES
Incoherent optical generalized Hough transform: pattern recognition and feature extraction applications

NASA Astrophysics Data System (ADS)

Fernández, Ariel; Ferrari, José A.

2017-05-01

Pattern recognition and feature extraction are image processing applications of great interest in defect inspection and robot vision among others. In comparison to purely digital methods, the attractiveness of optical processors for pattern recognition lies in their highly parallel operation and real-time processing capability. This work presents an optical implementation of the generalized Hough transform (GHT), a well-established technique for recognition of geometrical features in binary images. Detection of a geometric feature under the GHT is accomplished by mapping the original image to an accumulator space; the large computational requirements for this mapping make the optical implementation an attractive alternative to digital-only methods. We explore an optical setup where the transformation is obtained, and the size and orientation parameters can be controlled, allowing for dynamic scale and orientation-variant pattern recognition. A compact system for the above purposes results from the use of an electrically tunable lens for scale control and a pupil mask implemented on a high-contrast spatial light modulator for orientation/shape variation of the template. Real-time can also be achieved. In addition, by thresholding of the GHT and optically inverse transforming, the previously detected features of interest can be extracted.

49 CFR 236.921 - Training and qualification program, general.

Code of Federal Regulations, 2010 CFR

2010-10-01

... INSTALLATION, INSPECTION, MAINTENANCE, AND REPAIR OF SIGNAL AND TRAIN CONTROL SYSTEMS, DEVICES, AND APPLIANCES Standards for Processor-Based Signal and Train Control Systems § 236.921 Training and qualification program..., wayside, or onboard subsystems; (2) Persons who dispatch train operations (issue or communicate any...
Parallel Ray Tracing Using the Message Passing Interface

DTIC Science & Technology

2007-09-01

software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor and can be very...Cameron, Senior Member, IEEE Abstract—Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to...National Aeronautics and Space Administration (NASA), optical ray tracing, parallel computing, parallel pro- cessing, prime numbers, ray tracing
Zonal methods for the parallel execution of range-limited N-body simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowers, Kevin J.; Dror, Ron O.; Shaw, David E.

2007-01-20

Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introducedmore » two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.« less
Design of infrasound-detection system via adaptive LMSTDE algorithm

NASA Technical Reports Server (NTRS)

Khalaf, C. S.; Stoughton, J. W.

1984-01-01

A proposed solution to an aviation safety problem is based on passive detection of turbulent weather phenomena through their infrasonic emission. This thesis describes a system design that is adequate for detection and bearing evaluation of infrasounds. An array of four sensors, with the appropriate hardware, is used for the detection part. Bearing evaluation is based on estimates of time delays between sensor outputs. The generalized cross correlation (GCC), as the conventional time-delay estimation (TDE) method, is first reviewed. An adaptive TDE approach, using the least mean square (LMS) algorithm, is then discussed. A comparison between the two techniques is made and the advantages of the adaptive approach are listed. The behavior of the GCC, as a Roth processor, is examined for the anticipated signals. It is shown that the Roth processor has the desired effect of sharpening the peak of the correlation function. It is also shown that the LMSTDE technique is an equivalent implementation of the Roth processor in the time domain. A LMSTDE lead-lag model, with a variable stability coefficient and a convergence criterion, is designed.
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver

NASA Astrophysics Data System (ADS)

Chacon, Luis

2008-11-01

Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
Methods and systems for providing reconfigurable and recoverable computing resources

NASA Technical Reports Server (NTRS)

Stange, Kent (Inventor); Hess, Richard (Inventor); Kelley, Gerald B (Inventor); Rogers, Randy (Inventor)

2010-01-01

A method for optimizing the use of digital computing resources to achieve reliability and availability of the computing resources is disclosed. The method comprises providing one or more processors with a recovery mechanism, the one or more processors executing one or more applications. A determination is made whether the one or more processors needs to be reconfigured. A rapid recovery is employed to reconfigure the one or more processors when needed. A computing system that provides reconfigurable and recoverable computing resources is also disclosed. The system comprises one or more processors with a recovery mechanism, with the one or more processors configured to execute a first application, and an additional processor configured to execute a second application different than the first application. The additional processor is reconfigurable with rapid recovery such that the additional processor can execute the first application when one of the one more processors fails.
smallWig: parallel compression of RNA-seq WIG files.

PubMed

Wang, Zhiying; Weissman, Tsachy; Milenkovic, Olgica

2016-01-15

We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates. We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3-17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2-3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2-5.2 times. The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html, http://publish.illinois.edu/milenkovic/, http://web.stanford.edu/~tsachy/. zhiyingw@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Rectangular Array Of Digital Processors For Planning Paths

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

1993-01-01

Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Buffered coscheduling for parallel programming and enhanced fault tolerance

DOEpatents

Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM

2006-01-31

A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors
Phase-locked tracking loops for LORAN-C

NASA Technical Reports Server (NTRS)

Burhans, R. W.

1978-01-01

Portable battery operated LORAN-C receivers were fabricated to evaluate simple envelope detector methods with hybrid analog to digital phase locked loop sensor processors. The receivers are used to evaluate LORAN-C in general aviation applications. Complete circuit details are given for the experimental sensor and readout system.
Ray tracing on the MPP

NASA Technical Reports Server (NTRS)

Dorband, John E.

1987-01-01

Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
76 FR 18783 - United States et al.

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-05

... containers. Dean, Foremost, and other school milk suppliers often use distributors to supply and service school districts. Dairy processors generally use one distributor per service area. While school milk... capability to deliver to all of the schools in the district. Many require early morning or other specific...
SPAR improved structure-fluid dynamic analysis capability, phase 2

NASA Technical Reports Server (NTRS)

Pearson, M. L.

1984-01-01

An efficient and general method of analyzing a coupled dynamic system of fluid flow and elastic structures is investigated. The improvement of Structural Performance Analysis and Redesign (SPAR) code is summarized. All error codes are documented and the SPAR processor/subroutine cross reference is included.
Flight design system level C requirements. Solid rocket booster and external tank impact prediction processors. [space transportation system

NASA Technical Reports Server (NTRS)

Seale, R. H.

1979-01-01

The prediction of the SRB and ET impact areas requires six separate processors. The SRB impact prediction processor computes the impact areas and related trajectory data for each SRB element. Output from this processor is stored on a secure file accessible by the SRB impact plot processor which generates the required plots. Similarly the ET RTLS impact prediction processor and the ET RTLS impact plot processor generates the ET impact footprints for return-to-launch-site (RTLS) profiles. The ET nominal/AOA/ATO impact prediction processor and the ET nominal/AOA/ATO impact plot processor generate the ET impact footprints for non-RTLS profiles. The SRB and ET impact processors compute the size and shape of the impact footprints by tabular lookup in a stored footprint dispersion data base. The location of each footprint is determined by simulating a reference trajectory and computing the reference impact point location. To insure consistency among all flight design system (FDS) users, much input required by these processors will be obtained from the FDS master data base.
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis

NASA Technical Reports Server (NTRS)

Lim, Chieng-Fai

1991-01-01

The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
Some issues related to simulation of the tracking and communications computer network

NASA Technical Reports Server (NTRS)

Lacovara, Robert C.

1989-01-01

The Communications Performance and Integration branch of the Tracking and Communications Division has an ongoing involvement in the simulation of its flight hardware for Space Station Freedom. Specifically, the communication process between central processor(s) and orbital replaceable units (ORU's) is simulated with varying degrees of fidelity. The results of investigations into three aspects of this simulation effort are given. The most general area involves the use of computer assisted software engineering (CASE) tools for this particular simulation. The second area of interest is simulation methods for systems of mixed hardware and software. The final area investigated is the application of simulation methods to one of the proposed computer network protocols for space station, specifically IEEE 802.4.
Some issues related to simulation of the tracking and communications computer network

NASA Astrophysics Data System (ADS)

Lacovara, Robert C.

1989-12-01

The Communications Performance and Integration branch of the Tracking and Communications Division has an ongoing involvement in the simulation of its flight hardware for Space Station Freedom. Specifically, the communication process between central processor(s) and orbital replaceable units (ORU's) is simulated with varying degrees of fidelity. The results of investigations into three aspects of this simulation effort are given. The most general area involves the use of computer assisted software engineering (CASE) tools for this particular simulation. The second area of interest is simulation methods for systems of mixed hardware and software. The final area investigated is the application of simulation methods to one of the proposed computer network protocols for space station, specifically IEEE 802.4.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
The Influence of Large-Scale Computing on Aircraft Structural Design.

DTIC Science & Technology

1986-04-01

the customer in the most cost- effective manner. Computer facility organizations became computer resource power brokers. A good data processing...capabilities generated on other processors can be easily used. This approach is easily implementable and provides a good strategy for using existing...assistance to member nations for the purpose of increasing their scientific and technical potential; - Recommending effective ways for the member nations to

Earth Sciences Requirements for the Information Sciences Experiment System

NASA Technical Reports Server (NTRS)

Bowker, David E. (Editor); Katzberg, Steve J. (Editor); Wilson, R. Gale (Editor)

1990-01-01

The purpose of the workshop was to further explore and define the earth sciences requirements for the Information Sciences Experiment System (ISES), a proposed onboard data processor with real-time communications capability intended to support the Earth Observing System (Eos). A review of representative Eos instrument types is given and a preliminary set of real-time data needs has been established. An executive summary is included.
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Application experience with the NASA aircraft interrogation and display system - A ground-support equipment for digital flight systems

NASA Technical Reports Server (NTRS)

Glover, R. D.

1983-01-01

The NASA Dryden Flight Research Facility has developed a microprocessor-based, user-programmable, general-purpose aircraft interrogation and display system (AIDS). The hardware and software of this ground-support equipment have been designed to permit diverse applications in support of aircraft digital flight-control systems and simulation facilities. AIDS is often employed to provide engineering-units display of internal digital system parameters during development and qualification testing. Such visibility into the system under test has proved to be a key element in the final qualification testing of aircraft digital flight-control systems. Three first-generation 8-bit units are now in service in support of several research aircraft projects, and user acceptance has been high. A second-generation design, extended AIDS (XAIDS), incorporating multiple 16-bit processors, is now being developed to support the forward swept wing aircraft project (X-29A). This paper outlines the AIDS concept, summarizes AIDS operational experience, and describes the planned XAIDS design and mechanization.
Defence electronics industry profile, 1990-1991

NASA Astrophysics Data System (ADS)

The defense electronics industry profiled in this review comprises an estimated 150 Canadian companies that develop, manufacture, and repair radio and communications equipment, radars for surveillance and navigation, air traffic control systems, acoustic and infrared sensors, computers for navigation and fire control, signal processors and display units, special-purpose electronic components, and systems engineering and associated software. Canadian defense electronics companies generally serve market niches and end users of their products are limited to the military, government agencies, or commercial airlines. Geographically, the industry is concentrated in Ontario and Quebec, where about 91 percent of the industry's production and employment is found. In 1989, the estimated revenue of the industry was $2.36 billion, and exports totalled an estimated $1.4 billion. Strengths and weaknesses of the industry are discussed in terms of such factors as the relatively small size of Canadian companies, the ability of Canadian firms to access research and development opportunities and export markets in the United States, the dependence on foreign-made components, and international competition.
On-line surface inspection using cylindrical lens-based spectral domain low-coherence interferometry.

PubMed

Tang, Dawei; Gao, Feng; Jiang, X

2014-08-20

We present a spectral domain low-coherence interferometry (SD-LCI) method that is effective for applications in on-line surface inspection because it can obtain a surface profile in a single shot. It has an advantage over existing spectral interferometry techniques by using cylindrical lenses as the objective lenses in a Michelson interferometric configuration to enable the measurement of long profiles. Combined with a modern high-speed CCD camera, general-purpose graphics processing unit, and multicore processors computing technology, fast measurement can be achieved. By translating the tested sample during the measurement procedure, real-time surface inspection was implemented, which is proved by the large-scale 3D surface measurement in this paper. ZEMAX software is used to simulate the SD-LCI system and analyze the alignment errors. Two step height surfaces were measured, and the captured interferograms were analyzed using a fast Fourier transform algorithm. Both 2D profile results and 3D surface maps closely align with the calibrated specifications given by the manufacturer.
Using Neural Networks in Decision Making for a Reconfigurable Electro Mechanical Actuator (EMA)

NASA Technical Reports Server (NTRS)

Latino, Carl D.

2001-01-01

The objectives of this project were to demonstrate applicability and advantages of a neural network approach for evaluating the performance of an electro-mechanical actuator (EMA). The EMA in question was intended for the X-37 Advanced Technology Vehicle. It will have redundant components for safety and reliability. The neural networks for this application are to monitor the operation of the redundant electronics that control the actuator in real time and decide on the operating configuration. The system we proposed consists of the actuator, sensors, control circuitry and dedicated (embedded) processors. The main purpose of the study was to develop suitable hardware and neural network capable of allowing real time reconfiguration decisions to be made. This approach was to be compared to other methods such as fuzzy logic and knowledge based systems considered for the same application. Over the course of the project a more general objective was the identification of the other neural network applications and the education of interested NASA personnel on the topic of Neural Networks.
All Natural and Clean-Label Preservatives and Antimicrobial Agents Used during Poultry Processing and Packaging.

PubMed

Grant, Ar'quette; Parveen, Salina

2017-04-01

The poultry industry is faced with compounding pressures of maintaining product safety and wholesomeness while keeping up with consumer trends of all-natural foods and label accuracy. Consumers are increasingly demanding that their foods be minimally processed and contain compounds that are easily read and recognized, i.e., products must be clean labeled. The purpose of this review is to briefly describe several natural antimicrobial agents that can be incorporated into poultry processing. These compounds and their essential oils were included in this mini-review because they are generally recognized as safe by the U.S. Food and Drug Administration and are considered clean label: thyme extract, rosemary extract, garlic, and oregano. This list of natural antimicrobial agents by no means includes all of the options available to poultry processors. Rather, this review provides a brief glance at the potential these natural antimicrobial agents have in terms of reduced pathogenicity, increased shelf stability, and sensory acceptability through direct product application or as part of the product packaging.
Graphical processors for HEP trigger systems

NASA Astrophysics Data System (ADS)

Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

2017-02-01

General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We will discuss the use of online parallel computing on GPUs for synchronous low level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. Latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
An FPGA- Based General-Purpose Data Acquisition Controller

NASA Astrophysics Data System (ADS)

Robson, C. C. W.; Bousselham, A.; Bohm

2006-08-01

System development in advanced FPGAs allows considerable flexibility, both during development and in production use. A mixed firmware/software solution allows the developer to choose what shall be done in firmware or software, and to make that decision late in the process. However, this flexibility comes at the cost of increased complexity. We have designed a modular development framework to help to overcome these issues of increased complexity. This framework comprises a generic controller that can be adapted for different systems by simply changing the software or firmware parts. The controller can use both soft and hard processors, with or without an RTOS, based on the demands of the system to be developed. The resulting system uses the Internet for both control and data acquisition. In our studies we developed the embedded system in a Xilinx Virtex-II Pro FPGA, where we used both PowerPC and MicroBlaze cores, http, Java, and LabView for control and communication, together with the MicroC/OS-II and OSE operating systems
ROBUS-2: A Fault-Tolerant Broadcast Communication System

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

2005-01-01

The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. The ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, clock synchronization, and distributed diagnosis (group membership). The ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. This version of the ROBUS is intended for laboratory experimentation and demonstrations of the capability to reintegrate failed nodes, dynamically update the communication schedule, and tolerate and recover from correlated transient faults.
RPython high-level synthesis

NASA Astrophysics Data System (ADS)

Cieszewski, Radoslaw; Linczuk, Maciej

2016-09-01

The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interprets an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents a RPython based High-Level synthesis (HLS) compiler. The compiler get the configuration parameters and map RPython program to VHDL. Then, VHDL code can be used to program FPGA chips. In comparison of other technologies usage, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of General Purpose Processors (GPUs), and introduce more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with High-Level Synthesis compiler. This article describes design methodologies and tools, implementation and first results of created VHDL backend for RPython compiler.
Conceptual design of a data reduction system

NASA Technical Reports Server (NTRS)

1983-01-01

A telemetry data processing system was defined of the Data Reduction. Data reduction activities in support of the developmental flights of the Space Shuttle were used as references against which requirements are assessed in general terms. A conceptual system design believed to offer significant throughput for the anticipated types of data reduction activities is presented. The design identifies the use of a large, intermediate data store as a key element in a complex of high speed, single purpose processors, each of which performs predesignated, repetitive operations on either raw or partially processed data. The recommended approach to implement the design concept is to adopt an established interface standard and rely heavily on mature or promising technologies which are considered main stream of the integrated circuit industry. The design system concept, is believed to be implementable without reliance on exotic devices and/or operational procedures. Numerical methods were employed to examine the feasibility of digital discrimination of FDM composite signals, and of eliminating line frequency noises in data measurements.
Knowledge-based processing for aircraft flight control

NASA Technical Reports Server (NTRS)

Painter, John H.

1991-01-01

The purpose is to develop algorithms and architectures for embedding artificial intelligence in aircraft guidance and control systems. With the approach adopted, AI-computing is used to create an outer guidance loop for driving the usual aircraft autopilot. That is, a symbolic processor monitors the operation and performance of the aircraft. Then, based on rules and other stored knowledge, commands are automatically formulated for driving the autopilot so as to accomplish desired flight operations. The focus is on developing a software system which can respond to linguistic instructions, input in a standard format, so as to formulate a sequence of simple commands to the autopilot. The instructions might be a fairly complex flight clearance, input either manually or by data-link. Emphasis is on a software system which responds much like a pilot would, employing not only precise computations, but, also, knowledge which is less precise, but more like common-sense. The approach is based on prior work to develop a generic 'shell' architecture for an AI-processor, which may be tailored to many applications by describing the application in appropriate processor data bases (libraries). Such descriptions include numerical models of the aircraft and flight control system, as well as symbolic (linguistic) descriptions of flight operations, rules, and tactics.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Turney, Raymond D.

2001-01-01

This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
System, methods and apparatus for program optimization for multi-threaded processor architectures

DOEpatents

Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E

2015-01-06

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
The Model Human Processor and the Older Adult: Parameter Estimation and Validation Within a Mobile Phone Task

PubMed Central

Jastrzembski, Tiffany S.; Charness, Neil

2009-01-01

The authors estimate weighted mean values for nine information processing parameters for older adults using the Card, Moran, and Newell (1983) Model Human Processor model. The authors validate a subset of these parameters by modeling two mobile phone tasks using two different phones and comparing model predictions to a sample of younger (N = 20; Mage = 20) and older (N = 20; Mage = 69) adults. Older adult models fit keystroke-level performance at the aggregate grain of analysis extremely well (R = 0.99) and produced equivalent fits to previously validated younger adult models. Critical path analyses highlighted points of poor design as a function of cognitive workload, hardware/software design, and user characteristics. The findings demonstrate that estimated older adult information processing parameters are valid for modeling purposes, can help designers understand age-related performance using existing interfaces, and may support the development of age-sensitive technologies. PMID:18194048
Methane Post-Processor Development to Increase Oxygen Recovery beyond State-of-the-Art Carbon Dioxide Reduction Technology

NASA Technical Reports Server (NTRS)

Abney, Morgan; Miller, Lee; Greenwood, Zach; Iannantuono, Michelle; Jones, Kenny

2013-01-01

State-of-the-art life support carbon dioxide (CO2) reduction technology, based on the Sabatier reaction, is theoretically capable of 50% recovery of oxygen from metabolic CO2. This recovery is constrained by the limited availability of reactant hydrogen. Post-processing of the methane byproduct from the Sabatier reactor results in hydrogen recycle and a subsequent increase in oxygen recovery. For this purpose, a Methane Post-Processor Assembly containing three sub-systems has been developed and tested. The assembly includes a Methane Purification Assembly (MePA) to remove residual CO2 and water vapor from the Sabatier product stream, a Plasma Pyrolysis Assembly (PPA) to partially pyrolyze methane into hydrogen and acetylene, and an Acetylene Separation Assembly (ASepA) to purify the hydrogen product for recycle. The results of partially integrated testing of the sub-systems are reported.
Methane Post-Processor Development to Increase Oxygen Recovery beyond State-of-the-Art Carbon Dioxide Reduction Technology

NASA Technical Reports Server (NTRS)

Abney, Morgan B.; Greenwood, Zachary; Miller, Lee A.; Alvarez, Giraldo; Iannantuono, Michelle; Jones, Kenny

2013-01-01

State-of-the-art life support carbon dioxide (CO2) reduction technology, based on the Sabatier reaction, is theoretically capable of 50% recovery of oxygen from metabolic CO2. This recovery is constrained by the limited availability of reactant hydrogen. Post-processing of the methane byproduct from the Sabatier reactor results in hydrogen recycle and a subsequent increase in oxygen recovery. For this purpose, a Methane Post-Processor Assembly containing three sub-systems has been developed and tested. The assembly includes a Methane Purification Assembly (MePA) to remove residual CO2 and water vapor from the Sabatier product stream, a Plasma Pyrolysis Assembly (PPA) to partially pyrolyze methane into hydrogen and acetylene, and an Acetylene Separation Assembly (ASepA) to purify the hydrogen product for recycle. The results of partially integrated testing of the sub-systems are reported
The Model Human Processor and the older adult: parameter estimation and validation within a mobile phone task.

PubMed

Jastrzembski, Tiffany S; Charness, Neil

2007-12-01

The authors estimate weighted mean values for nine information processing parameters for older adults using the Card, Moran, and Newell (1983) Model Human Processor model. The authors validate a subset of these parameters by modeling two mobile phone tasks using two different phones and comparing model predictions to a sample of younger (N = 20; M-sub(age) = 20) and older (N = 20; M-sub(age) = 69) adults. Older adult models fit keystroke-level performance at the aggregate grain of analysis extremely well (R = 0.99) and produced equivalent fits to previously validated younger adult models. Critical path analyses highlighted points of poor design as a function of cognitive workload, hardware/software design, and user characteristics. The findings demonstrate that estimated older adult information processing parameters are valid for modeling purposes, can help designers understand age-related performance using existing interfaces, and may support the development of age-sensitive technologies.

Portable multi-node LQCD Monte Carlo simulations using OpenACC

NASA Astrophysics Data System (ADS)

Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
A general graphical user interface for automatic reliability modeling

NASA Technical Reports Server (NTRS)

Liceaga, Carlos A.; Siewiorek, Daniel P.

1991-01-01

Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor Memory Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have field texts, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given.
Cochlear implant microphone location affects speech recognition in diffuse noise.

PubMed

Kolberg, Elizabeth R; Sheffield, Sterling W; Davis, Timothy J; Sunderhaus, Linsey W; Gifford, René H

2015-01-01

Despite improvements in cochlear implants (CIs), CI recipients continue to experience significant communicative difficulty in background noise. Many potential solutions have been proposed to help increase signal-to-noise ratio in noisy environments, including signal processing and external accessories. To date, however, the effect of microphone location on speech recognition in noise has focused primarily on hearing aid users. The purpose of this study was to (1) measure physical output for the T-Mic as compared with the integrated behind-the-ear (BTE) processor mic for various source azimuths, and (2) to investigate the effect of CI processor mic location for speech recognition in semi-diffuse noise with speech originating from various source azimuths as encountered in everyday communicative environments. A repeated-measures, within-participant design was used to compare performance across listening conditions. A total of 11 adults with Advanced Bionics CIs were recruited for this study. Physical acoustic output was measured on a Knowles Experimental Mannequin for Acoustic Research (KEMAR) for the T-Mic and BTE mic, with broadband noise presented at 0 and 90° (directed toward the implant processor). In addition to physical acoustic measurements, we also assessed recognition of sentences constructed by researchers at Texas Instruments, the Massachusetts Institute of Technology, and the Stanford Research Institute (TIMIT sentences) at 60 dBA for speech source azimuths of 0, 90, and 270°. Sentences were presented in a semi-diffuse restaurant noise originating from the R-SPACE 8-loudspeaker array. Signal-to-noise ratio was determined individually to achieve approximately 50% correct in the unilateral implanted listening condition with speech at 0°. Performance was compared across the T-Mic, 50/50, and the integrated BTE processor mic. The integrated BTE mic provided approximately 5 dB attenuation from 1500-4500 Hz for signals presented at 0° as compared with 90° (directed toward the processor). The T-Mic output was essentially equivalent for sources originating from 0 and 90°. Mic location also significantly affected sentence recognition as a function of source azimuth, with the T-Mic yielding the highest performance for speech originating from 0°. These results have clinical implications for (1) future implant processor design with respect to mic location, (2) mic settings for implant recipients, and (3) execution of advanced speech testing in the clinic. American Academy of Audiology.
Development and Operation of a Database Machine for Online Access and Update of a Large Database.

ERIC Educational Resources Information Center

Rush, James E.

1980-01-01

Reviews the development of a fault tolerant database processor system which replaced OCLC's conventional file system. A general introduction to database management systems and the operating environment is followed by a description of the hardware selection, software processes, and system characteristics. (SW)
12 CFR 235.7 - Limitations on payment card restrictions.

Code of Federal Regulations, 2014 CFR

2014-01-01

... Section 235.7 Banks and Banking FEDERAL RESERVE SYSTEM (CONTINUED) BOARD OF GOVERNORS OF THE FEDERAL... restrictions. (a) Prohibition on network exclusivity—(1) In general. An issuer or payment card network shall not directly or through any agent, processor, or licensed member of a payment card network, by...
12 CFR 235.7 - Limitations on payment card restrictions.

Code of Federal Regulations, 2013 CFR

2013-01-01

... Section 235.7 Banks and Banking FEDERAL RESERVE SYSTEM (CONTINUED) BOARD OF GOVERNORS OF THE FEDERAL... restrictions. (a) Prohibition on network exclusivity—(1) In general. An issuer or payment card network shall not directly or through any agent, processor, or licensed member of a payment card network, by...
12 CFR 235.7 - Limitations on payment card restrictions.

Code of Federal Regulations, 2012 CFR

2012-01-01

... Section 235.7 Banks and Banking FEDERAL RESERVE SYSTEM (CONTINUED) BOARD OF GOVERNORS OF THE FEDERAL... restrictions. (a) Prohibition on network exclusivity—(1) In general. An issuer or payment card network shall not directly or through any agent, processor, or licensed member of a payment card network, by...
29 CFR 780.126 - Contract arrangements for raising poultry.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 29 Labor 3 2010-07-01 2010-07-01 false Contract arrangements for raising poultry. 780.126 Section... General Scope of Agriculture Raising of Livestock, Bees, Fur-Bearing Animals, Or Poultry § 780.126 Contract arrangements for raising poultry. Feed dealers and processors sometimes enter into contractual...
50 CFR 679.83 - Rockfish Program entry level fishery.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 50 Wildlife and Fisheries 9 2010-10-01 2010-10-01 false Rockfish Program entry level fishery. 679... ALASKA Rockfish Program § 679.83 Rockfish Program entry level fishery. (a) Rockfish entry level fishery—(1) General. A rockfish entry level harvester and rockfish entry level processor may participate in...
21 CFR 123.7 - Corrective actions.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 21 Food and Drugs 2 2010-04-01 2010-04-01 false Corrective actions. 123.7 Section 123.7 Food and... CONSUMPTION FISH AND FISHERY PRODUCTS General Provisions § 123.7 Corrective actions. (a) Whenever a deviation from a critical limit occurs, a processor shall take corrective action either by: (1) Following a...
The computational structural mechanics testbed generic structural-element processor manual

NASA Technical Reports Server (NTRS)

Stanley, Gary M.; Nour-Omid, Shahram

1990-01-01

The usage and development of structural finite element processors based on the CSM Testbed's Generic Element Processor (GEP) template is documented. By convention, such processors have names of the form ESi, where i is an integer. This manual is therefore intended for both Testbed users who wish to invoke ES processors during the course of a structural analysis, and Testbed developers who wish to construct new element processors (or modify existing ones).
Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1994-01-01

In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, Michael S.; Strip, David R.

1996-01-01

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
Switch for serial or parallel communication networks

DOEpatents

Crosette, D.B.

1994-07-19

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.
Switch for serial or parallel communication networks

DOEpatents

Crosette, Dario B.

1994-01-01

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.
A novel speech-processing strategy incorporating tonal information for cochlear implants.

PubMed

Lan, N; Nie, K B; Gao, S K; Zeng, F G

2004-05-01

Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak tonal language, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundatmental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava

2017-01-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particlemore » tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.« less
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

NASA Astrophysics Data System (ADS)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2017-08-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Conditions for space invariance in optical data processors used with coherent or noncoherent light.

PubMed

Arsenault, H R

1972-10-01

The conditions for space invariance in coherent and noncoherent optical processors are considered. All linear optical processors are shown to belong to one of two types. The conditions for space invariance are more stringent for noncoherent processors than for coherent processors, so that a system that is linear in coherent light may be nonlinear in noncoherent light. However, any processor that is linear in noncoherent light is also linear in the coherent limit.
The Sequential Implementation of Array Processors when there is Directional Uncertainty

DTIC Science & Technology

1975-08-01

University of Washington kindly supplied office space and ccputing facilities. -The author hat, benefited greatly from discussions with several other...if i Q- inverse of Q I L general observation space R general vector of observation _KR general observation vector of dimension K Exiv] "Tf -- ’ -"-T’T...7" i ’i ’:"’ - ’ ; ’ ’ ’ ’ ’ ’" ’"- Glossary of Symbols (continued) R. ith observation 1 Rm real vector space of dimension m R(T) autocorrelation

Automatic maintenance payload on board of a Mexican LEO microsatellite

NASA Astrophysics Data System (ADS)

Vicente-Vivas, Esaú; García-Nocetti, Fabián; Mendieta-Jiménez, Francisco

2006-02-01

Few research institutions from Mexico work together to finalize the integration of a technological demonstration microsatellite called Satex, aiming the launching of the first ever fully designed and manufactured domestic space vehicle. The project is based on technical knowledge gained in previous space experiences, particularly in developing GASCAN automatic experiments for NASA's space shuttle, and in some support obtained from the local team which assembled the México-OSCAR-30 microsatellites. Satex includes three autonomous payloads and a power subsystem, each one with a local microcomputer to provide intelligent and dedicated control. It also contains a flight computer (FC) with a pair of full redundancies. This enables the remote maintenance of processing boards from the ground station. A fourth communications payload depends on the flight computer for control purposes. A fifth payload was decided to be developed for the satellite. It adds value to the available on-board computers and extends the opportunity for a developing country to learn and to generate domestic space technology. Its aim is to provide automatic maintenance capabilities for the most critical on-board computer in order to achieve continuous satellite operations. This paper presents the virtual computer architecture specially developed to provide maintenance capabilities to the flight computer. The architecture is periodically implemented by software with a small amount of physical processors (FC processors) and virtual redundancies (payload processors) to emulate a hybrid redundancy computer. Communications among processors are accomplished over a fault-tolerant LAN. This allows a versatile operating behavior in terms of data communication as well as in terms of distributed fault tolerance. Obtained results, payload validation and reliability results are also presented.
Design of object-oriented distributed simulation classes

NASA Technical Reports Server (NTRS)

Schoeffler, James D. (Principal Investigator)

1995-01-01

Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.
Design of Object-Oriented Distributed Simulation Classes

NASA Technical Reports Server (NTRS)

Schoeffler, James D.

1995-01-01

Distributed simulation of aircraft engines as part of a computer aided design package being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for "Numerical Propulsion Simulation System". NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT "Actor" model of a concurrent object and uses "connectors" to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.
Broadcasting collective operation contributions throughout a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
LANDSAT-D flight segment operations manual. Appendix B: OBC software operations

NASA Technical Reports Server (NTRS)

Talipsky, R.

1981-01-01

The LANDSAT 4 satellite contains two NASA standard spacecraft computers and 65,536 words of memory. Onboard computer software is divided into flight executive and applications processors. Both applications processors and the flight executive use one or more of 67 system tables to obtain variables, constants, and software flags. Output from the software for monitoring operation is via 49 OBC telemetry reports subcommutated in the spacecraft telemetry. Information is provided about the flight software as it is used to control the various spacecraft operations and interpret operational OBC telemetry. Processor function descriptions, processor operation, software constraints, processor system tables, processor telemetry, and processor flow charts are presented.
Managing Power Heterogeneity

NASA Astrophysics Data System (ADS)

Pruhs, Kirk

A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study

NASA Technical Reports Server (NTRS)

Simon, Tyler; McGalliard, James

2009-01-01

Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Simulink/PARS Integration Support

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vacaliuc, B.; Nakhaee, N.

2013-12-18

The state of the art for signal processor hardware has far out-paced the development tools for placing applications on that hardware. In addition, signal processors are available in a variety of architectures, each uniquely capable of handling specific types of signal processing efficiently. With these processors becoming smaller and demanding less power, it has become possible to group multiple processors, a heterogeneous set of processors, into single systems. Different portions of the desired problem set can be assigned to different processor types as appropriate. As software development tools do not keep pace with these processors, especially when multiple processors ofmore » different types are used, a method is needed to enable software code portability among multiple processors and multiple types of processors along with their respective software environments. Sundance DSP, Inc. has developed a software toolkit called “PARS”, whose objective is to provide a framework that uses suites of tools provided by different vendors, along with modeling tools and a real time operating system, to build an application that spans different processor types. The software language used to express the behavior of the system is a very high level modeling language, “Simulink”, a MathWorks product. ORNL has used this toolkit to effectively implement several deliverables. This CRADA describes this collaboration between ORNL and Sundance DSP, Inc.« less
Forensic Analysis of the Sony Playstation Portable

NASA Astrophysics Data System (ADS)

Conrad, Scott; Rodriguez, Carlos; Marberry, Chris; Craiger, Philip

The Sony PlayStation Portable (PSP) is a popular portable gaming device with features such as wireless Internet access and image, music and movie playback. As with most systems built around a processor and storage, the PSP can be used for purposes other than it was originally intended - legal as well as illegal. This paper discusses the features of the PSP browser and suggests best practices for extracting digital evidence.
ISS EPS Orbital Replacement Unit Block Diagrams

NASA Technical Reports Server (NTRS)

Schmitz, Gregory V.

2001-01-01

The attached documents are being provided to Switching Power Magazine for information purposes. This magazine is writing a feature article on the International Space Station Electrical Power System, focusing on the switching power processors. These units include the DC-DC Converter Unit (DDCU), the Bi-directional Charge/Discharge Unit (BCDU), and the Sequential Shunt Unit (SSU). These diagrams are high-level schematics/block diagrams depicting the overall functionality of each unit.
Improving Research Methods for the Study of Geography and Mental Health: Utilization of Social Networking Data and the ESRI GeoEvent Processor

ERIC Educational Resources Information Center

McLaughlin, Courtney L.

2017-01-01

The purpose of this article is to review the literature on geography and mental health, report on a case example using new methods for studying this topic, and provide recommendations for future research. Over 25 years ago, Holley (1988) conducted a review of the literature on geography and mental health and astutely stated, "… it is…
RANS Simulations using OpenFOAM Software

DTIC Science & Technology

2016-01-01

Averaged Navier- Stokes (RANS) simulations is described and illustrated by applying the simpleFoam solver to two case studies; two dimensional flow...to run in parallel over large processor arrays. The purpose of this report is to illustrate and test the use of the steady-state Reynolds Averaged ...Group in the Maritime Platforms Division he has been simulating fluid flow around ships and submarines using finite element codes, Lagrangian vortex
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

NASA Astrophysics Data System (ADS)

Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

1995-10-01

Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete—analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, M.S.; Strip, D.R.

1996-01-30

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G.; Salapura, Valentina

2012-07-24

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

DOE PAGES

Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

2013-01-01

Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in amore » 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less
The Formal Specification of a Visual display Device: Design and Implementation.

DTIC Science & Technology

1985-06-01

The use of these data structures with their defined operations, give the programmer a very powerful instructions set. Like the DPU code generator in...which any AM hosted machine could faithfully display. 27 In- general , most applications have no need to create images from a data structure representing...formation of standard functional interfaces to these resources. OS’s generally do not provide a functional interface to either the processor or the display2
Data General Corporation Advanced Operating System/Virtual Storage (AOS/ VS). Revision 7.60

DTIC Science & Technology

1989-02-22

control list for each directory and data file. An access control list includes the users who can and cannot access files as well as the access...and any required data, it can -5- February 22, 1989 Final Evaluation Report Data General AOS/VS SYSTEM OVERVIEW operate asynchronously and in parallel...memory. The IOC can perform the data transfer without further interventiin from the CPU. The I/O channels interface with the processor or system
Comparison between Frame-Constrained Fix-Pixel-Value and Frame-Free Spiking-Dynamic-Pixel ConvNets for Visual Processing

PubMed Central

Farabet, Clément; Paz, Rafael; Pérez-Carrasco, Jose; Zamarreño-Ramos, Carlos; Linares-Barranco, Alejandro; LeCun, Yann; Culurciello, Eugenio; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabe

2012-01-01

Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process frame by frame video information in a very robust and fast way that requires to use and share the available hardware resources (such as: multipliers, adders). Hardware resources are fixed- and time-multiplexed by fetching data in and out. Thus memory bandwidth and size is important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes ideal for very high-speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable, and expansible. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGA have been already used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented and also discussions about their differences, pros and cons. PMID:22518097
SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes

NASA Astrophysics Data System (ADS)

Homann, Holger; Laenen, Francois

2018-03-01

The numerical study of physical problems often require integrating the dynamics of a large number of particles evolving according to a given set of equations. Particles are characterized by the information they are carrying such as an identity, a position other. There are generally speaking two different possibilities for handling particles in high performance computing (HPC) codes. The concept of an Array of Structures (AoS) is in the spirit of the object-oriented programming (OOP) paradigm in that the particle information is implemented as a structure. Here, an object (realization of the structure) represents one particle and a set of many particles is stored in an array. In contrast, using the concept of a Structure of Arrays (SoA), a single structure holds several arrays each representing one property (such as the identity) of the whole set of particles. The AoS approach is often implemented in HPC codes due to its handiness and flexibility. For a class of problems, however, it is known that the performance of SoA is much better than that of AoS. We confirm this observation for our particle problem. Using a benchmark we show that on modern Intel Xeon processors the SoA implementation is typically several times faster than the AoS one. On Intel's MIC co-processors the performance gap even attains a factor of ten. The same is true for GPU computing, using both computational and multi-purpose GPUs. Combining performance and handiness, we present the library SoAx that has optimal performance (on CPUs, MICs, and GPUs) while providing the same handiness as AoS. For this, SoAx uses modern C++ design techniques such template meta programming that allows to automatically generate code for user defined heterogeneous data structures.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.